Hi,



We recently ran some experiments on MapReduce job scheduling and found that
sometimes there were two jobs running on the same machine, each of them
running very slowly. We had assumed that the second job would wait for the
first to free the slave machine it occupied before starting to run, but it
seems this is wrong.



Our questions are:

(1) How does this scenario happen? Is it because there is a workload
threshold, so that a slave machine below the threshold will take on a new
task even though another task is already running on it?

(2) If (1) is true, how can we avoid it? If (1) is not true, what is the
reason for this scenario, and how can we avoid it?



Thanks very much in advance. :)





Best regards,

Wisteria.Lavender

One is never too old to learn. ^^


  • Harsh J at Mar 29, 2011 at 2:50 pm
    Hello,

    Hadoop poses no restrictions on the number of concurrent jobs. Perhaps
    you meant tasks.

    If the maximum task limit on a TaskTracker is set to N, up to N
    tasks may run on it in parallel. N is set to two by default (since
    most machines today have two or more cores). You can set this
    parameter to one, and then you'll see at most one task running on
    the TaskTracker at a given time.

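    In practice the limit is set per task type rather than as a single
    number. Here's a minimal sketch of the relevant mapred-site.xml
    settings, assuming a 0.20/1.x-era configuration (both properties
    default to 2, and the TaskTracker must be restarted for a change to
    take effect):

        <!-- mapred-site.xml (sketch): cap this TaskTracker at one map
             slot and one reduce slot -->
        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>1</value>
        </property>
        <property>
          <name>mapred.tasktracker.reduce.tasks.maximum</name>
          <value>1</value>
        </property>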

    --
    Harsh J
    http://harshj.com
  • Greg Roelofs at Mar 29, 2011 at 10:00 pm

    Depending on data locality, however, limiting a TaskTracker to a
    single task could actually make things run more slowly. (In
    principle, anyway; I don't have even anecdotal evidence to back
    that up. :-) I'm sure one could construct a particular set of
    hardware with a particular data layout for which it would be true,
    though.)

    Greg
