FAQ
Hey all

Why does the FCFS scheduler let a node choose only one task at a time from
a job? To improve data locality, it seems reasonable to let a node choose
all of its local tasks (if it can) from a job at once.

Any reply will be appreciated.

Thanks

Chen


  • Nan Zhu at Jan 17, 2011 at 2:28 pm
    Hi, Chen

    How is it going recently?

    Actually, I think you misunderstand the code in assignTasks() in
    JobQueueTaskScheduler.java. See the following structure of the relevant
    code:

    // Sorry, I have hacked the code so much that the variable names may
    // differ from the original version.

    for (i = 0; i < MapperCapacity; ++i) {
        ...
        for (JobInProgress job : jobQueue) {
            // try to schedule a node-local or rack-local map task
            // here is the interesting place
            t = job.obtainNewLocalMapTask(...);
            if (t != null) {
                ...
                // This break leaves the "for (job : jobQueue)" loop, so the
                // next iteration of the outer loop restarts map task selection
                // from the first job. The scheduler therefore assigns the first
                // job's local mappers first, until the map slots are full.
                break;
            }
        }
    }
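
To see the effect of the break concretely, here is a minimal, self-contained Java sketch of that loop structure. It is not the Hadoop code: FifoSketch, its nested Job class, and the string task names are hypothetical stand-ins for JobQueueTaskScheduler, JobInProgress, and the real task objects, and the no-argument obtainNewLocalMapTask() below ignores whatever arguments the real call elides as (...).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class FifoSketch {
    // Hypothetical stand-in for JobInProgress: just a queue of pending local map tasks.
    static class Job {
        final Deque<String> localTasks = new ArrayDeque<>();

        Job(String name, int localTaskCount) {
            for (int i = 0; i < localTaskCount; i++) {
                localTasks.add(name + "-map" + i);
            }
        }

        // Returns at most one task per call, or null when no local task is left.
        String obtainNewLocalMapTask() {
            return localTasks.poll();
        }
    }

    // Same shape as the loop quoted above; availableMapSlots plays the role of MapperCapacity.
    static List<String> assignTasks(List<Job> jobQueue, int availableMapSlots) {
        List<String> assignedTasks = new ArrayList<>();
        for (int i = 0; i < availableMapSlots; ++i) {
            for (Job job : jobQueue) {
                String t = job.obtainNewLocalMapTask();
                if (t != null) {
                    assignedTasks.add(t);
                    break; // leave the job loop; the next slot starts again at the first job
                }
            }
        }
        return assignedTasks;
    }

    public static void main(String[] args) {
        List<Job> jobQueue = Arrays.asList(new Job("job1", 3), new Job("job2", 5));
        // prints [job1-map0, job1-map1, job1-map2, job2-map0]
        System.out.println(assignTasks(jobQueue, 4));
    }
}
```

With four free map slots, the sketch takes three tasks from job1 and then one from job2, so a single assignTasks() call returns several tasks even though each obtainNewLocalMapTask() call returns at most one.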

    BTW, only one reduce task can be scheduled in a single heartbeat.



    Best,
    Nan
  • He Chen at Jan 17, 2011 at 4:25 pm
    Hi Nan,

    Thank you for the reply. I understand what you mean. My concern is that
    the "obtainNewLocalMapTask(...)" method assigns only one task at a
    time.

    Now I understand why only one task is assigned at a time. It is because
    of the outer loop:

    for (i = 0; i < MapperCapacity; ++i){

    (......)

    }

    My question is why this loop exists here, and why the scheduler uses this
    type of loop. Assigning only one task at a time imposes overhead on the
    task assignment process. Obviously, a node could be assigned all the
    available local tasks it can afford in one
    "obtainNewLocalMapTask(......)" method call.

    Best,

    Chen
  • Nan Zhu at Jan 17, 2011 at 4:37 pm
    Hi, Chen

    Actually, it is not one task each time.

    See this statement:

    assignedTasks.add(t);

    assignedTasks is the return value of this method. It is a collection of
    the selected tasks, and it will contain multiple tasks if candidates are
    available.

    Best,

    Nan
  • Nan Zhu at Jan 17, 2011 at 4:47 pm
    OK, I get your point.

    You mean, why don't we put the for loop into obtainNewLocalMapTask()?

    Yes, I think we can do that, but the result would be the same as with the
    current code, and I don't think it would bring much performance benefit.
    Personally, I like the current style. :-)
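
The "same result" claim can be checked on a toy model. The sketch below is an illustration under simplifying assumptions, not Hadoop code: BatchSketch, its Job class, obtainOne(), and the batch method obtainUpTo(max) are all hypothetical names, and the model ignores everything the real scheduler considers besides the FIFO job order and the number of free map slots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class BatchSketch {
    // Hypothetical stand-in for JobInProgress, holding only its pending local map tasks.
    static class Job {
        final Deque<String> localTasks = new ArrayDeque<>();

        Job(String name, int count) {
            for (int i = 0; i < count; i++) {
                localTasks.add(name + "-map" + i);
            }
        }

        // One task per call (the current style discussed in the thread).
        String obtainOne() {
            return localTasks.poll();
        }

        // Hypothetical batch variant: the slot loop is moved inside the job.
        List<String> obtainUpTo(int max) {
            List<String> batch = new ArrayList<>();
            while (batch.size() < max && !localTasks.isEmpty()) {
                batch.add(localTasks.poll());
            }
            return batch;
        }
    }

    // Current style: outer loop over free slots, one task per method call.
    static List<String> assignOneAtATime(List<Job> jobQueue, int slots) {
        List<String> assigned = new ArrayList<>();
        for (int i = 0; i < slots; ++i) {
            for (Job job : jobQueue) {
                String t = job.obtainOne();
                if (t != null) {
                    assigned.add(t);
                    break; // restart from the first job for the next slot
                }
            }
        }
        return assigned;
    }

    // Proposed style: one call per job, asking for as many tasks as slots remain.
    static List<String> assignInBatches(List<Job> jobQueue, int slots) {
        List<String> assigned = new ArrayList<>();
        for (Job job : jobQueue) {
            if (assigned.size() >= slots) {
                break; // all free slots are filled
            }
            assigned.addAll(job.obtainUpTo(slots - assigned.size()));
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<String> a = assignOneAtATime(Arrays.asList(new Job("a", 2), new Job("b", 3)), 4);
        List<String> b = assignInBatches(Arrays.asList(new Job("a", 2), new Job("b", 3)), 4);
        System.out.println(a.equals(b)); // prints true
    }
}
```

In this model both variants return the same tasks in the same order; the batch variant only reduces the number of method calls per heartbeat. Whether that carries over to the real scheduler depends on details the thread does not show.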

    Best,

    Nan

Discussion Overview
group: common-user @
category: hadoop
posted: Jan 15, '11 at 5:46a
active: Jan 17, '11 at 4:47p
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop
