Hi experts,

I have a question about the race between the fair scheduler and speculative
execution on CDH3 (any minor version) or older Hadoop versions (e.g.,
0.18.3).

Let's say we have turned on both the fair scheduler and speculative
execution, and there are enough jobs (e.g., JOB_X, JOB_Y, …) waiting in a
queue, while the currently running JOB_A has some straggler tasks.

What happens when a task tracker finishes a task of JOB_A and reports that
to the job tracker? Is the freed task slot allocated to one of the other
jobs (JOB_X or JOB_Y, …) for more fairness, or to a task from the running
job (JOB_A) for speculative execution?


Thanks,
Manhee


  • Manhee Jo at Jan 31, 2013 at 5:02 am
    Can anybody help me on this please?



  • Harold at Jan 31, 2013 at 6:51 am
    Hi Manhee,

The fair scheduler will divide the available task slots among JOB_A, JOB_X,
and JOB_Y depending on how you configure the minimum maps, minimum reduces,
and weights.

Speculative execution is a separate mechanism, unrelated to the fair
scheduler. It launches duplicate copies of a task (two of the exact same
task), and when the fastest copy completes, the duplicate is killed off.
Speculative execution is turned on mainly to speed up jobs, but the
trade-off is the resources wasted on the killed duplicates.

I think your question may be about pre-emption. When pre-emption is turned
on in the fair scheduler, and JOB_A is running and using all of the
cluster's task slots, then if JOB_X comes along and is not getting its fair
share (minimum map or reduce task slots), JOB_X 'pre-empts' JOB_A by
killing off some of JOB_A's tasks to make sure JOB_X gets its fair share
of resources.

Hope that helps...
    Harold
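    The minimums and weights Harold mentions live in the fair scheduler's
    allocation file. A minimal sketch (the pool names and numbers here are
    purely illustrative, not taken from any real cluster):

    ```xml
    <?xml version="1.0"?>
    <allocations>
      <!-- Each pool gets a guaranteed minimum number of map/reduce slots;
           slots beyond the minimums are shared in proportion to the weights. -->
      <pool name="production">
        <minMaps>10</minMaps>
        <minReduces>5</minReduces>
        <weight>2.0</weight>
      </pool>
      <pool name="adhoc">
        <minMaps>2</minMaps>
        <minReduces>1</minReduces>
        <weight>1.0</weight>
      </pool>
    </allocations>
    ```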
  • Manhee Jo at Jan 31, 2013 at 8:18 am
    Hi Harold,



    Thank you for your reply.

    What if pre-emption is off and a task tracker finishes a task from JOB_A,
    while JOB_A has not finished due to some straggler tasks?

    Would the task tracker get a task from JOB_X/JOB_Y, or a straggler
    (speculative) task from JOB_A?





    Thanks,

    Manhee



  • Harold at Jan 31, 2013 at 8:21 am
    The unused task slots will be divided among the jobs according to the
    minimum map/reduce shares you set. So when a task slot becomes available,
    the jobtracker will look to see which job is getting the least of its
    fair share and give the task slot to that job.
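    As a toy model of that decision (the job names and share numbers are made
    up, and the real JobTracker/FairScheduler logic is considerably more
    involved), the choice looks roughly like this:

    ```python
    # Toy model of the fair-scheduler decision: when a slot frees up, hand it
    # to the job running furthest below its fair share. A sketch only, not
    # the actual FairScheduler code.

    def pick_job(jobs):
        """jobs maps a job name to (running_tasks, fair_share).
        Return the job with the largest deficit below its fair share."""
        return max(jobs, key=lambda name: jobs[name][1] - jobs[name][0])

    jobs = {
        "JOB_A": (8, 5),  # above its share (its stragglers still running)
        "JOB_X": (1, 5),  # far below its share
        "JOB_Y": (3, 5),
    }
    print(pick_job(jobs))  # JOB_X gets the freed slot
    ```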


  • Manhee Jo at Jan 31, 2013 at 9:48 am
    Thank you again, Harold.

    Now I understand. So even if there are straggler tasks, the fair
    scheduler takes priority over speculative execution.



    Thank you.





    Regards,

    Manhee





  • Harold at Jan 31, 2013 at 7:09 pm
    Ah, I understand your question now. I'm not 100% sure about this, but I
    still believe the fair scheduler would take precedence over speculative
    execution (meaning that the extra duplicate tasks used for speculative
    execution count towards the total tasks a job is using), and the fair
    scheduler would take that into account when choosing which job to hand
    slots to.

  • Manhee Jo at Feb 1, 2013 at 12:04 am

    … (meaning that extra duplicate tasks used for speculative execution would
    count towards the total tasks being used by a job)


    Understandable. Thank you.

    Let me ask you one more question. The "How Many Reduces?" paragraph of

    http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html

    says, about setting the number of reducers:



    "The right number of reduces seems to be 0.95 or 1.75 multiplied by
    (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).
    …
    The scaling factors above are slightly less than whole numbers to reserve a
    few reduce slots in the framework for speculative-tasks and failed tasks."



    What do you think "reserve a few reduce slots for speculative-tasks"
    means here?

    Are those 5% of the task slots not used by fair scheduling? Are they
    reserved ONLY for speculative execution?
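    For concreteness, the tutorial's formula worked through with hypothetical
    numbers (a 20-node cluster with 2 reduce slots per task tracker; both
    figures are made up):

    ```python
    # Worked example of the "How Many Reduces?" formula from the tutorial.
    # The cluster size and per-node slot count below are hypothetical.
    nodes = 20
    reduce_slots_per_node = 2  # mapred.tasktracker.reduce.tasks.maximum
    total_reduce_slots = nodes * reduce_slots_per_node  # 40

    one_wave = int(0.95 * total_reduce_slots)   # 38: all reduces launch at once
    two_waves = int(1.75 * total_reduce_slots)  # 70: faster nodes run a 2nd wave
    print(one_wave, two_waves)  # 38 70
    ```

    With 0.95, the two slots left over per the formula are the ones the
    tutorial describes as "reserved" for speculative and failed tasks.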



    Thanks,

    Manhee





  • Harsh J at Feb 1, 2013 at 12:12 am
    Speculative assignments are made based on the availability of slots; they
    are not strictly "reserved" in any way.

    The word "reserved" in that statement is meant in a logical sense:
    assuming just one job is running on the cluster and you would still like
    some speculation to happen, it recommends leaving some slots free so that
    speculation can happen automatically.
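    For reference, speculative execution in this era of MapReduce is toggled
    per task type; a minimal mapred-site.xml fragment (the values shown are
    the stock defaults):

    ```xml
    <!-- mapred-site.xml: speculative execution toggles (defaults shown). -->
    <property>
      <name>mapred.map.tasks.speculative.execution</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.reduce.tasks.speculative.execution</name>
      <value>true</value>
    </property>
    ```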
  • Manhee Jo at Feb 1, 2013 at 12:44 am
    Thank you, Harsh.
    I see. So when the fair scheduler is on without pre-emption, and there are
    enough jobs waiting in the queue, the "logically reserved" slots expected
    to be used for speculative execution would, with high probability, be
    taken by the fair scheduler.


    Thanks,
    Manhee

Discussion Overview
group: cdh-user
categories: hadoop
posted: Jan 30, '13 at 5:49a
active: Feb 1, '13 at 12:44a
posts: 10
users: 3
website: cloudera.com
irc: #hadoop

3 users in discussion: Manhee Jo (6 posts), Harold (3 posts), Harsh J (1 post)
