Hi all,

I am experimenting with the fair scheduler on a cluster of 10 machines.
The users are given the default value ("0") for minMaps and minReduces in
the fair scheduler parameters. When I run two jobs under the same
username, the fair scheduler gives 100% of the fair share to the first
job (needs 2 mappers), and the second job (needs 10 mappers) stays in
waiting mode even though the cluster is totally idle. Running both jobs
simultaneously would take only 10% of the total available mappers.
However, the second job is not allowed to run until the first job is
over. It would be great if someone could suggest parameter tuning that
allows efficient utilization of the cluster. By efficient I mean allowing
jobs to run when the cluster is idle rather than leaving them in waiting
mode. I am not sure whether setting minMaps and minReduces for each user
would resolve the issue. Kindly clarify.

Thanks
Pallavi


  • Todd Lipcon at Jan 15, 2010 at 5:19 am
    Hi Pallavi,

    This doesn't sound right. Can you visit
    http://jobtracker:50030/scheduler?advanced and maybe send a screenshot? And
    also upload the allocations.xml file you're using?

    It sounds like you've managed to set either userMaxJobsDefault or
    maxRunningJobs for that user to 1.

    -Todd
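
One way to check which job limit applies to a user is to read the allocation file directly. The following is a minimal sketch, not the scheduler's own code: the sample file content, the user name `alice`, and the helper `max_running_jobs` are all hypothetical, and the element layout is assumed to match the fair scheduler documentation shipped with your distribution. A missing userMaxJobsDefault is treated as Integer.MAX_VALUE, the default given later in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical allocations.xml content, for illustration only.
# The element layout is an assumption based on the fair scheduler docs.
SAMPLE_ALLOCATIONS = """<?xml version="1.0"?>
<allocations>
  <user name="alice">
    <maxRunningJobs>4</maxRunningJobs>
  </user>
  <userMaxJobsDefault>1</userMaxJobsDefault>
</allocations>
"""

# Java's Integer.MAX_VALUE, i.e. effectively "no limit".
JAVA_INT_MAX = 2**31 - 1


def max_running_jobs(xml_text, user):
    """Return the running-job cap that would apply to `user`.

    A per-user <maxRunningJobs> wins; otherwise fall back to
    <userMaxJobsDefault>; otherwise treat the cap as unlimited.
    """
    root = ET.fromstring(xml_text)

    default_cap = JAVA_INT_MAX
    default_node = root.find("userMaxJobsDefault")
    if default_node is not None and default_node.text:
        default_cap = int(default_node.text.strip())

    for user_node in root.findall("user"):
        if user_node.get("name") == user:
            cap_node = user_node.find("maxRunningJobs")
            if cap_node is not None and cap_node.text:
                return int(cap_node.text.strip())
    return default_cap


if __name__ == "__main__":
    for name in ("alice", "bob"):
        print(name, max_running_jobs(SAMPLE_ALLOCATIONS, name))
```

With a file like the sample, any user without an explicit entry is capped at one running job, which would produce the behaviour described in the original post: a second job waits even though slots are free.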
  • Pallavi Palleti at Jan 15, 2010 at 9:16 am
    Hi Todd,

    Thanks for the reply. I figured out that *userMaxJobsDefault* was set
    to 1. I have another question about it. What will happen if I remove
    the *userMaxJobsDefault* property? What is the default value? Would
    setting a value higher than 1 for a particular user cause other users'
    jobs to stall until that user's jobs are over? If so, is there a way
    to specify that a user can take at most some percentage of the idle
    mappers available at that time? And, once that threshold is exceeded,
    can we limit the user to running only some default number of jobs at a
    time? This way, we can avoid stalling other users' jobs and still
    utilize the cluster efficiently. Kindly clarify.

    Thanks
    Pallavi


  • Todd Lipcon at Jan 15, 2010 at 4:18 pm
    Hi Pallavi,

    If you remove userMaxJobsDefault, the default value is Integer.MAX_VALUE -
    that is, it's unconstrained by this limit. This means that the other limits
    and fair sharing would kick in if multiple jobs are submitted. So, if you
    haven't set any of the min-slots, and the jobs are all at the same priority,
    they'll share the number of slots equally. Please check out the fair
    scheduler documentation in docs/fair_scheduler.pdf in your distro.

    -Todd
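
The equal sharing described above can be illustrated with a small simulation. This is a simplified sketch and not the scheduler's actual algorithm: the function `equal_share` is hypothetical, it ignores minimum slots, priorities, and pools, and it assumes that a job never receives more slots than it needs, with the unused remainder handed to the other jobs. The 120-slot total is also an assumption, inferred from the original post's statement that 12 mappers would be about 10% of the cluster.

```python
def equal_share(total_slots, demands):
    """Split total_slots evenly among jobs, capping each job at its demand.

    demands maps a job name to the number of slots it needs. Slots that a
    small job cannot use are redistributed among the jobs still wanting more.
    """
    shares = {job: 0 for job in demands}
    remaining = total_slots
    active = {job for job, need in demands.items() if need > 0}

    while remaining > 0 and active:
        per_job = remaining // len(active)
        if per_job == 0:
            # Fewer slots than unsatisfied jobs: hand out one slot each,
            # in name order, until the slots run out.
            for job in sorted(active)[:remaining]:
                shares[job] += 1
            break
        for job in sorted(active):
            grant = min(per_job, demands[job] - shares[job])
            shares[job] += grant
            remaining -= grant
        # Keep only the jobs whose demand is not yet met.
        active = {job for job in active if shares[job] < demands[job]}

    return shares


if __name__ == "__main__":
    # The two jobs from the original post on an assumed 120-slot cluster.
    print(equal_share(120, {"job1": 2, "job2": 10}))
    # A smaller, contended case: job1 is capped and job2 takes the rest.
    print(equal_share(8, {"job1": 2, "job2": 10}))
```

Under these assumptions both jobs from the original post run at full demand once the per-user job cap is lifted, and sharing only becomes visible when total demand exceeds the available slots.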
  • Pallavi Palleti at Jan 18, 2010 at 5:06 am
    Thanks Todd. I had gone through the documentation earlier, but these
    points were not very clear to me. This will help me experiment
    further. Thanks for the information. :-)

    Regards
    Pallavi


Discussion Overview
group: mapreduce-user @
category: hadoop
posted: Jan 15, '10 at 5:06a
active: Jan 18, '10 at 5:06a
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop
