I'm new to setting up Hadoop's scheduler and I'm trying to set up the
FairScheduler on a 3-node cluster. The initial setup is fine, but
throughput is abysmal.

Each node is configured with a capacity of 16 map tasks and 8 reduce
tasks. Most of the jobs being run read data from Cassandra, installed
on the same nodes, using ColumnFamilyInputFormat.

With the default scheduler these jobs take from 5 to 15 minutes.

When I plug in the FairScheduler, they take from one to many hours.

What I see is that the map task capacity is not being used. Jobs now
run only 3 map tasks at a time, whereas before they would always run
all 48 map tasks.

This is without any custom fair-scheduler.xml configuration, but I've
also tried configuring userMaxJobsDefault, maxRunningJobs, and weight,
without any luck.
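
[For context, a minimal fair-scheduler.xml exercising those three
settings might look like the sketch below; the pool name and values are
illustrative, not the poster's actual configuration:]

    <?xml version="1.0"?>
    <allocations>
      <!-- default cap on concurrent jobs per user -->
      <userMaxJobsDefault>5</userMaxJobsDefault>
      <pool name="default">
        <!-- cap on concurrent jobs in this pool -->
        <maxRunningJobs>10</maxRunningJobs>
        <!-- relative share of the cluster for this pool -->
        <weight>1.0</weight>
      </pool>
    </allocations>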

I've also tried setting mapred.fairscheduler.locality.delay=0, without
any luck.

Is it possible with the FairScheduler to get the same throughput, when
only one job is running, as with Hadoop's default scheduler? Am I
missing something obvious?

~mck


--
Linux, because I'd rather own a free OS than steal one that's not worth
paying for.


  • Matei Zaharia at Aug 18, 2011 at 7:34 pm
    How long are your tasks, and which version of Hadoop are you using? In older versions (0.20.*), the fair scheduler doesn't launch multiple tasks per heartbeat, so it performs poorly when your tasks are small (less than 5-10 seconds). You may be able to improve it a bit by setting mapred.fairscheduler.assignmultiple to true in your mapred-site.xml. However, even this will assign too few tasks per heartbeat. I recommend using either Hadoop 0.21, where this issue is fixed by default, or Cloudera's Hadoop distribution, which is the only 0.20-based version that has backported the relevant fair scheduler improvements from 0.21.
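
    [For reference, the corresponding mapred-site.xml entry would look
    something like this — a minimal sketch, assuming only the property
    Matei names above:]

        <property>
          <name>mapred.fairscheduler.assignmultiple</name>
          <value>true</value>
        </property>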

    If you have short tasks, though, you should beware that Hadoop as a whole will be inefficient, because it will spend most of its time launching JVMs and waiting on heartbeats to send back status updates. You should try to tune your task size (number of input records per task) so that each task takes at least 30-60 seconds, or you won't be running at the maximum efficiency possible for your cluster.

    Matei
  • Mck at Aug 18, 2011 at 9:33 pm
    > How long are your tasks, and which version of Hadoop are you using?

    Hadoop-0.20.1 (eventually we're looking to upgrade to Brisk).
    Tasks take 5-30 seconds.

    > In older versions (0.20.*), the fair scheduler doesn't launch
    > multiple tasks per heartbeat, so it performs poorly when your tasks
    > are small (less than 5-10 seconds). You may be able to improve it a
    > bit by setting mapred.fairscheduler.assignmultiple to true in your
    > mapred-site.xml. [...]

    mapred.fairscheduler.assignmultiple gave no apparent benefit. Maps
    running is still too low (4 or 5 now).

    [OT] Does Brisk-1.0-beta2 include this fix you mention?

    > If you have short tasks though, you should beware that Hadoop as a
    > whole will be inefficient because it will spend most of its time
    > launching JVMs and waiting on heartbeats to send back status updates.

    I'm running with mapred.job.reuse.jvm.num.tasks=-1 so as to re-use
    all JVMs.
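
    [For reference, a minimal mapred-site.xml entry for the JVM-reuse
    setting mentioned above; -1 lets a JVM run an unlimited number of
    tasks for the same job:]

        <property>
          <name>mapred.job.reuse.jvm.num.tasks</name>
          <value>-1</value>
        </property>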
    > You should try to tune your task size (number of input records per
    > task) so that each task takes at least 30-60 seconds, or you won't
    > be running at the maximum efficiency possible for your cluster.

    The default here is cassandra.input.split.size=65536.

    Raising it to 262144 (default x 4) indeed fixes the problem :-)
    (Now I just need to check this doesn't send memory through the roof...)
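
    [A sketch of one way to apply that change, as a job configuration
    property — it can equally be set programmatically via Cassandra's
    ConfigHelper; the value is the 4x figure from above:]

        <property>
          <name>cassandra.input.split.size</name>
          <value>262144</value>
        </property>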


    Thanks for your valuable and quick help, Matei.


    ~mck


    --
    "Physics is to math what sex is to masturbation." Richard Feynman
  • Matei Zaharia at Aug 18, 2011 at 9:41 pm
    Okay, great!

    Unfortunately, task launching is still slow even with JVM reuse set to -1, because of heartbeats (the slave node only updates its state with the master every ~5 seconds).

    Matei
  • Arun C Murthy at Aug 18, 2011 at 9:44 pm
    0.20.203 fixed the TT to be more aggressive about heartbeats, though not overly so - that should help a lot.
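
    [0.20.203 also exposes a JobTracker-side knob for heartbeat
    processing; the property name and default below are from memory and
    should be treated as an assumption to verify against that release:]

        <property>
          <!-- assumed: heartbeats the JobTracker handles per second,
               which governs the minimum TT heartbeat interval -->
          <name>mapred.heartbeats.in.second</name>
          <value>100</value>
        </property>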

    Arun
