Greetings, Hadoop Fans:

I'm attempting to use the timeout feature of the Fair Scheduler (using
Cloudera's most recently released distribution 0.20.1+152-1), but without
success. I'm using the following configs:

/etc/hadoop/conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:8021</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>9</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/etc/hadoop/conf/pools.xml</value>
  </property>
  <property>
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>pool.name</value>
  </property>
  <property>
    <name>pool.name</name>
    <value>default</value>
  </property>
</configuration>

and /etc/hadoop/conf/pools.xml

<?xml version="1.0"?>
<allocations>
  <pool name="realtime">
    <minMaps>4</minMaps>
    <minReduces>1</minReduces>
    <minSharePreemptionTimeout>180</minSharePreemptionTimeout>
    <weight>2.0</weight>
  </pool>
  <pool name="default">
    <minMaps>2</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>1</maxRunningJobs>
  </pool>
</allocations>

but a job in the realtime pool fails to preempt a job running in the
default pool (I waited for more than 15 minutes). Is there something wrong
with my configs? Or is there anything in the logs that would be useful for
debugging? (I've only found a "successfully configured fairscheduler"
message in the jobtracker log upon starting up the daemon.)

Help would be greatly appreciated!

Thanks,
-James Warren


  • James Warren on Dec 3, 2009 at 12:56 am
    Todd from Cloudera solved this for me on their company's forum.

    "What you're missing is the 'mapred.fairscheduler.preemption' property in
    mapred-site.xml. Without this on, the preemption settings in the
    allocations file are ignored... to turn it on, set that property's value to
    'true'."

    Thanks, Todd!
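
    For anyone finding this thread later, the fix quoted above can be
    sketched as one more property in /etc/hadoop/conf/mapred-site.xml. This
    is a minimal sketch: the property name and value are the ones Todd
    names, while its placement next to the other fairscheduler properties
    inside <configuration> and the comment text are my own assumptions.

    ```xml
    <!-- Assumed placement: inside <configuration>, alongside the other
         mapred.fairscheduler.* properties. Without this switch the
         preemption settings in the allocations file are ignored. -->
    <property>
      <name>mapred.fairscheduler.preemption</name>
      <value>true</value>
    </property>
    ```

    The JobTracker most likely has to be restarted to pick up the change,
    since this lives in mapred-site.xml rather than in the allocations file.
    As far as I know, minSharePreemptionTimeout is given in seconds, so the
    180 in pools.xml would mean roughly three minutes below minimum share
    before tasks from other pools are killed.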
  • Todd Lipcon on Dec 3, 2009 at 1:01 am
    No problem :) Also worth noting for anyone listening in: this feature is
    not in 0.20.1. It has been backported into CDH and will arrive in 0.21.

    Thanks
    -Todd

Discussion Overview
group: common-user @
categories: hadoop
posted: Dec 3, '09 at 12:27a
active: Dec 3, '09 at 1:01a
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
