I'm using Cloudera's distribution of 0.20.1, but this seems like a general
question, so I'm posting here.

I'm having some issues getting the Fair Scheduler set up. I followed the
basic instructions from
http://hadoop.apache.org/common/docs/current/fair_scheduler.html:

* Added to mapred-site.xml:

<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>

<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/etc/hadoop/conf/fairscheduler.xml</value>
</property>

The fair scheduler jar was already in the installation's root lib/ directory.

* Added the basic fairscheduler.xml, based on the example in the docs.

<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>${pool.name}</value>
<description>...</description>
</property>

<property>
<name>pool.name</name>
<value>${user.name}</value>
<description>...</description>
</property>

When I run a job (say, one of the examples, such as the pi estimator, word
count, or sleep) and check myhost:50030/scheduler, I see the job listed in
the Pools table in the "hadoop" row, since that's the user. That makes
sense. In the Running Jobs table, however, the dropdown in the Pool column
sometimes shows "hadoop" and sometimes "default" when I reload the page,
which is odd.

Then if I change the xml's pool.name entry's value to a hardcoded value, say
"foo", with a matching "foo" <pool> entry in the xml, and run a job (and
restart the JobTracker to be safe), I do see a "foo" row in the Pools table,
but it shows 0 Running Jobs, and "default" shows the one job. Also, the Pool
listed in the dropdown in the Running Jobs table remains "default", rather
than "foo" (although "foo" is a choice, and I CAN select it to change the
pool).
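
For reference, the "foo" <pool> entry mentioned above would sit in the allocation file that mapred.fairscheduler.allocation.file points to. The following is only a sketch, assuming the <allocations> layout described in the 0.20 Fair Scheduler docs; the minimum-share values are hypothetical and not taken from this thread:

```xml
<?xml version="1.0"?>
<!-- Allocation file sketch (e.g. /etc/hadoop/conf/fairscheduler.xml). -->
<!-- The minMaps/minReduces values are hypothetical placeholders. -->
<allocations>
  <pool name="foo">
    <minMaps>2</minMaps>
    <minReduces>1</minReduces>
  </pool>
</allocations>
```

Defining a pool this way only makes it exist and gives it a share. It does not by itself route jobs into the pool, which would be consistent with "foo" appearing in the Pools table with 0 running jobs.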

I'd expect that if I set pool.name in fairscheduler.xml, jobs would run, and
appear, under that pool. Am I missing something in my setup or in my
understanding of how this should work? Thanks for any insight. What I'd
like to be able to do is set the pool name on the command line when running
a job, with an arg of "-Dpool.name=bar".

Thanks,
Derek

  • Todd Lipcon at Dec 3, 2009 at 3:54 am
    Hi Derek,

    You should set poolnameproperty to "pool.name", not "${pool.name}".

    That should fix your issues.

    -Todd
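
    A minimal sketch of the corrected entries, assuming they are kept next to
    the other scheduler settings in mapred-site.xml (the thread does not say
    which file they ended up in). The likely reason the fix works is that
    mapred.fairscheduler.poolnameproperty holds the name of the job property
    to read, so "${pool.name}" would be expanded to a value such as "hadoop"
    and then looked up as a property name that does not exist:

```xml
<!-- Names the job property the scheduler reads to pick a pool. -->
<!-- The value is the literal property name, with no ${...} expansion. -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>

<!-- Default for pool.name: falls back to the submitting user's name. -->
<!-- A job submitted with -Dpool.name=bar should override this default. -->
<property>
  <name>pool.name</name>
  <value>${user.name}</value>
</property>
```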

Discussion Overview
group: common-user @
categories: hadoop
posted: Dec 3, '09 at 3:47a
active: Dec 3, '09 at 3:54a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Todd Lipcon: 1 post Derek Brown: 1 post