Hello,
I'm running a 90-node c1.xlarge cluster with no reducers and
mapred.max.map.tasks=6 per machine.
The AMI is my own and uses Hadoop 0.19.1.
The dataset has 145K keys, and the processing time is huge.

Now, when I set mapred.map.tasks=14,000, what ends up running is 49 map
tasks across the machines.
No machine is running more than 3 tasks; most are running 1, and some are
running 0.
Looking at the map records read, it appears these 49 tasks correspond to
the 145k records.
Q) Why? Why isn't the number of running tasks much higher? If each machine
can run 6, why isn't the job split into more tasks and spread across the
machines?
This is underutilization.

So I set mapred.map.tasks=90.
On the Hadoop machine list, all 90 machines have at least 1 task: mostly 1,
some 2, and a few 3+ (max 4).
On the job tracker page, only 23 are running and 48 are pending (when I sent
this email).
With 90 machines (and a Map Task Capacity of 540), why aren't 90 running at
once?

What should be set? What isn't set?

Regards
Saptarshi Guha
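
The capacity figures in the post above can be checked with a little
arithmetic. The sketch below is only an illustration: the helper names are
hypothetical, and the figure of 71 tasks is just the 23 running plus 48
pending reported above (completed tasks, if any, are not counted). It shows
that the number of map tasks, not the 540 slots, is the upper bound on how
many maps can run at once.

```python
def map_slot_capacity(nodes, slots_per_node):
    """Total map slots in the cluster (hypothetical helper)."""
    return nodes * slots_per_node


def max_concurrent_maps(num_map_tasks, capacity):
    """Upper bound on simultaneously running maps: a job can never run
    more map tasks than it has, however many slots are free."""
    return min(num_map_tasks, capacity)


capacity = map_slot_capacity(90, 6)  # 90 nodes, 6 map slots each

# 49 tasks: the first run; 71 tasks: 23 running + 48 pending in the second run.
for tasks in (49, 71):
    bound = max_concurrent_maps(tasks, capacity)
    print(f"{tasks} map tasks -> at most {bound} running, "
          f"{bound / capacity:.1%} of {capacity} slots")
```

This bound does not explain why only 23 of the tasks were running at the
moment of the report; it only shows that the slot capacity was not the
limiting factor.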


  • Saptarshi Guha at Jun 23, 2009 at 2:50 pm
    Hello,
    I should also point out that I'm using a SequenceFileInputFormat.

    Regards
    Saptarshi Guha


  • Hong Tang at Jun 23, 2009 at 5:25 pm
    Do you use block compression in your sequence file? How large is your
    total dataset?
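
Both questions matter because mapred.map.tasks is only a hint: the number of
map tasks is the number of input splits, and that depends on the input size.
The sketch below is a rough model of how the old
org.apache.hadoop.mapred.FileInputFormat is understood to size splits. It is
written from memory, not copied from the 0.19.1 source, so treat it as an
assumption. Also assumed: a 64 MB HDFS block size, a minimum split size of
2000 bytes for SequenceFileInputFormat (SequenceFile.SYNC_INTERVAL), and no
"slop" allowance for the last split. The function names and the 1,000,000
byte file are hypothetical; the thread never states the dataset size.

```python
SYNC_INTERVAL = 2000                   # assumed SequenceFile.SYNC_INTERVAL, bytes
DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # assumed HDFS block size, bytes


def compute_split_size(goal_size, min_size, block_size):
    """Split size is the goal size, capped at the block size and
    floored at the minimum split size."""
    return max(min_size, min(goal_size, block_size))


def estimate_splits(file_sizes, requested_maps,
                    block_size=DEFAULT_BLOCK_SIZE, min_size=SYNC_INTERVAL):
    """Estimate the number of input splits (and so map tasks).

    requested_maps plays the role of mapred.map.tasks. Each file is
    split on its own, so a split never spans two files.
    """
    total = sum(file_sizes)
    goal = total // max(requested_maps, 1)
    split_size = compute_split_size(goal, min_size, block_size)
    # Ceiling division per file; empty files contribute no splits.
    return sum(-(-size // split_size) for size in file_sizes if size > 0)


# Hypothetical 1,000,000-byte input: asking for 14,000 maps is floored
# by the minimum split size, while asking for 100 maps is honoured.
small_input = [1_000_000]
print(estimate_splits(small_input, 14_000))
print(estimate_splits(small_input, 100))
```

Under this model a small input cannot be cut into arbitrarily many splits,
however large mapred.map.tasks is. Block compression may matter as well: a
reader can only start at a sync marker, so a split that contains no marker
would presumably read no records.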

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 23, '09 at 2:44p
active: Jun 23, '09 at 5:25p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
