Grokbase Groups Pig user October 2010
Hi All,

I'm quite new to Pig/Hadoop, so maybe my cluster size will make you laugh.

I wrote a Pig script that processes 1.5GB of logs in less than one hour in
Pig local mode on an Intel Core 2 Duo with 3GB of RAM.

Then I tried this script on a simple two-node cluster. These two nodes are
not servers but ordinary desktop machines:
- an Intel Core 2 Duo with 3GB of RAM.
- a quad-core Intel with 4GB of RAM.

I was aware that Hadoop has overhead and that the run wouldn't finish in
half an hour (the local time divided by the number of nodes), but I was
surprised to see this morning that it had taken 7 hours to complete!

My configuration was made according to this link:
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29

My question is simple: is this normal?

Cheers


Vincent


  • Dmitriy Ryaboy at Oct 8, 2010 at 7:26 am
    What version of Pig, and what does your script look like?
  • Vincent at Oct 8, 2010 at 7:36 am
    I'm using pig-0.7.0 on hadoop-0.20.2.

As for the script, it's more than 500 lines; I'm not sure anybody would
read it to the end if I posted it here :-)

  • Vincent at Oct 8, 2010 at 8:49 am
Well, I can see from the job tracker that all the jobs finish quite
quickly except two, whose reduce phase goes really, really slowly.

But how can I correlate a job in the Hadoop job tracker (for example
job_201010072150_0045) with the Pig script execution?
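
As an aside, Pig's EXPLAIN statement prints the logical, physical and
map-reduce plans a script compiles to, which makes it possible to trace
each Hadoop job back to the operators (GROUP, JOIN, ORDER, ...) it
executes. A minimal sketch, with made-up aliases and paths:

    logs    = LOAD 'logs/access.log' AS (id:chararray, url:chararray);
    grouped = GROUP logs BY url;
    counts  = FOREACH grouped GENERATE group, COUNT(logs) AS hits;
    -- EXPLAIN prints the plans for this alias, including the map-reduce
    -- plan with one entry per Hadoop job it will launch
    EXPLAIN counts;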

And which is more efficient: several small Pig scripts, or one big one? I
wrote a single big script to avoid loading the same logs several times in
different scripts. Maybe that is not such a good design...

    Thanks for your help.

    - Vincent

  • Jeff Zhang at Oct 8, 2010 at 9:01 am
I guess your reduce task number may be 1, which would make the reduce phase very slow.



    --
    Best Regards

    Jeff Zhang
  • Vincent at Oct 8, 2010 at 9:09 am
You are right, I didn't change this parameter, so the default from
src/mapred/mapred-default.xml is used:

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>The default number of reduce tasks per job. Typically set to 99%
  of the cluster's reduce capacity, so that if a node fails the reduces can
  still be executed in a single wave.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

It's not clear to me what the reduce capacity of my cluster is :)
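
For reference, in Hadoop 0.20 the cluster's reduce capacity is the number of
TaskTrackers multiplied by mapred.tasktracker.reduce.tasks.maximum (2 by
default), so a two-node cluster out of the box has 4 reduce slots. A sketch of
what raising both settings could look like in conf/mapred-site.xml; the values
below are only illustrative and should match the cores and RAM actually
available on each node:

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <!-- Reduce slots per node; illustrative value for a 2-4 core desktop -->
      <value>2</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <!-- Default reduces per job; Pig's PARALLEL / default_parallel overrides this -->
      <value>3</value>
    </property>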
  • Jeff Zhang at Oct 8, 2010 at 9:12 am
No, I meant: check whether your MapReduce job's reduce task number is 1.

And could you share your Pig script, so that others can really understand
your problem?



    --
    Best Regards

    Jeff Zhang
  • Jeff Zhang at Oct 8, 2010 at 9:13 am
BTW, you can look at the job tracker web UI to see which part of the job
costs the most time.



    --
    Best Regards

    Jeff Zhang
  • Vincent at Oct 8, 2010 at 11:00 am
Thanks to Dmitriy and Jeff, I've set:

set default_parallel 20;

at the beginning of my script, and updated 8 JOINs to use a replicated join:

JOIN big BY id, small BY id USING 'replicated';

Unfortunately this didn't improve the script's speed (at least, it has been
running for more than one hour now).
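
For context, this is roughly what those two changes look like at the top of a
Pig 0.7 script; the relation and field names below are made up for
illustration:

    -- Use up to 20 reduce tasks for every blocking operator (GROUP, JOIN, ORDER, ...)
    set default_parallel 20;

    big   = LOAD 'logs/access.log'  AS (id:chararray, url:chararray, ts:long);
    small = LOAD 'lookup/users.tsv' AS (id:chararray, name:chararray);

    -- Fragment-replicated join: the relation listed last must fit in memory;
    -- it is shipped to every map task and the join runs map-side with no reduce.
    joined = JOIN big BY id, small BY id USING 'replicated';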

But looking in the job tracker at one of the jobs that has a reduce phase, I
can see this for the map tasks:


Hadoop map task list for job_201010081314_0010 on prog7:

Task                             Complete  Start Time           Finish Time                          Errors
task_201010081314_0010_m_000000  100.00%   8-Oct-2010 14:07:44  8-Oct-2010 14:23:11 (15mins, 27sec)  Too many fetch-failures (x2)

And I can see this for the reduce:

Hadoop reduce task list for job_201010081314_0010 on prog7:

Task                             Complete  Status                                Start Time           Errors
task_201010081314_0010_r_000000  9.72%     reduce > copy (7 of 24 at 0.01 MB/s)  8-Oct-2010 14:14:49  Error: GC overhead limit exceeded
task_201010081314_0010_r_000001  0.00%                                           8-Oct-2010 14:14:52  Error: Java heap space
task_201010081314_0010_r_000002  0.00%                                           8-Oct-2010 14:15:58  java.io.IOException: Task process exit with nonzero status of 1.
                                                                                                       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
task_201010081314_0010_r_000003  9.72%     reduce > copy (7 of 24 at 0.01 MB/s)  8-Oct-2010 14:16:58
task_201010081314_0010_r_000004  0.00%                                           8-Oct-2010 14:18:11  Error: GC overhead limit exceeded
task_201010081314_0010_r_000005  0.00%                                           8-Oct-2010 14:18:56  Error: GC overhead limit exceeded

    Seems like it runs out of memory... Which parameter should be increased?

    -Vincent

  • Jeff Zhang at Oct 8, 2010 at 11:05 am
Try to increase the heap size of each task by setting
mapred.child.java.opts in mapred-site.xml. The default value is
-Xmx200m in mapred-default.xml, which may be too small for you.
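
For example, a snippet like this in conf/mapred-site.xml; the 512 MB figure is
only an illustration and, together with the DataNode/TaskTracker daemons and
the number of task slots, it has to fit in each node's 3-4GB of RAM:

    <property>
      <name>mapred.child.java.opts</name>
      <!-- Heap given to each map/reduce child JVM; the default is -Xmx200m -->
      <value>-Xmx512m</value>
    </property>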



    --
    Best Regards

    Jeff Zhang
  • Jeff Zhang at Oct 8, 2010 at 11:16 am
    Vincent,

    Just want to remind you that you need to restart your cluster after
    the reconfiguration.



    --
    Best Regards

    Jeff Zhang
  • Vincent at Oct 8, 2010 at 2:04 pm
I've tried these mapred.child.java.opts values:
-Xmx512m  --> still memory errors in the reduce phase
-Xmx1024m --> still memory errors in the reduce phase
I am now trying -Xmx1536m, but I'm afraid that my nodes will start
to swap...

Should I continue in this direction? Or is it already too much, and I
should look for the problem somewhere else?

    Thanks

    -Vincent

  • Jeff Zhang at Oct 8, 2010 at 2:42 pm
Did you restart the cluster after the reconfiguration?


    --
    Best Regards

    Jeff Zhang
  • Vincent at Oct 8, 2010 at 2:49 pm
Yep, I did restart the cluster (dfs and mapred stop/start).

With more memory I can see that the reduce tasks get further (the completion
percentage is higher), but then the percentage starts to drop again as tasks
fail with memory errors.
  • Vincent at Oct 8, 2010 at 3:52 pm
*I've tried mapred.child.java.opts with the value 2048m*. Now the error is a
timeout. It seems the system is so loaded that it becomes unresponsive...
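
(For reference, this is roughly how that property is set in conf/mapred-site.xml --
I'm writing it in the -Xmx form here, as with the values mentioned elsewhere in
this thread:)

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>

and then the MapReduce daemons are restarted so the new setting is picked up.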

    Here are the outputs of the job tracker:


Hadoop job_201010081840_0010 failures on prog7
(http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010)

Attempt:  attempt_201010081840_0010_r_000001_0
Task:     task_201010081840_0010_r_000001
Machine:  prog7 (http://prog7:50060)
State:    FAILED
Error:    Task attempt_201010081840_0010_r_000001_0 failed to report status for 601 seconds. Killing!




Task Logs: 'attempt_201010081840_0010_r_000001_0'



    *_stdout logs_*
    ------------------------------------------------------------------------


    *_stderr logs_*
    ------------------------------------------------------------------------


    *_syslog logs_*

    2010-10-08 19:11:49,732 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
    2010-10-08 19:11:50,963 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=1336252800, MaxSingleShuffleLimit=334063200
    2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging on-disk files
    2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Thread waiting: Thread for merging on-disk files
    2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging in memory files
    2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Need another 24 map output(s) where 0 is already in progress
    2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Thread started: Thread for polling Map Completion Events
    2010-10-08 19:11:51,020 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 10 new map-outputs
    2010-10-08 19:11:56,005 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000002_0, compressed len: 18158866, decompressed len: 18158862
    2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18158862 bytes (18158866 raw bytes) into RAM from attempt_201010081840_0010_m_000002_0
    2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000000_0, compressed len: 20624287, decompressed len: 20624283
    2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 20624283 bytes (20624287 raw bytes) into RAM from attempt_201010081840_0010_m_000000_0
    2010-10-08 19:11:57,035 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:11:57,258 INFO org.apache.hadoop.mapred.ReduceTask: Read 20624283 bytes from map-output for attempt_201010081840_0010_m_000000_0
    2010-10-08 19:11:57,271 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000000_0 -> (105, 265) from prog7
    2010-10-08 19:11:57,274 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and9 dup hosts)
    2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000001_0, compressed len: 18485340, decompressed len: 18485336
    2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18485336 bytes (18485340 raw bytes) into RAM from attempt_201010081840_0010_m_000001_0
    2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Read 18158862 bytes from map-output for attempt_201010081840_0010_m_000002_0
    2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000002_0 -> (177, 148) from hermitage
    2010-10-08 19:11:57,980 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:11:58,043 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000003_0, compressed len: 18075620, decompressed len: 18075616
    2010-10-08 19:11:58,044 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18075616 bytes (18075620 raw bytes) into RAM from attempt_201010081840_0010_m_000003_0
    2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Read 18485336 bytes from map-output for attempt_201010081840_0010_m_000001_0
    2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000001_0 -> (241, 162) from prog7
    2010-10-08 19:12:01,929 INFO org.apache.hadoop.mapred.ReduceTask: Read 18075616 bytes from map-output for attempt_201010081840_0010_m_000003_0
    2010-10-08 19:12:01,930 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000003_0 -> (189, 187) from hermitage
    2010-10-08 19:12:01,935 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000006_0, compressed len: 18255983, decompressed len: 18255979
    2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18255979 bytes (18255983 raw bytes) into RAM from attempt_201010081840_0010_m_000006_0
    2010-10-08 19:12:03,044 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000004_0, compressed len: 18874529, decompressed len: 18874525
    2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18874525 bytes (18874529 raw bytes) into RAM from attempt_201010081840_0010_m_000004_0
    2010-10-08 19:12:03,067 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and7 dup hosts)
    2010-10-08 19:12:03,608 INFO org.apache.hadoop.mapred.ReduceTask: Read 18874525 bytes from map-output for attempt_201010081840_0010_m_000004_0
    2010-10-08 19:12:03,609 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000004_0 -> (105, 133) from prog7
    2010-10-08 19:12:04,087 INFO org.apache.hadoop.mapred.ReduceTask: Read 18255979 bytes from map-output for attempt_201010081840_0010_m_000006_0
    2010-10-08 19:12:04,088 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000006_0 -> (105, 178) from hermitage
    2010-10-08 19:12:04,094 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000007_0, compressed len: 18358512, decompressed len: 18358508
    2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18358508 bytes (18358512 raw bytes) into RAM from attempt_201010081840_0010_m_000007_0
    2010-10-08 19:12:06,254 INFO org.apache.hadoop.mapred.ReduceTask: Read 18358508 bytes from map-output for attempt_201010081840_0010_m_000007_0
    2010-10-08 19:12:06,255 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000007_0 -> (105, 166) from hermitage
    2010-10-08 19:12:06,258 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:06,270 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000008_0, compressed len: 18092007, decompressed len: 18092003
    2010-10-08 19:12:06,271 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18092003 bytes (18092007 raw bytes) into RAM from attempt_201010081840_0010_m_000008_0
    2010-10-08 19:12:07,808 INFO org.apache.hadoop.mapred.ReduceTask: Read 18092003 bytes from map-output for attempt_201010081840_0010_m_000008_0
    2010-10-08 19:12:07,809 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000008_0 -> (293, 232) from hermitage
    2010-10-08 19:12:07,810 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000009_0, compressed len: 17941909, decompressed len: 17941905
    2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 17941905 bytes (17941909 raw bytes) into RAM from attempt_201010081840_0010_m_000009_0
    2010-10-08 19:12:09,059 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 3 new map-outputs
    2010-10-08 19:12:09,060 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and6 dup hosts)
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Read 17941905 bytes from map-output for attempt_201010081840_0010_m_000009_0
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000009_0 -> (105, 197) from hermitage
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000010_0, compressed len: 18405142, decompressed len: 18405138
    2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18405138 bytes (18405142 raw bytes) into RAM from attempt_201010081840_0010_m_000010_0
    2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000005_0, compressed len: 18009096, decompressed len: 18009092
    2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18009092 bytes (18009096 raw bytes) into RAM from attempt_201010081840_0010_m_000005_0
    2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Read 18009092 bytes from map-output for attempt_201010081840_0010_m_000005_0
    2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000005_0 -> (105, 206) from prog7
    2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Read 18405138 bytes from map-output for attempt_201010081840_0010_m_000010_0
    2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000010_0 -> (137, 175) from hermitage
    2010-10-08 19:12:11,102 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000011_0, compressed len: 20002825, decompressed len: 20002821
    2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 20002821 bytes (20002825 raw bytes) into RAM from attempt_201010081840_0010_m_000011_0
    2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Read 20002821 bytes from map-output for attempt_201010081840_0010_m_000011_0
    2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000011_0 -> (105, 143) from hermitage
    2010-10-08 19:12:12,815 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000012_0, compressed len: 18135959, decompressed len: 18135955
    2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18135955 bytes (18135959 raw bytes) into RAM from attempt_201010081840_0010_m_000012_0
    2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Read 18135955 bytes from map-output for attempt_201010081840_0010_m_000012_0
    2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000012_0 -> (137, 149) from hermitage
    2010-10-08 19:12:14,362 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000013_0, compressed len: 18440786, decompressed len: 18440782
    2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18440782 bytes (18440786 raw bytes) into RAM from attempt_201010081840_0010_m_000013_0
    2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Read 18440782 bytes from map-output for attempt_201010081840_0010_m_000013_0
    2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000013_0 -> (137, 142) from hermitage
    2010-10-08 19:12:15,936 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000014_0, compressed len: 18205885, decompressed len: 18205881
    2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18205881 bytes (18205885 raw bytes) into RAM from attempt_201010081840_0010_m_000014_0
    2010-10-08 19:12:17,489 INFO org.apache.hadoop.mapred.ReduceTask: Read 18205881 bytes from map-output for attempt_201010081840_0010_m_000014_0
    2010-10-08 19:12:17,499 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000014_0 -> (253, 159) from hermitage
    2010-10-08 19:12:17,506 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000015_0, compressed len: 17476262, decompressed len: 17476258
    2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 17476258 bytes (17476262 raw bytes) into RAM from attempt_201010081840_0010_m_000015_0
    2010-10-08 19:12:17,612 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:19,030 INFO org.apache.hadoop.mapred.ReduceTask: Read 17476258 bytes from map-output for attempt_201010081840_0010_m_000015_0
    2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000015_0 -> (105, 158) from hermitage
    2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000017_0, compressed len: 18542230, decompressed len: 18542226
    2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18542226 bytes (18542230 raw bytes) into RAM from attempt_201010081840_0010_m_000017_0
    2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Read 18542226 bytes from map-output for attempt_201010081840_0010_m_000017_0
    2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000017_0 -> (257, 151) from hermitage
    2010-10-08 19:12:23,626 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:25,643 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000018_0, compressed len: 18737340, decompressed len: 18737336
    2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18737336 bytes (18737340 raw bytes) into RAM from attempt_201010081840_0010_m_000018_0
    2010-10-08 19:12:27,438 INFO org.apache.hadoop.mapred.ReduceTask: Read 18737336 bytes from map-output for attempt_201010081840_0010_m_000018_0
    2010-10-08 19:12:27,439 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000018_0 -> (253, 175) from hermitage
    2010-10-08 19:12:28,646 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:31,652 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:32,439 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000020_0, compressed len: 17710258, decompressed len: 17710254
    2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 17710254 bytes (17710258 raw bytes) into RAM from attempt_201010081840_0010_m_000020_0
    2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000016_0, compressed len: 20708576, decompressed len: 20708572
    2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 20708572 bytes (20708576 raw bytes) into RAM from attempt_201010081840_0010_m_000016_0
    2010-10-08 19:12:33,138 INFO org.apache.hadoop.mapred.ReduceTask: Read 20708572 bytes from map-output for attempt_201010081840_0010_m_000016_0
    2010-10-08 19:12:33,164 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000016_0 -> (297, 318) from prog7
    2010-10-08 19:12:33,167 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and1 dup hosts)
    2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000019_0, compressed len: 18984487, decompressed len: 18984483
    2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18984483 bytes (18984487 raw bytes) into RAM from attempt_201010081840_0010_m_000019_0
    2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Read 18984483 bytes from map-output for attempt_201010081840_0010_m_000019_0
    2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000019_0 -> (285, 160) from prog7
    2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Read 17710254 bytes from map-output for attempt_201010081840_0010_m_000020_0
    2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000020_0 -> (105, 127) from hermitage
    2010-10-08 19:12:34,081 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000021_0, compressed len: 18803713, decompressed len: 18803709
    2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18803709 bytes (18803713 raw bytes) into RAM from attempt_201010081840_0010_m_000021_0
    2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Read 18803709 bytes from map-output for attempt_201010081840_0010_m_000021_0
    2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000021_0 -> (137, 164) from hermitage
    2010-10-08 19:12:43,867 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:46,585 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000022_1, compressed len: 18143868, decompressed len: 18143864
    2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 18143864 bytes (18143868 raw bytes) into RAM from attempt_201010081840_0010_m_000022_1
    2010-10-08 19:12:48,167 INFO org.apache.hadoop.mapred.ReduceTask: Read 18143864 bytes from map-output for attempt_201010081840_0010_m_000022_1
    2010-10-08 19:12:48,176 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000022_1 -> (105, 133) from hermitage
    2010-10-08 19:12:48,182 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
    2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201010081840_0010_m_000023_1, compressed len: 9198819, decompressed len: 9198815
    2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 9198815 bytes (9198819 raw bytes) into RAM from attempt_201010081840_0010_m_000023_1
    2010-10-08 19:12:49,878 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201010081840_0010_m_000022_0'
    2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Read 9198815 bytes from map-output for attempt_201010081840_0010_m_000023_1
    2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_201010081840_0010_m_000023_1 -> (137, 166) from hermitage
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
    2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
    2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 24 files left.
    2010-10-08 19:12:51,029 INFO org.apache.hadoop.mapred.Merger: Merging 24 sorted segments
    2010-10-08 19:12:51,030 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 24 segments left of total size: 436372203 bytes
    2010-10-08 19:13:04,406 INFO org.apache.hadoop.mapred.ReduceTask: Merged 24 segments, 436372203 bytes to disk to satisfy reduce memory limit
    2010-10-08 19:13:04,407 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 436372161 bytes from disk
    2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
    2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
    2010-10-08 19:13:04,463 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 436372157 bytes
    2010-10-08 19:13:18,879 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to process : 24
    2010-10-08 19:13:18,879 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input paths to process : 24
    2010-10-08 19:16:14,354 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called (Collection threshold exceeded) init = 32309248(31552K) used = 803560952(784727K) committed = 1069678592(1044608K) max = 1431699456(1398144K)


    On 10/08/2010 06:44 PM, Vincent wrote:
    Yep, I did restart cluster (dfs and mapred stop/start).

Increasing the amount of memory, I can see that the reduce task goes further
(the percentage is greater), but then it starts to decrease with memory
failures.
    On 10/08/2010 06:41 PM, Jeff Zhang wrote:
    Did you restart cluster after reconfiguration ?


    On Fri, Oct 8, 2010 at 9:59 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:
    I've tried with mapred.child.java.opts value:
    -Xmx512m --> still memory errors in reduce phase
    -Xmx1024m --> still memory errors in reduce phase
    I am now trying with -Xmx1536m but I'm afraid that my nodes will
    start to
    swap memory...

Should I continue in this direction? Or is it already too much, and should I
look for the problem somewhere else?

    Thanks

    -Vincent

    On 10/08/2010 03:04 PM, Jeff Zhang wrote:
Try to increase the heap size of the task by setting
mapred.child.java.opts in mapred-site.xml. The default value is
-Xmx200m in mapred-default.xml which may be too small for you.



  • Dmitriy Ryaboy at Oct 8, 2010 at 5:41 pm
    When you changed to using replicated joins, did you put the small relation
    last in your list? The order is important...
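
For illustration (the relation names here are just placeholders):

-- 'small' comes last, so it is the relation replicated into memory on each map task
good = JOIN big BY id, small BY id USING 'replicated';

-- the other way round, Pig would try to replicate 'big' instead, which defeats
-- the purpose and can easily run out of memory
bad = JOIN small BY id, big BY id USING 'replicated';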

    -D
    On Fri, Oct 8, 2010 at 8:47 AM, Vincent wrote:




    *I've tried mapred.child.java.opts value 2048m*. Now the error is a
    timeout. Seems like system is so loaded that it becomes irresponsive...

    Here are the outputs of the job tracker:


    Hadoop job_201010081840_0010
    <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010>
    failures on prog7 <http://prog7.lan:50030/jobtracker.jsp>

    Attempt Task Machine State Error Logs
    attempt_201010081840_0010_r_000001_0 task_201010081840_0010_r_000001 <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081840_0010&tipid=task_201010081840_0010_r_000001>
    prog7 <http://prog7:50060> FAILED

    Task attempt_201010081840_0010_r_000001_0 failed to report status for 601
    seconds. Killing!

    Last 4KB <
    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-4097
    Last 8KB <
    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-8193
    All <
    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0>



Task Logs: 'attempt_201010081840_0010_r_000001_0'



    *_stdout logs_*
    ------------------------------------------------------------------------


    *_stderr logs_*
    ------------------------------------------------------------------------


    *_syslog logs_*

    2010-10-08 19:11:49,732 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
    Initializing JVM Metrics with processName=SHUFFLE, sessionId=
    2010-10-08 19:11:50,963 INFO org.apache.hadoop.mapred.ReduceTask:
    ShuffleRamManager: MemoryLimit=1336252800, MaxSingleShuffleLimit=334063200
    2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging
    on-disk files
    2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread waiting: Thread for merging
    on-disk files
    2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging in
    memory files
    2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Need another 24 map output(s) where 0
    is already in progress
    2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 0 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread started: Thread for polling Map
    Completion Events
    2010-10-08 19:11:51,020 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 10 new map-outputs
    2010-10-08 19:11:56,005 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000002_0, compressed len: 18158866, decompressed
    len: 18158862
    2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18158862 bytes (18158866 raw bytes) into RAM from
    attempt_201010081840_0010_m_000002_0
    2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000000_0, compressed len: 20624287, decompressed
    len: 20624283
    2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    20624283 bytes (20624287 raw bytes) into RAM from
    attempt_201010081840_0010_m_000000_0
    2010-10-08 19:11:57,035 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:11:57,258 INFO org.apache.hadoop.mapred.ReduceTask: Read
    20624283 bytes from map-output for attempt_201010081840_0010_m_000000_0
    2010-10-08 19:11:57,271 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000000_0 -> (105, 265) from prog7
    2010-10-08 19:11:57,274 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and9
    dup hosts)
    2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000001_0, compressed len: 18485340, decompressed
    len: 18485336
    2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18485336 bytes (18485340 raw bytes) into RAM from
    attempt_201010081840_0010_m_000001_0
    2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18158862 bytes from map-output for attempt_201010081840_0010_m_000002_0
    2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000002_0 -> (177, 148) from hermitage
    2010-10-08 19:11:57,980 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:11:58,043 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000003_0, compressed len: 18075620, decompressed
    len: 18075616
    2010-10-08 19:11:58,044 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18075616 bytes (18075620 raw bytes) into RAM from
    attempt_201010081840_0010_m_000003_0
    2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18485336 bytes from map-output for attempt_201010081840_0010_m_000001_0
    2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000001_0 -> (241, 162) from prog7
    2010-10-08 19:12:01,929 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18075616 bytes from map-output for attempt_201010081840_0010_m_000003_0
    2010-10-08 19:12:01,930 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000003_0 -> (189, 187) from hermitage
    2010-10-08 19:12:01,935 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000006_0, compressed len: 18255983, decompressed
    len: 18255979
    2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18255979 bytes (18255983 raw bytes) into RAM from
    attempt_201010081840_0010_m_000006_0
    2010-10-08 19:12:03,044 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000004_0, compressed len: 18874529, decompressed
    len: 18874525
    2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18874525 bytes (18874529 raw bytes) into RAM from
    attempt_201010081840_0010_m_000004_0
    2010-10-08 19:12:03,067 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and7
    dup hosts)
    2010-10-08 19:12:03,608 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18874525 bytes from map-output for attempt_201010081840_0010_m_000004_0
    2010-10-08 19:12:03,609 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000004_0 -> (105, 133) from prog7
    2010-10-08 19:12:04,087 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18255979 bytes from map-output for attempt_201010081840_0010_m_000006_0
    2010-10-08 19:12:04,088 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000006_0 -> (105, 178) from hermitage
    2010-10-08 19:12:04,094 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000007_0, compressed len: 18358512, decompressed
    len: 18358508
    2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18358508 bytes (18358512 raw bytes) into RAM from
    attempt_201010081840_0010_m_000007_0
    2010-10-08 19:12:06,254 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18358508 bytes from map-output for attempt_201010081840_0010_m_000007_0
    2010-10-08 19:12:06,255 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000007_0 -> (105, 166) from hermitage
    2010-10-08 19:12:06,258 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:06,270 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000008_0, compressed len: 18092007, decompressed
    len: 18092003
    2010-10-08 19:12:06,271 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18092003 bytes (18092007 raw bytes) into RAM from
    attempt_201010081840_0010_m_000008_0
    2010-10-08 19:12:07,808 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18092003 bytes from map-output for attempt_201010081840_0010_m_000008_0
    2010-10-08 19:12:07,809 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000008_0 -> (293, 232) from hermitage
    2010-10-08 19:12:07,810 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000009_0, compressed len: 17941909, decompressed
    len: 17941905
    2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    17941905 bytes (17941909 raw bytes) into RAM from
    attempt_201010081840_0010_m_000009_0
    2010-10-08 19:12:09,059 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 3 new map-outputs
    2010-10-08 19:12:09,060 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and6
    dup hosts)
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Read
    17941905 bytes from map-output for attempt_201010081840_0010_m_000009_0
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000009_0 -> (105, 197) from hermitage
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000010_0, compressed len: 18405142, decompressed
    len: 18405138
    2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18405138 bytes (18405142 raw bytes) into RAM from
    attempt_201010081840_0010_m_000010_0
    2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000005_0, compressed len: 18009096, decompressed
    len: 18009092
    2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18009092 bytes (18009096 raw bytes) into RAM from
    attempt_201010081840_0010_m_000005_0
    2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18009092 bytes from map-output for attempt_201010081840_0010_m_000005_0
    2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000005_0 -> (105, 206) from prog7
    2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18405138 bytes from map-output for attempt_201010081840_0010_m_000010_0
    2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000010_0 -> (137, 175) from hermitage
    2010-10-08 19:12:11,102 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000011_0, compressed len: 20002825, decompressed
    len: 20002821
    2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    20002821 bytes (20002825 raw bytes) into RAM from
    attempt_201010081840_0010_m_000011_0
    2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Read
    20002821 bytes from map-output for attempt_201010081840_0010_m_000011_0
    2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000011_0 -> (105, 143) from hermitage
    2010-10-08 19:12:12,815 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000012_0, compressed len: 18135959, decompressed
    len: 18135955
    2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18135955 bytes (18135959 raw bytes) into RAM from
    attempt_201010081840_0010_m_000012_0
    2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18135955 bytes from map-output for attempt_201010081840_0010_m_000012_0
    2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000012_0 -> (137, 149) from hermitage
    2010-10-08 19:12:14,362 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000013_0, compressed len: 18440786, decompressed
    len: 18440782
    2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18440782 bytes (18440786 raw bytes) into RAM from
    attempt_201010081840_0010_m_000013_0
    2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18440782 bytes from map-output for attempt_201010081840_0010_m_000013_0
    2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000013_0 -> (137, 142) from hermitage
    2010-10-08 19:12:15,936 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000014_0, compressed len: 18205885, decompressed
    len: 18205881
    2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18205881 bytes (18205885 raw bytes) into RAM from
    attempt_201010081840_0010_m_000014_0
    2010-10-08 19:12:17,489 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18205881 bytes from map-output for attempt_201010081840_0010_m_000014_0
    2010-10-08 19:12:17,499 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000014_0 -> (253, 159) from hermitage
    2010-10-08 19:12:17,506 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000015_0, compressed len: 17476262, decompressed
    len: 17476258
    2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    17476258 bytes (17476262 raw bytes) into RAM from
    attempt_201010081840_0010_m_000015_0
    2010-10-08 19:12:17,612 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:19,030 INFO org.apache.hadoop.mapred.ReduceTask: Read
    17476258 bytes from map-output for attempt_201010081840_0010_m_000015_0
    2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000015_0 -> (105, 158) from hermitage
    2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000017_0, compressed len: 18542230, decompressed
    len: 18542226
    2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18542226 bytes (18542230 raw bytes) into RAM from
    attempt_201010081840_0010_m_000017_0
    2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18542226 bytes from map-output for attempt_201010081840_0010_m_000017_0
    2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000017_0 -> (257, 151) from hermitage
    2010-10-08 19:12:23,626 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:25,643 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000018_0, compressed len: 18737340, decompressed
    len: 18737336
    2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18737336 bytes (18737340 raw bytes) into RAM from
    attempt_201010081840_0010_m_000018_0
    2010-10-08 19:12:27,438 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18737336 bytes from map-output for attempt_201010081840_0010_m_000018_0
    2010-10-08 19:12:27,439 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000018_0 -> (253, 175) from hermitage
    2010-10-08 19:12:28,646 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:31,652 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:32,439 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000020_0, compressed len: 17710258, decompressed
    len: 17710254
    2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    17710254 bytes (17710258 raw bytes) into RAM from
    attempt_201010081840_0010_m_000020_0
    2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000016_0, compressed len: 20708576, decompressed
    len: 20708572
    2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    20708572 bytes (20708576 raw bytes) into RAM from
    attempt_201010081840_0010_m_000016_0
    2010-10-08 19:12:33,138 INFO org.apache.hadoop.mapred.ReduceTask: Read
    20708572 bytes from map-output for attempt_201010081840_0010_m_000016_0
    2010-10-08 19:12:33,164 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000016_0 -> (297, 318) from prog7
    2010-10-08 19:12:33,167 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and1
    dup hosts)
    2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000019_0, compressed len: 18984487, decompressed
    len: 18984483
    2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18984483 bytes (18984487 raw bytes) into RAM from
    attempt_201010081840_0010_m_000019_0
    2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18984483 bytes from map-output for attempt_201010081840_0010_m_000019_0
    2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000019_0 -> (285, 160) from prog7
    2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Read
    17710254 bytes from map-output for attempt_201010081840_0010_m_000020_0
    2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000020_0 -> (105, 127) from hermitage
    2010-10-08 19:12:34,081 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000021_0, compressed len: 18803713, decompressed
    len: 18803709
    2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18803709 bytes (18803713 raw bytes) into RAM from
    attempt_201010081840_0010_m_000021_0
    2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18803709 bytes from map-output for attempt_201010081840_0010_m_000021_0
    2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000021_0 -> (137, 164) from hermitage
    2010-10-08 19:12:43,867 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:46,585 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000022_1, compressed len: 18143868, decompressed
    len: 18143864
    2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    18143864 bytes (18143868 raw bytes) into RAM from
    attempt_201010081840_0010_m_000022_1
    2010-10-08 19:12:48,167 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18143864 bytes from map-output for attempt_201010081840_0010_m_000022_1
    2010-10-08 19:12:48,176 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000022_1 -> (105, 133) from hermitage
    2010-10-08 19:12:48,182 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000023_1, compressed len: 9198819, decompressed
    len: 9198815
    2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling
    9198815 bytes (9198819 raw bytes) into RAM from
    attempt_201010081840_0010_m_000023_1
    2010-10-08 19:12:49,878 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
    obsolete output of KILLED map-task: 'attempt_201010081840_0010_m_000022_0'
    2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Read
    9198815 bytes from map-output for attempt_201010081840_0010_m_000023_1
    2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000023_1 -> (137, 166) from hermitage
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
    GetMapEventsThread exiting
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
    getMapsEventsThread joined.
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: Closed
    ram manager
    2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
    Interleaved on-disk merge complete: 0 files left.
    2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask: In-memory
    merge complete: 24 files left.
    2010-10-08 19:12:51,029 INFO org.apache.hadoop.mapred.Merger: Merging 24
    sorted segments
    2010-10-08 19:12:51,030 INFO org.apache.hadoop.mapred.Merger: Down to the
    last merge-pass, with 24 segments left of total size: 436372203 bytes
    2010-10-08 19:13:04,406 INFO org.apache.hadoop.mapred.ReduceTask: Merged 24
    segments, 436372203 bytes to disk to satisfy reduce memory limit
    2010-10-08 19:13:04,407 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1
    files, 436372161 bytes from disk
    2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0
    segments, 0 bytes from memory into reduce
    2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.Merger: Merging 1
    sorted segments
    2010-10-08 19:13:04,463 INFO org.apache.hadoop.mapred.Merger: Down to the
    last merge-pass, with 1 segments left of total size: 436372157 bytes
    2010-10-08 19:13:18,879 INFO
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
    process : 24
    2010-10-08 19:13:18,879 INFO
    org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input
    paths to process : 24
    2010-10-08 19:16:14,354 INFO
    org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called
    (Collection threshold exceeded) init = 32309248(31552K) used =
    803560952(784727K) committed = 1069678592(1044608K) max =
    1431699456(1398144K)



    On 10/08/2010 06:44 PM, Vincent wrote:

    Yep, I did restart cluster (dfs and mapred stop/start).

    Increasing the amount of memory, I can see that the reduce task goes
    further (the percentage is greater), but then it starts to decrease with
    memory failures.
    On 10/08/2010 06:41 PM, Jeff Zhang wrote:

    Did you restart cluster after reconfiguration ?


    On Fri, Oct 8, 2010 at 9:59 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:
    I've tried with mapred.child.java.opts value:
    -Xmx512m --> still memory errors in reduce phase
    -Xmx1024m --> still memory errors in reduce phase
    I am now trying with -Xmx1536m but I'm afraid that my nodes will start
    to swap memory...

    Should I continue in this direction? Or is it already too much and
    should I look for the problem somewhere else?
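    (As a rough sanity check on the headroom: with Hadoop 0.20's default of
    2 map and 2 reduce slots per TaskTracker, -Xmx1536m allows up to
    4 x 1.5GB = 6GB of task heap on nodes that only have 3-4GB of physical
    RAM, before counting the DataNode and TaskTracker JVMs themselves.)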

    Thanks

    -Vincent

    On 10/08/2010 03:04 PM, Jeff Zhang wrote:

    Try to increase the heap size of the task by setting
    mapred.child.java.opts in mapred-site.xml. The default value is
    -Xmx200m in mapred-default.xml, which may be too small for you.



    On Fri, Oct 8, 2010 at 6:55 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:
    Thanks to Dmitriy and Jeff, I've set :

    set default_parallel 20; at the beginning of my script.

    Updated 8 JOINs to behave like:

    JOIN big BY id, small BY id USING 'replicated';

    Unfortunately this didn't improve the script speed (it has already been
    running for more than one hour now).

    But looking in the jobtracker at one of the jobs that has a reduce, I can
    see this for the map:


    Hadoop map task list for job_201010081314_0010 on prog7
    ------------------------------------------------------------------------
    Task                              Complete  Start Time           Finish Time                          Errors                        Counters
    task_201010081314_0010_m_000000   100.00%   8-Oct-2010 14:07:44  8-Oct-2010 14:23:11 (15mins, 27sec)  Too many fetch-failures (x2)  8



    And I can see this for the reduce


    Hadoop reduce task list for job_201010081314_0010 on prog7
    ------------------------------------------------------------------------
    Task                              Complete  Start Time           Status / Errors
    task_201010081314_0010_r_000000   9.72%     8-Oct-2010 14:14:49  reduce > copy (7 of 24 at 0.01 MB/s); Error: GC overhead limit exceeded (Counters: 7)
    task_201010081314_0010_r_000001   0.00%     8-Oct-2010 14:14:52  Error: Java heap space
    task_201010081314_0010_r_000002   0.00%     8-Oct-2010 14:15:58  java.io.IOException: Task process exit with nonzero status of 1 (at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418))
    task_201010081314_0010_r_000003   9.72%     8-Oct-2010 14:16:58  reduce > copy (7 of 24 at 0.01 MB/s) (Counters: 7)
    task_201010081314_0010_r_000004   0.00%     8-Oct-2010 14:18:11  Error: GC overhead limit exceeded
    task_201010081314_0010_r_000005   0.00%     8-Oct-2010 14:18:56  Error: GC overhead limit exceeded






    Seems like it runs out of memory... Which parameter should be
    increased?

    -Vincent

    On 10/08/2010 01:12 PM, Jeff Zhang wrote:

    BTW, you can look at the job tracker web UI to see which part of the
    job costs the most time.



    On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<zjffdu@gmail.com>
    wrote:
    No I mean whether your mapreduce job's reduce task number is 1.

    And could you share your pig script, then others can really
    understand
    your problem.



    On Fri, Oct 8, 2010 at 5:04 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:
    You are right, I didn't change this parameter, therefore the
    default
    is
    used from src/mapred/mapred-default.xml

    <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
    <description>The default number of reduce tasks per job. Typically
    set
    to
    99%
    of the cluster's reduce capacity, so that if a node fails the
    reduces
    can
    still be executed in a single wave.
    Ignored when mapred.job.tracker is "local".
    </description>
    </property>

    It's not clear to me what the reduce capacity of my cluster is :)
    On 10/08/2010 01:00 PM, Jeff Zhang wrote:

    I guess maybe your reduce number is 1, which causes the reduce phase to
    be very slow.



    On Fri, Oct 8, 2010 at 4:44 PM, Vincent<
    vincent.hervieux@gmail.com>
    wrote:
    Well I can see from the job tracker that all the jobs are done
    quite
    quickly expect 2 for which reduce phase goes really really
    slowly.

    But how can I make the parallel between a job in the Hadoop jop
    tracker
    (example: job_201010072150_0045) and the Pig script execution?

    And what is the most efficient: several small Pig scripts? or one
    big
    Pig
    script? I did one big to avoid to load several time the same logs
    in
    different scripts. Maybe it is not so good design...

    Thanks for your help.

    - Vincent

    On 10/08/2010 11:31 AM, Vincent wrote:

    I'm using pig-0.7.0 on hadoop-0.20.2.

    For the script, well it's more then 500 lines, I'm not sure if I
    post
    it
    here that somebody will read it till the end :-)

    On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:

    What version of Pig, and what does your script look like?

    On Thu, Oct 7, 2010 at 11:48 PM,
    Vincent<vincent.hervieux@gmail.com>
    wrote:

    Hi All,
    I'm quite new to Pig/Hadoop. So maybe my cluster size will
    make
    you
    laugh.

    I wrote a script on Pig handling 1.5GB of logs in less than
    one
    hour
    in
    pig
    local mode on a Intel core 2 duo with 3GB of RAM.

    Then I tried this script on a simple 2 nodes cluster. These 2
    nodes
    are
    not
    servers but simple computers:
    - Intel core 2 duo with 3GB of RAM.
    - Intel Quad with 4GB of RAM.

    Well I was aware that hadoop has overhead and that it won't be
    done
    in
    half
    an hour (time in local divided by number of nodes). But I was
    surprised
    to
    see this morning it took 7 hours to complete!!!

    My configuration was made according to this link:






    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29

    My question is simple: Is it normal?

    Cheers


    Vincent


    --
    Best Regards

    Jeff Zhang
  • Dmitriy Ryaboy at Oct 8, 2010 at 5:47 pm
    Sorry, just saw that you said you explicitly did that.

    Ok, basically what's happening with the GC is that when you do something
    like "group x by id" Pig will try to load *all the tuples of x with a given
    id* into memory in the reducer, unless algebraic or accumulative
    optimizations were applied (this depends on what exactly you are doing with
    results of grouping). Same for joins. A common pitfall is grouping by a
    tuple: group foo by (a, b), and then flattening out the group manually:
    foreach grouped_data generate group.a as a, group.b as b; Instead, you
    should use FLATTEN to get optimizations to kick in: foreach grouped_data
    generate FLATTEN(group) as (a, b);
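    Spelled out as a minimal sketch (reusing the foo / (a, b) / grouped_data
    names from above, with COUNT standing in for whatever aggregate actually
    follows the grouping):

        grouped_data = GROUP foo BY (a, b);

        -- manual flattening of the group tuple (the pattern to avoid):
        manual = FOREACH grouped_data GENERATE group.a AS a, group.b AS b, COUNT(foo) AS n;

        -- FLATTEN produces the same columns but lets the algebraic optimization kick in:
        flattened = FOREACH grouped_data GENERATE FLATTEN(group) AS (a, b), COUNT(foo) AS n;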

    Doing things like "group all_http_requests by status_code" is bad, because
    that puts a very large number of records into very few buckets. It's ok if
    all you are doing is counting the results, but bad if you want to do
    something like sort them by timestamp.
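    For illustration, a rough sketch of the two cases (assuming the request
    records carry a timestamp field):

        by_status = GROUP all_http_requests BY status_code;

        -- ok: COUNT is algebraic, so Pig can do most of the work map-side in the combiner
        status_counts = FOREACH by_status GENERATE group AS status_code, COUNT(all_http_requests) AS hits;

        -- risky: the whole bag for one status code has to be materialized in the reducer to be sorted
        status_sorted = FOREACH by_status {
            ordered = ORDER all_http_requests BY timestamp;
            GENERATE group AS status_code, ordered;
        };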


    On Fri, Oct 8, 2010 at 10:40 AM, Dmitriy Ryaboy wrote:

    When you changed to using replicated joins, did you put the small relation
    last in your list? The order is important...

    -D

  • Vincent at Oct 8, 2010 at 6:26 pm
    Oops! I suppose you've pointed out my mistakes.
    I am doing a lot of the bad operations you've mentioned:

    - group foo by (a, b), and then flattening out the group manually:
    foreach grouped_data generate group.a as a, group.b as b;
    - group all_http_requests by status_code.

    One more question and then I will leave you alone for the weekend: :-)

    Running the script locally with Pig, I am loading many small (< 3MB) log
    files as inputs (precisely 288 logs for each input).
    Running the script on Hadoop, before copying the logs to HDFS, I
    concatenate them all by input type, since I've read that HDFS doesn't
    like small files.
    That makes one input of more than 1GB and a couple of others of several
    hundred MB.

    Is that OK? I've left the HDFS block size at the default (64MB).
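    (As a rough order-of-magnitude check, assuming plain FileInputFormat-style
    splitting: a ~1GB concatenated input with 64MB blocks is about
    1024 / 64 = 16 splits, so roughly 16 map tasks, whereas 288 separate ~3MB
    files would typically get one map task each.)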

    And many thanks to both of you, Jeff and Dmitriy, for your help! I've
    learned a lot today thanks to you.

    -Vincent
    On 10/08/2010 09:47 PM, Dmitriy Ryaboy wrote:
    Sorry, just saw that you said you explicitly did that.

    Ok, basically what's happening with the GC is that when you do something
    like "group x by id" Pig will try to load *all the tuples of x with a given
    id* into memory in the reducer, unless algebraic or accumulative
    optimizations were applied (this depends on what exactly you are doing with
    results of grouping). Same for joins. A common pitfall is grouping by a
    tuple: group foo by (a, b), and then flattening out the group manually:
    foreach grouped_data generate group.a as a, group.b as b; Instead, you
    should use FLATTEN to get optimizations to kick in: foreach grouped_data
    generate FLATTEN(group) as (a, b);

    Doing things like "group all_http_requests by status_code" is bad, because
    that puts a very large number of records into very few buckets. It's ok if
    all you are doing is counting the results, but bad if you want to do
    something like sort them by timestamp.



    On Fri, Oct 8, 2010 at 10:40 AM, Dmitriy Ryaboywrote:
    When you changed to using replicated joins, did you put the small relation
    last in your list? The order is important...

    -D


    On Fri, Oct 8, 2010 at 8:47 AM, Vincentwrote:

    *I've tried mapred.child.java.opts value 2048m*. Now the error is a
    timeout. Seems like system is so loaded that it becomes irresponsive...

    Here are the outputs of the job tracker:


    Hadoop job_201010081840_0010
    <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010>
    failures on prog7<http://prog7.lan:50030/jobtracker.jsp>

    Attempt Task Machine State Error Logs
    attempt_201010081840_0010_r_000001_0 task_201010081840_0010_r_000001<
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081840_0010&tipid=task_201010081840_0010_r_000001>
    prog7<http://prog7:50060> FAILED

    Task attempt_201010081840_0010_r_000001_0 failed to report status for 601
    seconds. Killing!

    Last 4KB<
    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-4097
    Last 8KB<
    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-8193
    All<
    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0>



    ask Logs: 'attempt_201010081840_0010_r_000001_0'



    *_stdout logs_*
    ------------------------------------------------------------------------


    *_stderr logs_*
    ------------------------------------------------------------------------


    *_syslog logs_*

    2010-10-08 19:11:49,732 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
    Initializing JVM Metrics with processName=SHUFFLE, sessionId=
    2010-10-08 19:11:50,963 INFO org.apache.hadoop.mapred.ReduceTask:
    ShuffleRamManager: MemoryLimit=1336252800, MaxSingleShuffleLimit=334063200
    2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging
    on-disk files
    2010-10-08 19:11:50,997 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread waiting: Thread for merging
    on-disk files
    2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread started: Thread for merging in
    memory files
    2010-10-08 19:11:51,004 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Need another 24 map output(s) where 0
    is already in progress
    2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 0 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:11:51,005 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Thread started: Thread for polling Map
    Completion Events
    2010-10-08 19:11:51,020 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 10 new map-outputs
    2010-10-08 19:11:56,005 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000002_0, compressed len: 18158866, decompressed
    len: 18158862
    2010-10-08 19:11:56,091 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18158862 bytes (18158866 raw bytes) into RAM from
    attempt_201010081840_0010_m_000002_0
    2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000000_0, compressed len: 20624287, decompressed
    len: 20624283
    2010-10-08 19:11:56,582 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 20624283 bytes (20624287 raw bytes) into RAM from
    attempt_201010081840_0010_m_000000_0
    2010-10-08 19:11:57,035 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:11:57,258 INFO org.apache.hadoop.mapred.ReduceTask: Read
    20624283 bytes from map-output for attempt_201010081840_0010_m_000000_0
    2010-10-08 19:11:57,271 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000000_0 -> (105, 265) from prog7
    2010-10-08 19:11:57,274 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and9
    dup hosts)
    2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000001_0, compressed len: 18485340, decompressed
    len: 18485336
    2010-10-08 19:11:57,313 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18485336 bytes (18485340 raw bytes) into RAM from
    attempt_201010081840_0010_m_000001_0
    2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18158862 bytes from map-output for attempt_201010081840_0010_m_000002_0
    2010-10-08 19:11:57,971 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000002_0 -> (177, 148) from hermitage
    2010-10-08 19:11:57,980 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:11:58,043 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000003_0, compressed len: 18075620, decompressed
    len: 18075616
    2010-10-08 19:11:58,044 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18075616 bytes (18075620 raw bytes) into RAM from
    attempt_201010081840_0010_m_000003_0
    2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18485336 bytes from map-output for attempt_201010081840_0010_m_000001_0
    2010-10-08 19:11:58,277 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000001_0 -> (241, 162) from prog7
    2010-10-08 19:12:01,929 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18075616 bytes from map-output for attempt_201010081840_0010_m_000003_0
    2010-10-08 19:12:01,930 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000003_0 -> (189, 187) from hermitage
    2010-10-08 19:12:01,935 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000006_0, compressed len: 18255983, decompressed
    len: 18255979
    2010-10-08 19:12:01,937 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18255979 bytes (18255983 raw bytes) into RAM from
    attempt_201010081840_0010_m_000006_0
    2010-10-08 19:12:03,044 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000004_0, compressed len: 18874529, decompressed
    len: 18874525
    2010-10-08 19:12:03,049 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18874525 bytes (18874529 raw bytes) into RAM from
    attempt_201010081840_0010_m_000004_0
    2010-10-08 19:12:03,067 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and7
    dup hosts)
    2010-10-08 19:12:03,608 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18874525 bytes from map-output for attempt_201010081840_0010_m_000004_0
    2010-10-08 19:12:03,609 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000004_0 -> (105, 133) from prog7
    2010-10-08 19:12:04,087 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18255979 bytes from map-output for attempt_201010081840_0010_m_000006_0
    2010-10-08 19:12:04,088 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000006_0 -> (105, 178) from hermitage
    2010-10-08 19:12:04,094 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000007_0, compressed len: 18358512, decompressed
    len: 18358508
    2010-10-08 19:12:04,319 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18358508 bytes (18358512 raw bytes) into RAM from
    attempt_201010081840_0010_m_000007_0
    2010-10-08 19:12:06,254 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18358508 bytes from map-output for attempt_201010081840_0010_m_000007_0
    2010-10-08 19:12:06,255 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000007_0 -> (105, 166) from hermitage
    2010-10-08 19:12:06,258 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:06,270 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000008_0, compressed len: 18092007, decompressed
    len: 18092003
    2010-10-08 19:12:06,271 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18092003 bytes (18092007 raw bytes) into RAM from
    attempt_201010081840_0010_m_000008_0
    2010-10-08 19:12:07,808 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18092003 bytes from map-output for attempt_201010081840_0010_m_000008_0
    2010-10-08 19:12:07,809 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000008_0 -> (293, 232) from hermitage
    2010-10-08 19:12:07,810 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000009_0, compressed len: 17941909, decompressed
    len: 17941905
    2010-10-08 19:12:07,813 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 17941905 bytes (17941909 raw bytes) into RAM from
    attempt_201010081840_0010_m_000009_0
    2010-10-08 19:12:09,059 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 3 new map-outputs
    2010-10-08 19:12:09,060 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and6
    dup hosts)
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Read
    17941905 bytes from map-output for attempt_201010081840_0010_m_000009_0
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000009_0 -> (105, 197) from hermitage
    2010-10-08 19:12:09,338 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000010_0, compressed len: 18405142, decompressed
    len: 18405138
    2010-10-08 19:12:09,341 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18405138 bytes (18405142 raw bytes) into RAM from
    attempt_201010081840_0010_m_000010_0
    2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000005_0, compressed len: 18009096, decompressed
    len: 18009092
    2010-10-08 19:12:09,369 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18009092 bytes (18009096 raw bytes) into RAM from
    attempt_201010081840_0010_m_000005_0
    2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18009092 bytes from map-output for attempt_201010081840_0010_m_000005_0
    2010-10-08 19:12:10,691 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000005_0 -> (105, 206) from prog7
    2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18405138 bytes from map-output for attempt_201010081840_0010_m_000010_0
    2010-10-08 19:12:11,101 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000010_0 -> (137, 175) from hermitage
    2010-10-08 19:12:11,102 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000011_0, compressed len: 20002825, decompressed
    len: 20002821
    2010-10-08 19:12:11,104 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 20002821 bytes (20002825 raw bytes) into RAM from
    attempt_201010081840_0010_m_000011_0
    2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Read
    20002821 bytes from map-output for attempt_201010081840_0010_m_000011_0
    2010-10-08 19:12:12,805 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000011_0 -> (105, 143) from hermitage
    2010-10-08 19:12:12,815 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000012_0, compressed len: 18135959, decompressed
    len: 18135955
    2010-10-08 19:12:12,817 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18135955 bytes (18135959 raw bytes) into RAM from
    attempt_201010081840_0010_m_000012_0
    2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18135955 bytes from map-output for attempt_201010081840_0010_m_000012_0
    2010-10-08 19:12:14,361 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000012_0 -> (137, 149) from hermitage
    2010-10-08 19:12:14,362 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000013_0, compressed len: 18440786, decompressed
    len: 18440782
    2010-10-08 19:12:14,364 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18440782 bytes (18440786 raw bytes) into RAM from
    attempt_201010081840_0010_m_000013_0
    2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18440782 bytes from map-output for attempt_201010081840_0010_m_000013_0
    2010-10-08 19:12:15,935 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000013_0 -> (137, 142) from hermitage
    2010-10-08 19:12:15,936 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000014_0, compressed len: 18205885, decompressed
    len: 18205881
    2010-10-08 19:12:15,938 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18205881 bytes (18205885 raw bytes) into RAM from
    attempt_201010081840_0010_m_000014_0
    2010-10-08 19:12:17,489 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18205881 bytes from map-output for attempt_201010081840_0010_m_000014_0
    2010-10-08 19:12:17,499 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000014_0 -> (253, 159) from hermitage
    2010-10-08 19:12:17,506 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000015_0, compressed len: 17476262, decompressed
    len: 17476258
    2010-10-08 19:12:17,510 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 17476258 bytes (17476262 raw bytes) into RAM from
    attempt_201010081840_0010_m_000015_0
    2010-10-08 19:12:17,612 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:19,030 INFO org.apache.hadoop.mapred.ReduceTask: Read
    17476258 bytes from map-output for attempt_201010081840_0010_m_000015_0
    2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000015_0 -> (105, 158) from hermitage
    2010-10-08 19:12:19,035 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000017_0, compressed len: 18542230, decompressed
    len: 18542226
    2010-10-08 19:12:19,061 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18542226 bytes (18542230 raw bytes) into RAM from
    attempt_201010081840_0010_m_000017_0
    2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18542226 bytes from map-output for attempt_201010081840_0010_m_000017_0
    2010-10-08 19:12:20,640 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000017_0 -> (257, 151) from hermitage
    2010-10-08 19:12:23,626 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 1 new map-outputs
    2010-10-08 19:12:25,643 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000018_0, compressed len: 18737340, decompressed
    len: 18737336
    2010-10-08 19:12:25,670 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18737336 bytes (18737340 raw bytes) into RAM from
    attempt_201010081840_0010_m_000018_0
    2010-10-08 19:12:27,438 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18737336 bytes from map-output for attempt_201010081840_0010_m_000018_0
    2010-10-08 19:12:27,439 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000018_0 -> (253, 175) from hermitage
    2010-10-08 19:12:28,646 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:31,652 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:32,439 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 2 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000020_0, compressed len: 17710258, decompressed
    len: 17710254
    2010-10-08 19:12:32,473 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 17710254 bytes (17710258 raw bytes) into RAM from
    attempt_201010081840_0010_m_000020_0
    2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000016_0, compressed len: 20708576, decompressed
    len: 20708572
    2010-10-08 19:12:32,475 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 20708572 bytes (20708576 raw bytes) into RAM from
    attempt_201010081840_0010_m_000016_0
    2010-10-08 19:12:33,138 INFO org.apache.hadoop.mapred.ReduceTask: Read
    20708572 bytes from map-output for attempt_201010081840_0010_m_000016_0
    2010-10-08 19:12:33,164 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000016_0 -> (297, 318) from prog7
    2010-10-08 19:12:33,167 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and1
    dup hosts)
    2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000019_0, compressed len: 18984487, decompressed
    len: 18984483
    2010-10-08 19:12:33,172 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18984483 bytes (18984487 raw bytes) into RAM from
    attempt_201010081840_0010_m_000019_0
    2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18984483 bytes from map-output for attempt_201010081840_0010_m_000019_0
    2010-10-08 19:12:33,774 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000019_0 -> (285, 160) from prog7
    2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Read
    17710254 bytes from map-output for attempt_201010081840_0010_m_000020_0
    2010-10-08 19:12:34,057 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000020_0 -> (105, 127) from hermitage
    2010-10-08 19:12:34,081 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000021_0, compressed len: 18803713, decompressed
    len: 18803709
    2010-10-08 19:12:34,085 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18803709 bytes (18803713 raw bytes) into RAM from
    attempt_201010081840_0010_m_000021_0
    2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18803709 bytes from map-output for attempt_201010081840_0010_m_000021_0
    2010-10-08 19:12:36,579 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000021_0 -> (137, 164) from hermitage
    2010-10-08 19:12:43,867 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0: Got 2 new map-outputs
    2010-10-08 19:12:46,585 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000022_1, compressed len: 18143868, decompressed
    len: 18143864
    2010-10-08 19:12:46,589 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 18143864 bytes (18143868 raw bytes) into RAM from
    attempt_201010081840_0010_m_000022_1
    2010-10-08 19:12:48,167 INFO org.apache.hadoop.mapred.ReduceTask: Read
    18143864 bytes from map-output for attempt_201010081840_0010_m_000022_1
    2010-10-08 19:12:48,176 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000022_1 -> (105, 133) from hermitage
    2010-10-08 19:12:48,182 INFO org.apache.hadoop.mapred.ReduceTask:
    attempt_201010081840_0010_r_000001_0 Scheduled 1 outputs (0 slow hosts and0
    dup hosts)
    2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask: header:
    attempt_201010081840_0010_m_000023_1, compressed len: 9198819, decompressed
    len: 9198815
    2010-10-08 19:12:48,428 INFO org.apache.hadoop.mapred.ReduceTask:
    Shuffling 9198815 bytes (9198819 raw bytes) into RAM from
    attempt_201010081840_0010_m_000023_1
    2010-10-08 19:12:49,878 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring
    obsolete output of KILLED map-task: 'attempt_201010081840_0010_m_000022_0'
    2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Read
    9198815 bytes from map-output for attempt_201010081840_0010_m_000023_1
    2010-10-08 19:12:49,938 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1
    from attempt_201010081840_0010_m_000023_1 -> (137, 166) from hermitage
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
    GetMapEventsThread exiting
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask:
    getMapsEventsThread joined.
    2010-10-08 19:12:50,878 INFO org.apache.hadoop.mapred.ReduceTask: Closed
    ram manager
    2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
    Interleaved on-disk merge complete: 0 files left.
    2010-10-08 19:12:50,879 INFO org.apache.hadoop.mapred.ReduceTask:
    In-memory merge complete: 24 files left.
    2010-10-08 19:12:51,029 INFO org.apache.hadoop.mapred.Merger: Merging 24
    sorted segments
    2010-10-08 19:12:51,030 INFO org.apache.hadoop.mapred.Merger: Down to the
    last merge-pass, with 24 segments left of total size: 436372203 bytes
    2010-10-08 19:13:04,406 INFO org.apache.hadoop.mapred.ReduceTask: Merged
    24 segments, 436372203 bytes to disk to satisfy reduce memory limit
    2010-10-08 19:13:04,407 INFO org.apache.hadoop.mapred.ReduceTask: Merging
    1 files, 436372161 bytes from disk
    2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.ReduceTask: Merging
    0 segments, 0 bytes from memory into reduce
    2010-10-08 19:13:04,426 INFO org.apache.hadoop.mapred.Merger: Merging 1
    sorted segments
    2010-10-08 19:13:04,463 INFO org.apache.hadoop.mapred.Merger: Down to the
    last merge-pass, with 1 segments left of total size: 436372157 bytes
    2010-10-08 19:13:18,879 INFO
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
    process : 24
    2010-10-08 19:13:18,879 INFO
    org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input
    paths to process : 24
    2010-10-08 19:16:14,354 INFO
    org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called
    (Collection threshold exceeded) init = 32309248(31552K) used =
    803560952(784727K) committed = 1069678592(1044608K) max =
    1431699456(1398144K)



    On 10/08/2010 06:44 PM, Vincent wrote:

    Yep, I did restart cluster (dfs and mapred stop/start).

    Increasing the amount of memory I can see that the reduce task goes
    further (percentage is greater), but then start to decrease with memory
    failures.
    On 10/08/2010 06:41 PM, Jeff Zhang wrote:

    Did you restart cluster after reconfiguration ?


    On Fri, Oct 8, 2010 at 9:59 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:
    I've tried with mapred.child.java.opts value:
    -Xmx512m --> still memory errors in reduce phase
    -Xmx1024m --> still memory errors in reduce phase
    I am now trying with -Xmx1536m but I'm afraid that my nodes will start
    to
    swap memory...

    Should I continue in this direction? Or it's already to much and I
    should
    search the problem somewhere else?

    Thanks

    -Vincent

    On 10/08/2010 03:04 PM, Jeff Zhang wrote:

    Try to increase the heap size on of task by setting
    mapred.child.java.opts in mapred-site.xml. The default value is
    -Xmx200m in mapred-default.xml which may be too small for you.



    On Fri, Oct 8, 2010 at 6:55 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:
    Thanks to Dmitriy and Jeff, I've set :

    set default_parallel 20; at the beginning of my script.

    Updated 8 JOINs to behave like:

    JOIN big BY id, small BY id USING 'replicated';

    Unfortunately this didn't improve the script speed (at least it runs
    for
    more than one hour now).

    But Looking in the jobtracker one of the job which reduce, I can see
    for
    the
    map:


    Hadoop map task list for job_201010081314_0010
    <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
    on
    prog7<http://prog7.lan:50030/jobtracker.jsp>

    ------------------------------------------------------------------------



    All Tasks

    Task Complete Status Start Time Finish Time
    Errors
    Counters
    task_201010081314_0010_m_000000

    <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>

    100.00%


    8-Oct-2010 14:07:44
    8-Oct-2010 14:23:11 (15mins, 27sec)


    Too many fetch-failures
    Too many fetch-failures


    8

    <
    http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000>



    And I can see this for the reduce


    Hadoop reduce task list for job_201010081314_0010
    <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
    on
    prog7<http://prog7.lan:50030/jobtracker.jsp>

    ------------------------------------------------------------------------



    All Tasks

    Task Complete Status Start Time Finish Time
    Errors
    Counters
    task_201010081314_0010_r_000000

    <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>

    9.72%



    reduce> copy (7 of 24 at 0.01 MB/s)>
    8-Oct-2010 14:14:49



    Error: GC overhead limit exceeded


    7

    <
    http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000000>

    task_201010081314_0010_r_000001

    <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>

    0.00%


    8-Oct-2010 14:14:52



    Error: Java heap space




    <
    http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000001>

    task_201010081314_0010_r_000002

    <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>

    0.00%


    8-Oct-2010 14:15:58



    java.io.IOException: Task process exit with nonzero status of 1.
    at
    org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)





    <
    http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000002>

    task_201010081314_0010_r_000003

    <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>

    9.72%



    reduce> copy (7 of 24 at 0.01 MB/s)>
    8-Oct-2010 14:16:58


    7

    <
    http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000003>

    task_201010081314_0010_r_000004

    <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>

    0.00%


    8-Oct-2010 14:18:11



    Error: GC overhead limit exceeded




    <
    http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000004>

    task_201010081314_0010_r_000005

    <
    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_r_000005>

    0.00%


    8-Oct-2010 14:18:56



    Error: GC overhead limit exceeded






    Seems like it runs out of memory... Which parameter should be
    increased?

    -Vincent

    On 10/08/2010 01:12 PM, Jeff Zhang wrote:

    BTW, you can look at the job tracker web ui to see which part of the
    job cost the most of the time



    On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<zjffdu@gmail.com>
    wrote:
    No I mean whether your mapreduce job's reduce task number is 1.

    And could you share your pig script, then others can really
    understand
    your problem.



    On Fri, Oct 8, 2010 at 5:04 PM, Vincent<vincent.hervieux@gmail.com
    wrote:
    You are right, I didn't change this parameter, therefore the
    default
    is
    used from src/mapred/mapred-default.xml

    <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
    <description>The default number of reduce tasks per job. Typically
    set
    to
    99%
    of the cluster's reduce capacity, so that if a node fails the
    reduces
    can
    still be executed in a single wave.
    Ignored when mapred.job.tracker is "local".
    </description>
    </property>

    Not clear for me what is the reduce capacity of my cluster :)
    On 10/08/2010 01:00 PM, Jeff Zhang wrote:

    I guess maybe your reduce number is 1 which cause the reduce
    phase
    very
    slowly.



    On Fri, Oct 8, 2010 at 4:44 PM, Vincent<
    vincent.hervieux@gmail.com>
    wrote:
    Well I can see from the job tracker that all the jobs are done
    quite
    quickly expect 2 for which reduce phase goes really really
    slowly.

    But how can I make the parallel between a job in the Hadoop jop
    tracker
    (example: job_201010072150_0045) and the Pig script execution?

    And what is the most efficient: several small Pig scripts? or
    one
    big
    Pig
    script? I did one big to avoid to load several time the same
    logs in
    different scripts. Maybe it is not so good design...

    Thanks for your help.

    - Vincent

    On 10/08/2010 11:31 AM, Vincent wrote:

    I'm using pig-0.7.0 on hadoop-0.20.2.

    For the script, well it's more then 500 lines, I'm not sure if
    I
    post
    it
    here that somebody will read it till the end :-)

    On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:

    What version of Pig, and what does your script look like?

    On Thu, Oct 7, 2010 at 11:48 PM,
    Vincent<vincent.hervieux@gmail.com>
    wrote:

    Hi All,
    I'm quite new to Pig/Hadoop. So maybe my cluster size will
    make
    you
    laugh.

    I wrote a script on Pig handling 1.5GB of logs in less than
    one
    hour
    in
    pig
    local mode on a Intel core 2 duo with 3GB of RAM.

    Then I tried this script on a simple 2 nodes cluster. These 2
    nodes
    are
    not
    servers but simple computers:
    - Intel core 2 duo with 3GB of RAM.
    - Intel Quad with 4GB of RAM.

    Well I was aware that hadoop has overhead and that it won't
    be
    done
    in
    half
    an hour (time in local divided by number of nodes). But I was
    surprised
    to
    see this morning it took 7 hours to complete!!!

    My configuration was made according to this link:






    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29

    My question is simple: Is it normal?

    Cheers


    Vincent


    --
    Best Regards

    Jeff Zhang
  • Dmitriy Ryaboy at Oct 8, 2010 at 6:31 pm
    Yeah, concatenating into a big file is a good idea (as long as your file
    format is splittable, which newline-delimited text files trivially are, but
    compression sometimes throws a kink in that).
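
    For example, a minimal sketch (the path and field names here are made up)
    of loading one of those concatenated, newline-delimited logs:

    raw_http = LOAD '/logs/http_requests.log' USING PigStorage('\t')
        AS (ts:long, status_code:int, url:chararray, bytes:long);
    -- plain text splits on HDFS block boundaries, so with the default 64MB
    -- block size a ~1GB file still becomes roughly 16 map tasks; a gzipped
    -- file would not split and would be read by a single mapper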

    Oh, also, use compression when you are in production. We like LZO at
    Twitter, others use other formats sometimes. :).

    -D
    On Fri, Oct 8, 2010 at 11:21 AM, Vincent wrote:

    Oops! I suppose you've pointed out my mistakes.
    I am doing a lot of the bad operations you've mentioned:


    - group foo by (a, b), and then flattening out the group manually: foreach
    grouped_data generate group.a as a, group.b as b;
    - group all_http_requests by status_code.

    One more question and I will leave you for the weekend :-)

    Running the script locally with Pig, I load many small (< 3MB) log files as
    inputs (288 logs per input, to be precise).
    Running the script on Hadoop, before copying the logs to HDFS, I concatenate
    them by input type, since I've read that HDFS doesn't like small files.
    That gives one input of more than 1GB and a couple of others of several
    hundred MB.

    Is that OK? I've left the HDFS block size at its default (64MB).

    And many thanks for your help to both of you Jeff and Dmitriy! I've learned
    a lot today thanks to you.

    -Vincent

    On 10/08/2010 09:47 PM, Dmitriy Ryaboy wrote:

    Sorry, just saw that you said you explicitly did that.

    Ok, basically what's happening with the GC is that when you do something
    like "group x by id" Pig will try to load *all the tuples of x with a
    given
    id* into memory in the reducer, unless algebraic or accumulative
    optimizations were applied (this depends on what exactly you are doing
    with
    results of grouping). Same for joins. A common pitfall is grouping by a
    tuple: group foo by (a, b), and then flattening out the group manually:
    foreach grouped_data generate group.a as a, group.b as b; Instead, you
    should use FLATTEN to get optimizations to kick in: foreach grouped_data
    generate FLATTEN(group) as (a, b);
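
    Spelled out as a tiny sketch (relation and field names are hypothetical),
    with a typical algebraic aggregate on top:

    grouped_data = GROUP foo BY (a, b);
    -- manual projection of the group key (the pitfall described above):
    bad  = FOREACH grouped_data GENERATE group.a AS a, group.b AS b, COUNT(foo);
    -- FLATTEN the group instead, so the optimizations can kick in:
    good = FOREACH grouped_data GENERATE FLATTEN(group) AS (a, b), COUNT(foo);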

    Doing things like "group all_http_requests by status_code" is bad, because
    that puts a very large number of records into very few buckets. It's ok if
    all you are doing is counting the results, but bad if you want to do
    something like sort them by timestamp.
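
    As a hypothetical illustration of that difference (field names made up):

    by_status = GROUP all_http_requests BY status_code;
    -- counting is fine: COUNT is algebraic, so much of the work is done map-side
    counts = FOREACH by_status GENERATE group AS status_code, COUNT(all_http_requests) AS n;
    -- sorting within each group is risky: every request with a given status
    -- code has to be held by a single reducer
    per_status_sorted = FOREACH by_status {
        ordered = ORDER all_http_requests BY timestamp;
        GENERATE group AS status_code, ordered;
    };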



    On Fri, Oct 8, 2010 at 10:40 AM, Dmitriy Ryaboy<dvryaboy@gmail.com>
    wrote:

    When you changed to using replicated joins, did you put the small
    relation
    last in your list? The order is important...
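
    In other words (relation names here are hypothetical), the large relation
    comes first and the small, replicated one last:

    joined = JOIN big_logs BY id, small_lookup BY id USING 'replicated';
    -- small_lookup is loaded into memory on each map task, so it must fit in RAM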

    -D


    On Fri, Oct 8, 2010 at 8:47 AM, Vincent<vincent.hervieux@gmail.com
    wrote:
    *I've tried a mapred.child.java.opts value of 2048m*. Now the error is a
    timeout. It seems the system is so loaded that it becomes unresponsive...

    Here are the outputs of the job tracker:


    Hadoop job_201010081840_0010
    <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081840_0010>
    failures on prog7<http://prog7.lan:50030/jobtracker.jsp>

    Attempt Task Machine State Error Logs
    attempt_201010081840_0010_r_000001_0 task_201010081840_0010_r_000001<

    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081840_0010&tipid=task_201010081840_0010_r_000001
    prog7<http://prog7:50060> FAILED

    Task attempt_201010081840_0010_r_000001_0 failed to report status for
    601
    seconds. Killing!

    Last 4KB<

    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-4097
    Last 8KB<

    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0&start=-8193
    All<
    http://prog7:50060/tasklog?taskid=attempt_201010081840_0010_r_000001_0>



    Task Logs: 'attempt_201010081840_0010_r_000001_0'



    *_stdout logs_*
    ------------------------------------------------------------------------


    *_stderr logs_*
    ------------------------------------------------------------------------


    *_syslog logs_*





    On 10/08/2010 06:44 PM, Vincent wrote:

    Yep, I did restart cluster (dfs and mapred stop/start).
    Increasing the amount of memory I can see that the reduce task goes
    further (percentage is greater), but then start to decrease with memory
    failures.

    On 10/08/2010 06:41 PM, Jeff Zhang wrote:

    Did you restart cluster after reconfiguration ?

    On Fri, Oct 8, 2010 at 9:59 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:

    I've tried with mapred.child.java.opts value:
    -Xmx512m --> still memory errors in reduce phase
    -Xmx1024m --> still memory errors in reduce phase
    I am now trying with -Xmx1536m but I'm afraid that my nodes will
    start
    to
    swap memory...

    Should I continue in this direction? Or it's already to much and I
    should
    search the problem somewhere else?

    Thanks

    -Vincent


    On 10/08/2010 03:04 PM, Jeff Zhang wrote:

    Try to increase the heap size on of task by setting
    mapred.child.java.opts in mapred-site.xml. The default value is
    -Xmx200m in mapred-default.xml which may be too small for you.



    On Fri, Oct 8, 2010 at 6:55 PM, Vincent<vincent.hervieux@gmail.com>
    wrote:

    Thanks to Dmitriy and Jeff, I've set :
    set default_parallel 20; at the beginning of my script.

    Updated 8 JOINs to behave like:

    JOIN big BY id, small BY id USING 'replicated';

    Unfortunately this didn't improve the script speed (at least it
    runs
    for
    more than one hour now).

    But Looking in the jobtracker one of the job which reduce, I can
    see
    for
    the
    map:


    Hadoop map task list for job_201010081314_0010
    <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010
    on
    prog7<http://prog7.lan:50030/jobtracker.jsp>


    ------------------------------------------------------------------------



    All Tasks

    Task Complete Status Start Time Finish Time
    Errors
    Counters
    task_201010081314_0010_m_000000

    <

    http://prog7.lan:50030/taskdetails.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000
    100.00%


    8-Oct-2010 14:07:44
    8-Oct-2010 14:23:11 (15mins, 27sec)


    Too many fetch-failures
    Too many fetch-failures


    8

    <

    http://prog7.lan:50030/taskstats.jsp?jobid=job_201010081314_0010&tipid=task_201010081314_0010_m_000000


    And I can see this for the reduce


    Hadoop reduce task list for job_201010081314_0010
    <http://prog7.lan:50030/jobdetails.jsp?jobid=job_201010081314_0010>
    on prog7 <http://prog7.lan:50030/jobtracker.jsp>

    All Tasks

    task_201010081314_0010_r_000000
        Complete:   9.72%
        Status:     reduce > copy (7 of 24 at 0.01 MB/s) >
        Start Time: 8-Oct-2010 14:14:49
        Errors:     Error: GC overhead limit exceeded
        Counters:   7

    task_201010081314_0010_r_000001
        Complete:   0.00%
        Start Time: 8-Oct-2010 14:14:52
        Errors:     Error: Java heap space

    task_201010081314_0010_r_000002
        Complete:   0.00%
        Start Time: 8-Oct-2010 14:15:58
        Errors:     java.io.IOException: Task process exit with nonzero status of 1.
                    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

    task_201010081314_0010_r_000003
        Complete:   9.72%
        Status:     reduce > copy (7 of 24 at 0.01 MB/s) >
        Start Time: 8-Oct-2010 14:16:58
        Counters:   7

    task_201010081314_0010_r_000004
        Complete:   0.00%
        Start Time: 8-Oct-2010 14:18:11
        Errors:     Error: GC overhead limit exceeded

    task_201010081314_0010_r_000005
        Complete:   0.00%
        Start Time: 8-Oct-2010 14:18:56
        Errors:     Error: GC overhead limit exceeded

    Seems like it runs out of memory... Which parameter should be
    increased?

    -Vincent


    On 10/08/2010 01:12 PM, Jeff Zhang wrote:

    BTW, you can look at the job tracker web UI to see which part of the
    job costs the most time.



    On Fri, Oct 8, 2010 at 5:11 PM, Jeff Zhang<zjffdu@gmail.com>
    wrote:

    No, I mean whether your MapReduce job's reduce task number is 1.
    And could you share your Pig script, so that others can really
    understand your problem?



    On Fri, Oct 8, 2010 at 5:04 PM, Vincent<
    vincent.hervieux@gmail.com
    wrote:

    You are right, I didn't change this parameter, therefore the default
    is used from src/mapred/mapred-default.xml:

    <property>
      <name>mapred.reduce.tasks</name>
      <value>1</value>
      <description>The default number of reduce tasks per job. Typically set
      to 99% of the cluster's reduce capacity, so that if a node fails the
      reduces can still be executed in a single wave.
      Ignored when mapred.job.tracker is "local".
      </description>
    </property>

    It's not clear to me what the reduce capacity of my cluster is :)
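
    (For what it's worth, the reduce capacity is simply the number of
    TaskTrackers multiplied by the reduce slots per node; a minimal sketch,
    assuming the setting lives in conf/mapred-site.xml on every node:)

    <!-- sketch only: reduce slots per TaskTracker (the default is 2);
         cluster reduce capacity = number of TaskTrackers x this value -->
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>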

    On 10/08/2010 01:00 PM, Jeff Zhang wrote:

    I guess maybe your reduce number is 1, which causes the reduce phase
    to run very slowly.
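
    (If that is the case, a minimal sketch of raising it per statement in
    Pig, with placeholder relation and field names; the PARALLEL clause
    sets the number of reduce tasks for that one operation:)

    logs    = LOAD 'access_logs' AS (user_id:chararray, url:chararray);
    grouped = GROUP logs BY user_id PARALLEL 10;  -- 10 reducers for this GROUP
    counts  = FOREACH grouped GENERATE group, COUNT(logs) AS hits;
    STORE counts INTO 'hits_per_user';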



    On Fri, Oct 8, 2010 at 4:44 PM, Vincent<
    vincent.hervieux@gmail.com>
    wrote:

    Well I can see from the job tracker that all the jobs are
    done
    quite
    quickly expect 2 for which reduce phase goes really really
    slowly.

    But how can I make the parallel between a job in the Hadoop
    jop
    tracker
    (example: job_201010072150_0045) and the Pig script execution?

    And what is the most efficient: several small Pig scripts? or
    one
    big
    Pig
    script? I did one big to avoid to load several time the same
    logs in
    different scripts. Maybe it is not so good design...

    Thanks for your help.

    - Vincent


    On 10/08/2010 11:31 AM, Vincent wrote:

    I'm using pig-0.7.0 on hadoop-0.20.2.
    For the script, well it's more then 500 lines, I'm not sure
    if
    I
    post
    it
    here that somebody will read it till the end :-)


    On 10/08/2010 11:26 AM, Dmitriy Ryaboy wrote:

    What version of Pig, and what does your script look like?
    On Thu, Oct 7, 2010 at 11:48 PM,
    Vincent<vincent.hervieux@gmail.com>
    wrote:

    Hi All,
    I'm quite new to Pig/Hadoop. So maybe my cluster size will
    make
    you
    laugh.

    I wrote a script on Pig handling 1.5GB of logs in less than
    one
    hour
    in
    pig
    local mode on a Intel core 2 duo with 3GB of RAM.

    Then I tried this script on a simple 2 nodes cluster. These
    2
    nodes
    are
    not
    servers but simple computers:
    - Intel core 2 duo with 3GB of RAM.
    - Intel Quad with 4GB of RAM.

    Well I was aware that hadoop has overhead and that it won't
    be
    done
    in
    half
    an hour (time in local divided by number of nodes). But I
    was
    surprised
    to
    see this morning it took 7 hours to complete!!!

    My configuration was made according to this link:







    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29

    My question is simple: Is it normal?

    Cheers


    Vincent


    --
    Best Regards
    Jeff Zhang
