Grokbase Groups Pig user January 2011
Tips for debugging pig
Hi all,

I'm running a Pig script in local mode, and it finishes successfully. When I run the same script on the same dataset in distributed (MapReduce) mode, it hangs at 90%, and the Hadoop processes on the worker nodes take almost all of the memory. It always hangs at the reduce task of the last job.

The conf/mapred-site.xml is:

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1000m</value>
</property>
<property>
<name>mapred.child.ulimit</name>
<value>4000000</value>
<final>true</final>
</property>

Do you know how I can debug the processes to find out where the problem is?

Thanks!
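One low-level way to see what a stuck reducer is actually doing is a thread dump of the task JVM. This is only a sketch: it assumes you can ssh to the worker node, that the JDK's jps/jstack tools are on the PATH, and that (as on Hadoop 0.20-era clusters) task JVMs show up under the name "Child":

```shell
# Find the task JVM on the worker node and dump its threads.
PID=$(jps 2>/dev/null | awk '/Child/ {print $1; exit}')
if [ -n "$PID" ]; then
  # Repeat the dump a few times: threads parked in the same user-code
  # frames each time point at where the reducer is spending its time.
  jstack "$PID" > /tmp/reduce-threads.txt
else
  echo "no task JVM found on this machine"
fi
```

Taking two or three dumps a few seconds apart shows whether the reducer is looping in a UDF, blocked on I/O, or thrashing in garbage collection.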

  • Jacob at Jan 26, 2011 at 4:07 pm
    Martin,
    When you look at the task logs for the particular reducer that's stuck,
    what do you see? What kind of operations do you have going on in the
    script, possibly a GROUP ALL?

    --jacob
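For context on why GROUP ALL matters here, a hypothetical sketch (the relation and field names are placeholders, not from the thread): GROUP ALL funnels every record through a single reducer, which on a large input can look exactly like a hang, whereas a keyed GROUP spreads the work:

```pig
logs = LOAD 'input' AS (user:chararray, bytes:long);

-- GROUP ALL sends the entire dataset to ONE reducer:
everything = GROUP logs ALL;
total      = FOREACH everything GENERATE COUNT(logs), SUM(logs.bytes);

-- A keyed GROUP partitions the data across reducers instead:
by_user  = GROUP logs BY user PARALLEL 10;
per_user = FOREACH by_user GENERATE group, SUM(logs.bytes);
```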
  • Martin Z at Jan 26, 2011 at 4:33 pm
    Hi Jacob,

    Thanks for your response. These are the last lines of the task log before I killed the process. The machine is 192.168.1.18.

    2011-01-26 11:14:32,485 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at 0.00 MB/s)
    2011-01-26 11:14:32,623 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at 0.00 MB/s)
    2011-01-26 11:14:32,670 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000009_0 0.33333334% reduce > copy (12 of 12 at 0.00 MB/s)
    2011-01-26 11:14:32,814 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000006_0 0.6666667% reduce > reduce
    2011-01-26 11:14:32,971 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000003_0 0.6666667% reduce > reduce
    2011-01-26 11:14:33,639 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000000_0 0.6666667% reduce > reduce
    2011-01-26 11:14:33,966 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 17393 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000001_0 given 17393/17389
    2011-01-26 11:14:33,966 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 17393, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000001_0, duration: 2439516
    2011-01-26 11:14:34,065 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 8246 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000004_0 given 8246/8242
    2011-01-26 11:14:34,066 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 8246, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000004_0, duration: 1218536
    2011-01-26 11:14:34,084 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 11992 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000006_0 given 11992/11988
    2011-01-26 11:14:34,085 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 11992, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000006_0, duration: 1474773
    2011-01-26 11:14:34,092 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 7195 bytes for reduce: 10 from map: attempt_201101261105_0002_m_000008_0 given 7195/7191
    2011-01-26 11:14:34,092 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.19:56250, bytes: 7195, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000008_0, duration: 1679538
    2011-01-26 11:14:34,113 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 13086 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000001_0 given 13086/13082
    2011-01-26 11:14:34,113 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 13086, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000001_0, duration: 1596605
    2011-01-26 11:14:34,288 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 15422 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000004_0 given 15422/15418
    2011-01-26 11:14:34,288 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 15422, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000004_0, duration: 1191562
    2011-01-26 11:14:34,310 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 8648 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000006_0 given 8648/8644
    2011-01-26 11:14:34,311 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 8648, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000006_0, duration: 1486513
    2011-01-26 11:14:34,351 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 8181 bytes for reduce: 11 from map: attempt_201101261105_0002_m_000008_0 given 8181/8177
    2011-01-26 11:14:34,352 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.18:50060, dest: 192.168.1.14:60067, bytes: 8181, op: MAPRED_SHUFFLE, cliID: attempt_201101261105_0002_m_000008_0, duration: 1530920
    2011-01-26 11:14:35,617 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
    2011-01-26 11:14:36,083 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000006_0 0.6666667% reduce > reduce
    2011-01-26 11:14:38,625 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce


    It then hangs at 66%.
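A quick way to triage a log like the one above is to count repeated progress reports per attempt; the attempt that keeps reporting the same percentage is the one to inspect. A sketch, with a few of the lines above inlined so it is self-contained (in practice, point LOG at the real TaskTracker log file):

```shell
LOG=/tmp/tt-excerpt.log
cat > "$LOG" <<'EOF'
2011-01-26 11:14:35,617 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
2011-01-26 11:14:36,083 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000006_0 0.6666667% reduce > reduce
2011-01-26 11:14:38,625 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201101261105_0002_r_000009_0 0.6666667% reduce > reduce
EOF
# Field 5 is the attempt id, field 6 the reported progress; an attempt
# stuck at one percentage dominates the count.
grep 'reduce > reduce' "$LOG" | awk '{print $5, $6}' | sort | uniq -c | sort -rn
```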

    As for the script, is there a way to tell which part of it the process is hanging in?


    Thanks!
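One way to map a hang back to a line of the script is Pig's EXPLAIN, which prints the logical, physical, and map-reduce plans and shows which operators land in which job's reduce phase. A sketch with a placeholder alias:

```pig
-- "result" stands in for the alias you STORE at the end of your script.
result = LOAD 'input' AS (user:chararray, bytes:long);
EXPLAIN result;
```

Cross-referencing the plan against the stuck job's id (the _0002 in the attempt names above) narrows down which GROUP/JOIN/ORDER the hanging reduce belongs to.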
  • Jacob at Jan 26, 2011 at 5:03 pm
    Martin,

    It's a bit of a black art at the moment. Every Pig script is broken down
    into one or more map-reduce jobs based on the types of operations you've
    got in there. JOIN, GROUP, COGROUP, and ORDER will require a reduce
    (except in special circumstances that you must specify ahead of time).
    If your script has only one operation that kicks off a reduce, then that
    operation has to be the problem.

    The phase you're looking at (~66%) is where the partition and sort have
    finished and the actual reduce portion of your task is running. Keep in
    mind that, at this phase, output data is finally being written to HDFS.
    Do you have write permissions where you're trying to write? What do the
    namenode logs say? The datanode logs?


    --jacob
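The write-permission check can be scripted. A sketch only: the output path is a placeholder for the script's STORE location, and it assumes the hadoop client is on the PATH of a cluster node:

```shell
OUT=/user/martin/output   # placeholder: substitute your STORE path
if command -v hadoop >/dev/null 2>&1; then
  # Who owns the parent directory, and with what permissions?
  hadoop fs -ls "$(dirname "$OUT")"
  # Round-trip a tiny file to prove the job's user can actually write there.
  hadoop fs -touchz "$OUT.permcheck" \
    && hadoop fs -rm "$OUT.permcheck" \
    && echo "write OK"
else
  echo "run this on a node with the hadoop client installed"
fi
```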
