Grokbase Groups · Pig user · May 2010
Hi,

I often get this error message when executing a Join over big data (~ 160 GB):

"Task attempt failed to report status for 602 seconds. Killing!"

The job finally finishes, but a lot of reduce tasks are killed with this error message.
I execute the JOIN with a PARALLEL clause of 9.
Finally all 9 reducers succeed, but there are also, for example, 13 failed task attempts.
This also makes the execution time very slow!

Does anybody have an idea what's happening or have the same problem?

Thx in advance,
Alex


  • 김영우 at May 20, 2010 at 8:32 am
    Hi Alexander,

    Hadoop MapReduce has a 'mapred.task.timeout' property.
    http://hadoop.apache.org/common/docs/current/mapred-default.html
    In your case I'm not sure whether a larger timeout value would help, but it may be worth trying.
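
    A minimal sketch of what that could look like inside the Pig script
    (the value and relation names are only illustrative, and on older Pig
    versions the property may have to go into the job configuration or
    mapred-site.xml instead of via SET):

        -- raise the task timeout for this script only
        -- (1800000 ms = 30 minutes; pick a value that fits your job)
        SET mapred.task.timeout 1800000;

        A = LOAD 'input_a' AS (id, val);
        B = LOAD 'input_b' AS (id, val);
        J = JOIN A BY id, B BY id PARALLEL 9;
        STORE J INTO 'join_output';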

    Regards,

    Youngwoo


  • Rekha Joshi at May 20, 2010 at 8:40 am
    Did you try increasing the parallelism? Tuning mapred.task.timeout also works at times. If you are doing it via Pig, some have reported good performance with speculative execution.
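
    As a rough sketch, speculative execution can also be requested from the script itself (the property name is the Hadoop 0.20-era one and, depending on your Pig version, may need to be set in the job or cluster configuration instead):

        -- launch backup attempts for slow reduce tasks instead of waiting
        -- for them to hit the timeout (illustrative; verify against your cluster)
        SET mapred.reduce.tasks.speculative.execution true;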
    Cheers,
    /R

  • Corbin Hoenes at May 20, 2010 at 3:01 pm
    +1 for increasing the PARALLEL number, and also try adding mapred.task.timeout to your job configuration for this particular script.

    We've had a similar problem and it helps, but I'm not sure it will solve the issue completely because we still get memory problems under certain conditions.
    Try also looking at optimizing your JOIN statement using hints from the Pig Cookbook.
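
    For instance, two of the cookbook's join hints, sketched here with made-up relation names (whether they apply depends on the shape of your data):

        -- if the smaller input fits in memory, a fragment-replicate join
        -- runs map-side and avoids the long-running reducers entirely;
        -- the small relation goes last
        J1 = JOIN big BY id, small BY id USING 'replicated';

        -- if a few keys dominate (a common reason for a handful of
        -- reducers running far longer than the rest), a skewed join
        -- spreads those keys over several reducers
        J2 = JOIN big BY id, other BY id USING 'skewed' PARALLEL 9;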
