Reduce task hangs when using nutch 0.9 with hadoop 0.12.3
Hi,

We upgraded our code to the nutch 0.9 stable version along with hadoop
0.12.3, which is the latest release in the hadoop 0.12 line.

After the upgrade, I am sometimes seeing task failures during the reduce
phase for parse and fetch (run without the parsing option).

Usually, it's just one reduce task that runs into this problem. The
jobtracker kills this task, saying "Task failed to report status for 602
seconds. Killing task".

I tried running the task using IsolationRunner, and it works fine. I
suspect that a long computation is happening during the reduce phase for
one of the keys, because of which the tasktracker isn't able to report
status to the jobtracker in time.

I was wondering if anyone else has seen a similar problem and if there is
a fix for it.

Thanks,

-vishal.
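
For context, the "602 seconds" in that kill message lines up with the
mapred.task.timeout property, which defaulted to 600,000 ms in that era.
A minimal sketch of raising it per job, assuming the old JobConf API (the
class name here is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    public class TimeoutExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Raise the per-task liveness timeout (in milliseconds). Tasks
        // that go quiet for longer than this are killed by the
        // tasktracker; the 600000 ms (ten minute) default matches the
        // "602 seconds" in the kill message above.
        conf.setLong("mapred.task.timeout", 1800000L); // 30 minutes
        System.out.println(conf.getLong("mapred.task.timeout", 600000L));
      }
    }

Raising the timeout only masks a genuinely slow reduce, though; reporting
progress, as suggested in the replies below, is the cleaner fix.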

  • Arun C Murthy at May 22, 2007 at 10:54 am

    Vishal Shah wrote:
    I tried running the task using IsolationRunner, and it works fine. I
    suspect that a long computation is happening during the reduce phase for
    one of the keys, because of which the tasktracker isn't able to report
    status to the jobtracker in time.
    If you suspect a long computation, one way to handle it is to use the
    'reporter' parameter passed to your mapper/reducer to provide status
    updates and ensure that the TaskTracker doesn't kill the task, i.e.
    doesn't assume the task has been lost.

    hth,
    Arun
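
    A minimal sketch of the pattern Arun describes, written against the old
    org.apache.hadoop.mapred API of that era (the class name, value types,
    and reporting interval here are hypothetical, and exact signatures
    varied across 0.x releases):

        import java.io.IOException;
        import java.util.Iterator;

        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.WritableComparable;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reducer;
        import org.apache.hadoop.mapred.Reporter;

        public class HeartbeatReducer extends MapReduceBase implements Reducer {

          public void reduce(WritableComparable key, Iterator values,
                             OutputCollector output, Reporter reporter)
              throws IOException {
            long sum = 0;
            long seen = 0;
            while (values.hasNext()) {
              sum += ((LongWritable) values.next()).get();
              // Report periodically: a key with a very long value list can
              // otherwise exceed the task timeout and be killed as
              // unresponsive even though it is making progress.
              if (++seen % 10000 == 0) {
                reporter.setStatus("processed " + seen + " values for " + key);
              }
            }
            output.collect(key, new LongWritable(sum));
          }
        }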
  • Vishal Shah at May 23, 2007 at 8:44 am
    Hi Arun,

    Thanks for the reply. We figured out the root cause of the problem. We
    are not using the hadoop native libs right now, and the Sun Java Deflater
    sometimes hangs during the reduce phase. Our glibc version is 2.3.5,
    whereas the hadoop native libs need 2.4; that's why they are not being
    used by hadoop.

    I was wondering if there is a version of the native libs that would work
    with glibc 2.3. If not, we'll have to upgrade the glibc version on all
    machines to 2.4.

    Regards,

    -vishal.
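
    A quick way to confirm whether the native libs actually loaded, as a
    sketch assuming the org.apache.hadoop.util.NativeCodeLoader class that
    hadoop's native loading goes through (the wrapper class name here is
    hypothetical):

        import org.apache.hadoop.util.NativeCodeLoader;

        public class NativeLibCheck {
          public static void main(String[] args) {
            // Prints whether libhadoop.so was found and loaded by this JVM.
            // If false, hadoop falls back to pure-Java compression,
            // including the Sun Deflater implicated above.
            System.out.println("native hadoop loaded: "
                + NativeCodeLoader.isNativeCodeLoaded());
          }
        }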

  • Arun C Murthy at May 23, 2007 at 6:11 pm
    Vishal,
    On Wed, May 23, 2007 at 02:15:50PM +0530, Vishal Shah wrote:
    Thanks for the reply. We figured out the root cause of the problem. We
    are not using the hadoop native libs right now, and the Sun Java Deflater
    sometimes hangs during the reduce phase. Our glibc version is 2.3.5,
    whereas the hadoop native libs need 2.4; that's why they are not being
    used by hadoop.
    Personally I've never seen Sun's Deflater hang... but ymmv.

    Anyway, there is nothing in the native hadoop libs that needs glibc-2.4 -
    it looks like this is just an artifact of the fact that the machine on
    which the release was cut (i.e., where the native libs in 0.12.3 were
    built) had glibc-2.4. I have glibc-2.3.6-r3 on my machine and things work
    fine...
    I was wondering if there is a version of the native libs that would work
    with glibc 2.3.
    I don't have access right away to a box with glibc-2.3.5, but it's really
    easy to build them yourself - details here:
    http://wiki.apache.org/lucene-hadoop/NativeHadoop

    hth,
    Arun

  • Vishal Shah at May 24, 2007 at 12:30 pm
    Thanks for the reply, Arun. We've recompiled the hadoop native binaries
    and they seem to be loading fine. We are rerunning the job to see if it
    works now.

    Regards,

    -vishal.

