Hi,

I am wondering if anyone has experienced this problem. Sometimes when I
run a job, a few map tasks (often just one) hang in the initializing phase
for more than 3 minutes (this phase normally finishes in a couple of
seconds). They eventually finish, but the whole job is slowed down
considerably. The weird thing is that the slow task is not deterministic:
it doesn't always occur, and when it does, it can occur on any split and
on any host.

I'd appreciate any help in understanding this.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

junrao@almaden.ibm.com
(408)927-1886 (phone)
(408)927-3215 (fax)


  • Doug Cutting at Jun 21, 2007 at 4:21 pm

    I have not seen this.

    Perhaps you can get a stack trace from the tasktracker while this is
    happening?

    Owen described how to get such stack traces in:

    http://mail-archives.apache.org/mod_mbox/lucene-hadoop-user/200706.mbox/%3c3AB557D4-4B71-4286-BB36-1A449F28BAD5@yahoo-inc.com%3e

    Owen wrote:
    One side note is that all of the servers have a servlet such that if
    you do http://<node>:<port>/stacks you'll get a stack trace of all
    the threads in the server. I find that useful for remote debugging.
    *smile* Although if it is a task jvm that has the problem, then there
    isn't a server for them.
    (This should probably be added to the documentation or the wiki...)
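
A minimal sketch of pulling that stack trace from a script, assuming Python 3 and plain HTTP access to the server. The helper names `stacks_url` and `fetch_stacks` are hypothetical (they are not part of Hadoop), and the node and port must be replaced with the HTTP address of the server being inspected.

```python
from urllib.request import urlopen


def stacks_url(node: str, port: int) -> str:
    """Build the URL of the /stacks servlet for one server (hypothetical helper)."""
    return f"http://{node}:{port}/stacks"


def fetch_stacks(node: str, port: int, timeout: float = 10.0) -> str:
    """Fetch the thread dump served at /stacks and return it as text."""
    with urlopen(stacks_url(node, port), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (placeholder host and port, not real values):
#   print(fetch_stacks("tasktracker-host", port))
```

Calling `fetch_stacks` a few times while a task is stuck in the initializing phase would show whether the tasktracker threads are blocked in the same place each time. As Owen notes, this only covers the servers, not the task JVMs.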

    Doug
  • Raghu Angadi at Jun 21, 2007 at 5:20 pm

    Doug Cutting wrote:
    (This should probably be added to the documentation or the wiki...)
    We should. This is very useful. I have been working on Hadoop for months
    but still didn't know.

    Raghu.
  • Doug Cutting at Jun 21, 2007 at 5:45 pm

    Raghu Angadi wrote:
    We should. This is very useful. I have been working on Hadoop for months
    but still didn't know.
    A good place might be on:

    http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

    Doug
  • Jun Rao at Jul 6, 2007 at 7:29 am
    I enabled logging. The slow map task was caused by a socket
    connection call in setupIOstreams() (triggered by the first RPC
    call, getProtocolVersion(), from MapTask to TaskTracker). If the socket
    connection call was made at t1, it didn't return until t1 + ~200
    seconds (normally, each map task takes about 8 seconds). On the RPC
    server side, doAccept() was also not called until t1 + ~200 seconds. I
    ran a job with 200+ splits 10 times. On average, there was one slow map
    task per run (every slow map task took ~200 seconds to make the socket
    connection). I was using a recent 64-bit IBM JVM on SUSE.
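
One way to check whether such a delay sits in the TCP connect itself, rather than in Hadoop's RPC layer, is to time a bare connection to the same host and port. The sketch below is only an illustration in Python 3 under that assumption. `timed_connect` is a hypothetical helper, not Hadoop code, and the host and port are placeholders for the TaskTracker's RPC address.

```python
import socket
import time


def timed_connect(host: str, port: int, timeout: float = 30.0) -> float:
    """Open a TCP connection and return how many seconds the connect took.

    Raises OSError (including socket.timeout) if the connection fails or
    does not complete within `timeout` seconds.
    """
    start = time.monotonic()
    # create_connection returns only once the TCP handshake has completed.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.monotonic() - start
    return elapsed


# Example call (placeholder address, not real values):
#   print(timed_connect("tasktracker-host", port, timeout=300.0))
```

Running this in a loop from the affected nodes, with a timeout above the ~200 seconds observed, would show whether plain connects occasionally stall in the same way.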

    Jun
    In reply to: Doug Cutting <cutting@apache.org>, 06/21/2007 09:21 AM
    To: hadoop-user@lucene.apache.org
    Subject: Re: map task in initializing phase for too long


Discussion Overview
group: common-user
category: hadoop
posted: Jun 21, '07 at 12:21a
active: Jul 6, '07 at 7:29a
posts: 5
users: 3
website: hadoop.apache.org...
irc: #hadoop
