I'm running a big job on my cluster and a handful of attempts are failing
with a "Too many fetch-failures" error message. They're all on the same
node, but that node doesn't appear to be down. Subsequent attempts succeed,
so this looks like a transient stress issue rather than a problem with my
code. I'm guessing it's something like HDFS not being able to keep up, but
I'm not sure, and Googling only turns up people just as confused as I am.

What does this error mean and how do I dig into it more?

Thanks.


  • David Rosenstrauch at Mar 31, 2011 at 10:05 pm

    On 03/31/2011 05:13 PM, W.P. McNeill wrote:
    > I'm running a big job on my cluster and a handful of attempts are failing
    > with a "Too many fetch-failures" error message. They're all on the same
    > node, but that node doesn't appear to be down. Subsequent attempts succeed,
    > so this looks like a transient stress issue rather than a problem with my
    > code. I'm guessing it's something like HDFS not being able to keep up, but
    > I'm not sure, and Googling only turns up people just as confused as I am.
    >
    > What does this error mean and how do I dig into it more?
    >
    > Thanks.

    We've seen that happen in a number of situations, and it's a bit tricky
    to debug.

    In the general sense it means that a reduce task wasn't able to fetch
    map output from the node that ran the map task - i.e., something
    (often a network problem) prevented one machine from contacting the
    other machine and fetching the data. The reasons why this could happen
    are numerous, though. We've seen this in at least 2 situations: 1) the
    machine being fetched from was having a huge load spike and so didn't
    respond, and 2) we accidentally gave several nodes the same name, so
    Hadoop wasn't able to correctly contact the "real" node for that name.
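
The second situation (several nodes sharing one name) can be checked mechanically. Below is a minimal Python sketch, assuming the cluster's name-to-address mapping lives in hosts-file style text; the function name `find_ambiguous_names` and the sample addresses and node names are made up for illustration and are not part of Hadoop.

```python
from collections import defaultdict


def find_ambiguous_names(hosts_text):
    """Return {hostname: sorted addresses} for names mapped to more than one address."""
    addresses_by_name = defaultdict(set)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()  # first field is the address, the rest are names
        for name in names:
            addresses_by_name[name.lower()].add(address)
    return {
        name: sorted(addrs)
        for name, addrs in addresses_by_name.items()
        if len(addrs) > 1
    }


# Hypothetical sample data: node02 is accidentally assigned to two machines.
sample = """
10.0.0.1  node01
10.0.0.2  node02
10.0.0.3  node02   # copy-paste mistake, should be node03
"""
print(find_ambiguous_names(sample))  # {'node02': ['10.0.0.2', '10.0.0.3']}
```

On a real hosts file, names such as localhost that are legitimately listed under both an IPv4 and an IPv6 address will be reported too, so treat the output as a list of candidates to inspect rather than a list of errors.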

    Your specific issue may be different, though, so you'll need to debug
    the network error yourself.
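
One possible first step when debugging the network side is to check, from another node in the cluster, whether the suspect node accepts TCP connections at all. The sketch below is an assumption-laden illustration, not a Hadoop tool: `can_connect` is a hypothetical helper, and the host name and port in the usage comment are placeholders (50060 was the usual default TaskTracker HTTP port in Hadoop releases of this era, but your configuration may differ).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (placeholder host and port, substitute your own):
#   can_connect("node02.example.com", 50060)
```

Running the check repeatedly while the job is under load can help tell a node that is briefly overwhelmed apart from one that is never reachable under the name the other nodes use for it.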

    HTH,

    DR

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 31, '11 at 9:21p
active: Mar 31, '11 at 10:05p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
