FAQ
I think I am seeing a behavior in which if a mapper task fails (crashes) on one input key/value, the entire task is rescheduled and rerun, starting over again from the first input key/value even if all of the inputs preceding the troublesome input were processed successfully.

Am I correct about this or am I seeing something that isn't there?

If I am correct, what happens to the outputs of the successful duplicate map() calls? Which output key/value is the one that is sent to shuffle (and a reducer): Is it the result of the first attempt on the input in question or the result of the last attempt?

Is there any way to prevent it from recalculating those duplicate inputs other than something manual on the side like keeping a job-log of the map attempts and scanning the log at the beginning of each map() call?

Thanks.

________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com www.keithwiley.com

"I used to be with it, but then they changed what it was. Now, what I'm with
isn't it, and what's it seems weird and scary to me."
-- Abe (Grandpa) Simpson
________________________________________________________________________________


  • Li ping at Dec 14, 2010 at 1:58 am
    I think the "*org.apache.hadoop.mapred.SkipBadRecords*" is you are looking
    for.
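
    For example, something along these lines should turn on skip mode with the
    old mapred API (an untested sketch; the output path and numbers are only
    placeholders, and as far as I know setting the max skip records to
    Long.MAX_VALUE makes the framework skip the whole failed range instead of
    binary-searching for the exact bad record):

        // Untested sketch (old org.apache.hadoop.mapred API), in the job driver:
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.SkipBadRecords;

        // ... when building the JobConf (MyJob is a placeholder class):
        JobConf conf = new JobConf(MyJob.class);
        // Begin skipping after this many failed attempts of the same task.
        SkipBadRecords.setAttemptsToStartSkipping(conf, 1);
        // Skip the whole range the attempt was working on when it failed,
        // rather than narrowing down to the single bad record.
        SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
        // Skipped records are written here (as sequence files) for inspection.
        SkipBadRecords.setSkipOutputPath(conf, new Path("/tmp/skipped-records"));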


    On Tue, Dec 14, 2010 at 8:51 AM, Keith Wiley wrote:

    I think I am seeing a behavior in which if a mapper task fails (crashes) on
    one input key/value, the entire task is rescheduled and rerun, starting over
    again from the first input key/value even if all of the inputs preceding the
    troublesome input were processed successfully.

    Am I correct about this or am I seeing something that isn't there?

    If I am correct, what happens to the outputs of the successful duplicate
    map() calls? Which output key/value is the one that is sent to shuffle (and
    a reducer): Is it the result of the first attempt on the input in question
    or the result of the last attempt?

    Is there any way to prevent it from recalculating those duplicate inputs
    other than something manual on the side like keeping a job-log of the map
    attempts and scanning the log at the beginning of each map() call?

    Thanks.


    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com
    www.keithwiley.com

    "I used to be with it, but then they changed what it was. Now, what I'm
    with
    isn't it, and what's it seems weird and scary to me."
    -- Abe (Grandpa) Simpson

    ________________________________________________________________________________



    --
    -----李平
  • Keith Wiley at Dec 14, 2010 at 6:05 pm

    On Dec 13, 2010, at 17:58 , li ping wrote:

    I think the "*org.apache.hadoop.mapred.SkipBadRecords*" is you are looking
    for.

    Yes, I considered that at one point. I don't like how it insists on iteratively retrying the records. I wish it would simply skip the failed records and move on: just run through the list of input records in order, skipping the bad ones, sending the good ones to the reducer, and otherwise making no further attempts at processing.

    I'll read up on it again. Perhaps I missed something.

    Thanks.

    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com www.keithwiley.com

    "What I primarily learned in grad school is how much I *don't* know.
    Consequently, I left grad school with a higher ignorance to knowledge ratio than
    when I entered."
    -- Keith Wiley
    ________________________________________________________________________________
  • 蔡超 at Dec 14, 2010 at 4:20 am
    I have run into this problem. I think the behavior (whether it starts over
    from the very beginning, and whether duplicate keys are overwritten) depends
    on the InputFormat and OutputFormat. When I use DBInputFormat and
    DBOutputFormat, it restarts the failed task rather than starting over from
    the very beginning.

    Hope this helps. I would like to understand the mechanism better, too.


    Cai Chao
    On Tue, Dec 14, 2010 at 8:51 AM, Keith Wiley wrote:

    I think I am seeing a behavior in which if a mapper task fails (crashes) on
    one input key/value, the entire task is rescheduled and rerun, starting over
    again from the first input key/value even if all of the inputs preceding the
    troublesome input were processed successfully.

    Am I correct about this or am I seeing something that isn't there?

    If I am correct, what happens to the outputs of the successful duplicate
    map() calls? Which output key/value is the one that is sent to shuffle (and
    a reducer): Is it the result of the first attempt on the input in question
    or the result of the last attempt?

    Is there any way to prevent it from recalculating those duplicate inputs
    other than something manual on the side like keeping a job-log of the map
    attempts and scanning the log at the beginning of each map() call?

    Thanks.


    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com
    www.keithwiley.com

    "I used to be with it, but then they changed what it was. Now, what I'm
    with
    isn't it, and what's it seems weird and scary to me."
    -- Abe (Grandpa) Simpson

    ________________________________________________________________________________


  • Eric Sammer at Dec 14, 2010 at 5:46 am
    What you are seeing is correct and the intended behavior. The unit of work
    in an MR job is the task. If something causes the task to fail, it starts
    again. Any output from the failed task attempt is thrown away. The reducers
    will not see the output of the failed map tasks at all. There is no way
    (within Hadoop proper) to teach a task to be stateful, nor should you, as
    you would lose a lot of flexibility with respect to features like
    speculative execution and the ability to deal with machine failures (unless
    you maintained task state in HDFS or another external system). It's just
    not worth it.
    On Mon, Dec 13, 2010 at 7:51 PM, Keith Wiley wrote:

    I think I am seeing a behavior in which if a mapper task fails (crashes) on
    one input key/value, the entire task is rescheduled and rerun, starting over
    again from the first input key/value even if all of the inputs preceding the
    troublesome input were processed successfully.

    Am I correct about this or am I seeing something that isn't there?

    If I am correct, what happens to the outputs of the successful duplicate
    map() calls? Which output key/value is the one that is sent to shuffle (and
    a reducer): Is it the result of the first attempt on the input in question
    or the result of the last attempt?

    Is there any way to prevent it from recalculating those duplicate inputs
    other than something manual on the side like keeping a job-log of the map
    attempts and scanning the log at the beginning of each map() call?

    Thanks.


    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com
    www.keithwiley.com

    "I used to be with it, but then they changed what it was. Now, what I'm
    with
    isn't it, and what's it seems weird and scary to me."
    -- Abe (Grandpa) Simpson

    ________________________________________________________________________________



    --
    Eric Sammer
    twitter: esammer
    data: www.cloudera.com
  • Keith Wiley at Dec 14, 2010 at 5:16 pm
    Hmmm, I'll take that under advisement. So, even if I manually avoided redoing earlier work (by keeping a log of which input key/values have been processed and short-circuiting the map() if a key/value has already been processed), you're saying those previously completed key/values would not be passed on to the reducer if I skipped them the second time the task was attempted? Is that correct?

    Man, I'm trying to figure out the best design here.

    My mapper can take up to an hour to process a single input key/value. If a mapper fails on the second input, I really can't afford to calculate the first input all over again even though it was successful the first time. The job basically never finishes at that rate of inefficiency. Reprocessing any data even twice is basically unacceptable, much less four times, which is the number of times a task is attempted before giving up and letting the reducer work with what it's got. (I've tried setMaxMapAttempts(), but it has no effect; tasks are always attempted four times regardless.)
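
    For reference, what I'm calling is roughly this (the class name is a
    placeholder); as far as I can tell the setter is just a wrapper around the
    mapred.map.max.attempts property, so I'd expect either form to work:

        JobConf conf = new JobConf(MyJob.class);   // MyJob is a placeholder
        conf.setMaxMapAttempts(1);                 // should set mapred.map.max.attempts
        // Equivalent property form, in case the setter is somehow being ignored:
        conf.set("mapred.map.max.attempts", "1");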

    I wish there were a less burdensome version of skipbadrecords. I don't want it to perform a binary search trying to find the bad record while reprocessing data over and over again. I want it to just skip failed calls to map() and move on to the next input key/value. I want the mapper to just iterate through its list of inputs, skipping any that fail, and sending all the successfully processed data to the reducer, all in a single nonredundant pass. Is there any way to make Hadoop do that?

    Thanks.

    Cheers!
    On Dec 13, 2010, at 21:46 , Eric Sammer wrote:

    What you are seeing is correct and the intended behavior. The unit of work
    in an MR job is the task. If something causes the task to fail, it starts
    again. Any output from the failed task attempt is thrown away. The reducers
    will not see the output of the failed map tasks at all. There is no way
    (within Hadoop proper) to teach a task to be stateful, nor should you, as
    you would lose a lot of flexibility with respect to features like
    speculative execution and the ability to deal with machine failures (unless
    you maintained task state in HDFS or another external system). It's just
    not worth it.
    On Mon, Dec 13, 2010 at 7:51 PM, Keith Wiley wrote:

    I think I am seeing a behavior in which if a mapper task fails (crashes) on
    one input key/value, the entire task is rescheduled and rerun, starting over
    again from the first input key/value even if all of the inputs preceding the
    troublesome input were processed successfully.

    Am I correct about this or am I seeing something that isn't there?

    If I am correct, what happens to the outputs of the successful duplicate
    map() calls? Which output key/value is the one that is sent to shuffle (and
    a reducer): Is it the result of the first attempt on the input in question
    or the result of the last attempt?

    Is there any way to prevent it from recalculating those duplicate inputs
    other than something manual on the side like keeping a job-log of the map
    attempts and scanning the log at the beginning of each map() call?

    Thanks.

    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com www.keithwiley.com

    "Luminous beings are we, not this crude matter."
    -- Yoda
    ________________________________________________________________________________
  • Harsh J at Dec 14, 2010 at 5:31 pm
    Hi,
    On Tue, Dec 14, 2010 at 10:43 PM, Keith Wiley wrote:
    I wish there were a less burdensome version of skipbadrecords.  I don't want it to perform a binary search trying to find the bad record while reprocessing data over and over again.  I want it to just skip failed calls to map() and move on to the next input key/value.  I want the mapper to just iterate through its list of inputs, skipping any that fail, and sending all the successfully processed data to the reducer, all in a single nonredundant pass.  Is there any way to make Hadoop do that?
    You could do this with your application Mapper code, "catch" bad
    records [try-fail-continue kind of a thing] and push them to a
    different output file rather than the default collector that goes to
    the Reducer [MultipleOutputs, etc. help here] for reprocessing or
    inspection later. Is it not that simple?
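
    Something like this rough sketch is what I mean (old mapred API; the class
    and the "bad" named output are made up for illustration, and the named
    output has to be registered in the driver with
    MultipleOutputs.addNamedOutput()):

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.*;
        import org.apache.hadoop.mapred.lib.MultipleOutputs;

        public class ResilientMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

          private MultipleOutputs mos;

          public void configure(JobConf job) {
            mos = new MultipleOutputs(job);
          }

          public void map(LongWritable key, Text value,
                          OutputCollector<Text, Text> output, Reporter reporter)
              throws IOException {
            try {
              // Your real processing goes here.
              output.collect(new Text(key.toString()), process(value));
            } catch (Exception e) {
              // Record the bad input in a side output instead of failing the task.
              mos.getCollector("bad", reporter).collect(new Text(key.toString()), value);
            }
          }

          public void close() throws IOException {
            mos.close();
          }

          private Text process(Text v) { return v; }  // placeholder for real work
        }

        // In the driver:
        // MultipleOutputs.addNamedOutput(conf, "bad",
        //     TextOutputFormat.class, Text.class, Text.class);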

    --
    Harsh J
    www.harshj.com
  • Keith Wiley at Dec 14, 2010 at 6:05 pm

    On Dec 14, 2010, at 09:30 , Harsh J wrote:

    Hi,
    On Tue, Dec 14, 2010 at 10:43 PM, Keith Wiley wrote:
    I wish there were a less burdensome version of skipbadrecords. I don't want it to perform a binary search trying to find the bad record while reprocessing data over and over again. I want it to just skip failed calls to map() and move on to the next input key/value. I want the mapper to just iterate through its list of inputs, skipping any that fail, and sending all the successfully processed data to the reducer, all in a single nonredundant pass. Is there any way to make Hadoop do that?
    You could do this with your application Mapper code, "catch" bad
    records [try-fail-continue kind of a thing] and push them to a
    different output file rather than the default collector that goes to
    the Reducer [MultipleOutputs, etc. help here] for reprocessing or
    inspection later. Is it not that simple?

    I'm not sure I understand, but if you are suggesting that I detect the troublesome records through simple try/catch exception handlers, then I'm afraid that won't work. My code is already as resilient as I can possibly make it from that point of view. The task failures are occurring in C++ code which is being run via JNI from the mappers. Despite copious use of exception handlers both in Java and in C++, it is inevitable -- as per the nature of C++ or any other native compiled code -- that some kinds of errors will simply be untrappable. I have been unsuccessful in trapping some of the errors I am facing. The job tracker reports task failures with standard failure status codes (134 and 139 in my case). It's obvious that the native code is simply crashing in some fashion, but I can't figure out how to get Hadoop to gracefully handle the situation.

    ________________________________________________________________________________
    Keith Wiley kwiley@keithwiley.com www.keithwiley.com

    "The easy confidence with which I know another man's religion is folly teaches
    me to suspect that my own is also."
    -- Mark Twain
    ________________________________________________________________________________
