Hi, all

I'm not sure which mailing list I should send my question to; sorry for any
inconvenience.

I'm interested in how Hadoop currently handles the loss of intermediate data
generated by map tasks. Some papers suggest that, when data needed by
reducers is lost, we should compare the cost of redoing the task with the
cost of replicating the data. If redoing the task costs more, we can keep
more replicas of the intermediate data generated by the map to ensure that
reducers can access it; otherwise, we just redo the corresponding map task
when we detect the loss.
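
The cost comparison described above can be sketched roughly as follows. This is only an illustration of the idea, not Hadoop code: the names `MapOutputCosts` and `should_replicate` are hypothetical, and the `loss_probability` weighting is an assumption added here (with its default of 1.0 the rule reduces to the plain "redo cost versus replication cost" comparison).

```python
from dataclasses import dataclass


@dataclass
class MapOutputCosts:
    """Hypothetical cost estimates for one map task's intermediate output."""
    rerun_seconds: float            # estimated time to re-execute the map task
    replicate_seconds: float        # estimated time to write extra replicas
    loss_probability: float = 1.0   # assumed chance the output is lost (1.0 = plain comparison)


def should_replicate(costs: MapOutputCosts) -> bool:
    """Return True if replicating looks cheaper than redoing the map task."""
    if not 0.0 <= costs.loss_probability <= 1.0:
        raise ValueError("loss_probability must be in [0, 1]")
    # Expected cost of relying on re-execution alone.
    expected_rerun_cost = costs.loss_probability * costs.rerun_seconds
    return expected_rerun_cost > costs.replicate_seconds
```

For example, a map task that takes 600 seconds to redo but 30 seconds to replicate would be replicated under the plain comparison, while a 10-second task would simply be redone.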

I'm not sure which strategy Hadoop currently adopts, and I haven't found the
code for this. Can anyone give me some suggestions?

Thank you

Nan


  • Newpant at Oct 13, 2010 at 8:07 am
    Hi, according to Hadoop: The Definitive Guide, a map task first writes its
    intermediate output to an in-memory buffer and then spills it to local
    disk, in the directory configured by mapred.local.dir. So as far as I
    know, if the intermediate data is lost, only redoing the task can fix it.

    If I'm wrong, please correct me.

  • Nan Zhu at Oct 13, 2010 at 3:05 pm
    Yes, I finally found the corresponding code.

    It's in TaskTracker.MapOutputServlet:
    doGet() -> sendMapFile() -> TaskTracker.mapOutputLost()

    It's true that Hadoop uses the redo strategy to solve this problem, but
    some papers indicate that we can also replicate the intermediate results
    to make them fault-tolerant.

    Thank you very much

    Nan
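
The redo strategy that the thread settles on (map output lives only on local disk, so a lost output means re-running the map task) can be illustrated with a toy model. This is a sketch under stated assumptions, not Hadoop's implementation: `ToyJob`, `run_map`, `output_path`, and `fetch` are hypothetical names, and the real detection and rescheduling logic lives in the TaskTracker code path identified above.

```python
import os
import tempfile


class ToyJob:
    """Toy model: each map task writes its output to one local file, with no replicas."""

    def __init__(self, local_dir, inputs):
        self.local_dir = local_dir                # stands in for a local spill directory
        self.inputs = inputs                      # map_id -> list of input words
        self.runs = {m: 0 for m in inputs}        # how many times each map task has run

    def output_path(self, map_id):
        return os.path.join(self.local_dir, f"map_{map_id}.out")

    def run_map(self, map_id):
        """Execute (or re-execute) a map task and write its output to local disk."""
        self.runs[map_id] += 1
        with open(self.output_path(map_id), "w") as f:
            for word in self.inputs[map_id]:
                f.write(f"{word}\t1\n")

    def fetch(self, map_id):
        """Reducer-side fetch: if the output is missing, redo the map task, then read it."""
        try:
            with open(self.output_path(map_id)) as f:
                return f.read().splitlines()
        except FileNotFoundError:
            # Output lost and no replica exists: re-execution is the only recovery here.
            self.run_map(map_id)
            with open(self.output_path(map_id)) as f:
                return f.read().splitlines()
```

To try it, create a `ToyJob` over a `tempfile.TemporaryDirectory()`, call `run_map(0)`, delete the file returned by `output_path(0)`, and call `fetch(0)`: the fetch still succeeds, and `runs[0]` shows that the map task ran twice. A replication-based design would instead read a second copy and leave the run count unchanged.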

Discussion Overview
group: common-dev @
category: hadoop
posted: Sep 27, '10 at 5:36a
active: Oct 13, '10 at 3:05p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
