FAQ
Hello,

Can a map task work on more than one input split? I am using hadoop-0.20.1
and in my map method I need to know the name of the file I am getting input
from. I use the following code to get that:

String inputFile = ((FileSplit)
context.getInputSplit()).getPath().getName();

If a map task works on only one input split, then I can put that code in the
setup() method, which would be more efficient when handling a large amount
of data. Otherwise, I have to put the code in the map() method, but that
would slow me down because it would run for every input key/value pair. I have
gone through the following two pages but did not get a clear picture:

http://wiki.apache.org/hadoop/HadoopMapReduce
http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Thanks,
Farhan
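
The setup()-versus-map() trade-off described above can be sketched in plain Java. This is a minimal simulation built on an assumption, not Hadoop code. `Split`, `fileNameOf`, and the two `runTask...` methods are hypothetical stand-ins. Each split is processed by exactly one simulated task, which does its per-split work once and then handles every record; this is the one-split-per-map model Greg describes in his reply. In a real Mapper, the lookup would be the `((FileSplit) context.getInputSplit()).getPath().getName()` line from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a map task's lifecycle; this is NOT the Hadoop API.
class SetupCachingSketch {

    // A split whose records all come from one file (as with a FileSplit).
    static final class Split {
        final String path;
        final List<String> records;

        Split(String path, List<String> records) {
            this.path = path;
            this.records = records;
        }
    }

    // Counts how many times the file-name lookup runs.
    static int lookups = 0;

    // Stand-in for the per-split lookup; returns the last path component.
    static String fileNameOf(Split split) {
        lookups++;
        return split.path.substring(split.path.lastIndexOf('/') + 1);
    }

    // One simulated task: look the name up once ("setup"), then reuse it per record ("map").
    static List<String> runTaskCachingInSetup(Split split) {
        String inputFile = fileNameOf(split);
        List<String> out = new ArrayList<>();
        for (String record : split.records) {
            out.add(inputFile + "\t" + record);
        }
        return out;
    }

    // One simulated task that repeats the lookup inside "map" for every record.
    static List<String> runTaskLookupPerRecord(Split split) {
        List<String> out = new ArrayList<>();
        for (String record : split.records) {
            out.add(fileNameOf(split) + "\t" + record);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
                new Split("/data/a.txt", List.of("r1", "r2", "r3")),
                new Split("/data/b.txt", List.of("r4", "r5", "r6")));

        lookups = 0;
        for (Split s : splits) {
            runTaskCachingInSetup(s); // one simulated task per split
        }
        System.out.println("lookups with setup() caching: " + lookups);

        lookups = 0;
        for (Split s : splits) {
            runTaskLookupPerRecord(s);
        }
        System.out.println("lookups in map(): " + lookups);
    }
}
```

With two three-record splits, the cached version performs 2 lookups and the per-record version performs 6. The gap grows with the number of records per split.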


  • Greg Roelofs at Sep 23, 2010 at 10:18 pm
    > Can a map task work on more than one input split?
    As far as I can tell from reading the code, no (at least, not yet). Code
    such as createCache() in JobInProgress implicitly assumes a one-to-one mapping
    between maps[] and splits[].

    MR-1220 (small-jobs "combo task" optimization) will change that in some sense,
    but fundamentally, the correspondence between maps and splits is pretty well
    baked in, I believe. (In fact, I'm pretty sure splits are created based on
    some goal for the number of maps, i.e., maps and splits are one-to-one almost
    by definition.)

    I might be wrong about all this, of course. :-)

    Greg
  • Alejandro Abdelnur at Sep 24, 2010 at 3:44 am
    And keep in mind that one split is not necessarily one file. That depends on
    the InputFormat. For example, MultiFileInputFormat clubs together multiple
    files in one split.

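
Alejandro's caveat can also be sketched in plain Java. Again, the types are hypothetical stand-ins rather than Hadoop classes. `Rec` pairs each value with the file it came from, and `runTask` plays the role of one map task over a split that bundles several files. A single name cached once per task would not describe every record here. The sketch therefore re-derives the name only when the source file changes between consecutive records. How a real mapper learns the current file depends on the InputFormat and its record reader; the sketch simply assumes that information is available.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one split that bundles several files; NOT the Hadoop API.
class MultiFileSplitSketch {

    // A record paired with the file it was read from (an assumption of this sketch).
    static final class Rec {
        final String path;
        final String value;

        Rec(String path, String value) {
            this.path = path;
            this.value = value;
        }
    }

    // Counts how many times the file-name lookup runs.
    static int lookups = 0;

    static String fileNameOf(String path) {
        lookups++;
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // One simulated map task over a multi-file split. A single name computed up
    // front would be wrong for later files, so the name is re-derived only when
    // the source file changes between consecutive records.
    static List<String> runTask(List<Rec> split) {
        List<String> out = new ArrayList<>();
        String lastPath = null;
        String name = null;
        for (Rec r : split) { // "map" is called once per record
            if (!r.path.equals(lastPath)) {
                lastPath = r.path;
                name = fileNameOf(lastPath);
            }
            out.add(name + "\t" + r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> split = List.of(
                new Rec("/data/a.txt", "r1"), new Rec("/data/a.txt", "r2"),
                new Rec("/data/b.txt", "r3"), new Rec("/data/b.txt", "r4"));
        lookups = 0;
        for (String line : runTask(split)) {
            System.out.println(line);
        }
        System.out.println("lookups: " + lookups);
    }
}
```

This costs one lookup per file rather than one per record. It stays correct, though less efficient, if records from different files happen to interleave.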

Discussion Overview
group: common-user @
categories: hadoop
posted: Sep 23, '10 at 8:42p
active: Sep 24, '10 at 3:44a
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
