One file per mapper
Hi,

I am new to hadoop. I have a set of files and I want to assign each file to
a mapper. Also in mapper there should be a way to know the complete path of
the file. Can you please tell me how to do that ?

Thanks,
Govind

--
Govind Kothari
Graduate Student
Dept. of Computer Science
University of Maryland College Park

<---Seek Excellence, Success will Follow --->


  • Mark question at Jul 5, 2011 at 9:23 pm
    Hi Govind,

    You should override the isSplitable method of FileInputFormat in a
    subclass, say MyFileInputFormat extends FileInputFormat, as follows:

    @Override
    public boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }

    Then use your MyFileInputFormat class as the job's input format. To get
    the file path, put the following in your mapper class:

    @Override
    public void configure(JobConf job) {
        Path inputPath = new Path(job.get("map.input.file"));
    }

    ~cheers,

    Mark
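
    Putting Mark's two fragments together, a minimal sketch against the old
    org.apache.hadoop.mapred API might look like the following. The class
    names WholeFileTextInputFormat and PathAwareMapper are my illustration,
    not from the thread, and the sketch is untested:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Extending TextInputFormat (a concrete FileInputFormat) and refusing to
// split means each input file becomes exactly one map task.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    public boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }
}

// The mapper learns which file it is reading from "map.input.file".
class PathAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private Path inputPath;

    @Override
    public void configure(JobConf job) {
        inputPath = new Path(job.get("map.input.file"));
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Emit (file path, line) as a placeholder; replace with real logic.
        output.collect(new Text(inputPath.toString()), value);
    }
}
```

    The driver would then set conf.setInputFormat(WholeFileTextInputFormat.class)
    and conf.setMapperClass(PathAwareMapper.class).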
  • Jim Falgout at Jul 5, 2011 at 9:28 pm
    I've done this before by placing the name of each file to process into a single file (newline separated) and using the NLineInputFormat class as the input format. Run your job with the single file with all of the file names to process as the input. Each mapper will then be handed one line (this is tunable) from the single input file. The line will contain the name of the file to process.

    You can also write your own InputFormat class that creates a split for each file.

    Both of these options have scalability issues, which raises the question: why one file per mapper?
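
    A driver for Jim's first approach might look roughly like this sketch
    (old mapred API; the class name, argument layout, and paths are my
    illustration, untested):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class FilePerMapperDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(FilePerMapperDriver.class);
        conf.setJobName("one-file-per-mapper");

        // args[0] is the single input file listing one HDFS path per line.
        conf.setInputFormat(NLineInputFormat.class);
        // One line (i.e. one file name) per mapper; this is the tunable knob.
        conf.setInt("mapred.line.input.format.linespermap", 1);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Each mapper receives one line whose value is a path; the mapper
        // must open that file itself (see below in the thread).
        JobClient.runJob(conf);
    }
}
```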

    -----Original Message-----
    From: Govind Kothari
    Sent: Tuesday, July 05, 2011 3:04 PM
    To: common-user@hadoop.apache.org
    Subject: One file per mapper

  • Edward Capriolo at Jul 6, 2011 at 2:47 pm

    You can also do this with the MultipleInputs and MultipleOutputs classes;
    each source file can have a different mapper.
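
    The wiring for Edward's suggestion might look like this sketch (old
    mapred API; the paths are placeholders, and MapperA/MapperB stand in for
    real Mapper implementations you would write):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class PerFileMapperWiring {
    // Bind a different mapper to each source path; Hadoop picks the right
    // one based on which file the split came from. MapperA and MapperB are
    // hypothetical classes implementing the Mapper interface.
    static void wire(JobConf conf,
                     Class<? extends Mapper> mapperA,
                     Class<? extends Mapper> mapperB) {
        MultipleInputs.addInputPath(conf, new Path("in/a.txt"),
                TextInputFormat.class, mapperA);
        MultipleInputs.addInputPath(conf, new Path("in/b.txt"),
                TextInputFormat.class, mapperB);
    }
}
```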
  • Govind at Aug 10, 2011 at 4:04 pm
    Actually, I tried putting the path of each file into a single file and
    providing that as input. But the problem is that the mapper cannot
    resolve the path and fails with a file-not-found error. I am not sure
    why this is so.
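
    One thing to check (my guess, not stated in the thread): with the
    NLineInputFormat approach the mapper receives the path only as text, so
    it must open the file itself against the right FileSystem. A sketch of
    the relevant lines inside map():

```java
// "value" is one line of the list file, i.e. a path string. Stray
// whitespace, or a scheme-less local path like /home/... that resolves
// against the wrong FileSystem, would produce "file not found" on HDFS.
Path p = new Path(value.toString().trim());
FileSystem fs = p.getFileSystem(job);   // "job" saved in configure()
FSDataInputStream in = fs.open(p);
```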

    --
    View this message in context: http://lucene.472066.n3.nabble.com/One-file-per-mapper-tp3142514p3243038.html
    Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
