Hi all,

I have recently been working on a task where I need to take two input files
of different types, compare them, and produce a result using some logic.
As I understand it, simple MapReduce implementations process a single input
type. The closest implementation I can think of is a MapReduce join, but I
could not understand much from the example provided with Hadoop. Can someone
give a good pointer to such multiple-input data processing (or join) in
MapReduce? It would also be great if you could send some sample code for
the same.

Thanks,

Matthew

  • Shi Yu at Oct 14, 2010 at 4:22 pm
    Hi Matthew,

    I have the same problem here (see
    http://www.listware.net/201009/hadoop-common-user/81228-return-a-parameter-using-map-only.html).
    I was planning to use a join mapper (or a mapper chain) to handle two
    different inputs. The problem is that the mappers apparently cannot
    return values directly to each other, so I have to find the best
    settings for a heap table, MongoDB, memcached, TokyoCabinet, MapFile,
    etc. to let the multiple mappers communicate efficiently.

    Shi
    --
    Postdoctoral Scholar
    Institute for Genomics and Systems Biology
    Department of Medicine, the University of Chicago
    Knapp Center for Biomedical Discovery
    900 E. 57th St. Room 10148
    Chicago, IL 60637, US
    Tel: 773-702-6799
  • Zhou, Yunqing at Jan 1, 2011 at 5:25 pm
    You can use the "map.input.split" parameter (something like that, I can't
    remember exactly) in the Configuration.
    This parameter contains the input file path, so you can use it to branch
    your logic. It can be found in TextInputFormat.java.
  • Harsh J at Jan 1, 2011 at 5:34 pm
    It is map.input.file. [The .start and .length properties also relate to
    the InputSplit for the mapper.]
    For more: http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse

    With a custom RecordReader (RR), you can put in this value yourself
    (FileSplit.getPath()) before control heads to the Mapper/MapRunner.
    --
    Harsh J
    www.harshj.com
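
    The branching approach described in the two replies above can be sketched
    in plain Java. This is only an illustration under stated assumptions, not
    a Hadoop job: no Hadoop classes are used, so the input file path (which a
    real old-API mapper would read from the job configuration under
    map.input.file) is passed in as an ordinary parameter. The class name, the
    "customers"/"orders" file-name prefixes, and the CSV layouts are all
    hypothetical.

```java
// Plain-Java sketch: the path a real mapper would obtain from the job
// configuration is passed in directly so the example runs without Hadoop.
class SourceTaggingMapper {

    // Hypothetical file-name prefixes for the two input types.
    static final String CUSTOMERS_PREFIX = "customers";
    static final String ORDERS_PREFIX = "orders";

    // Returns "joinKey<TAB>tag:value", choosing the parse logic from the file name.
    static String map(String inputFilePath, String line) {
        // lastIndexOf returns -1 when there is no '/', so this also handles bare names.
        String fileName = inputFilePath.substring(inputFilePath.lastIndexOf('/') + 1);
        String[] fields = line.split(",");
        if (fileName.startsWith(CUSTOMERS_PREFIX)) {
            // Assumed layout: customerId,name
            return fields[0] + "\t" + "C:" + fields[1];
        } else if (fileName.startsWith(ORDERS_PREFIX)) {
            // Assumed layout: orderId,customerId,amount
            return fields[1] + "\t" + "O:" + fields[2];
        }
        throw new IllegalArgumentException("Unexpected input file: " + inputFilePath);
    }

    public static void main(String[] args) {
        System.out.println(map("/data/customers-2010.csv", "42,Alice"));
        System.out.println(map("/data/orders-2010.csv", "7,42,19.99"));
    }
}
```

    Both record types come out keyed by the same customer id, with a short tag
    in the value saying which file the record came from, so a later reduce
    step can tell the two sources apart.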
  • Michael Toback at Jan 1, 2011 at 7:55 pm
    I am faced with a similar problem.

    I want to process an entire set of bugs, including their entire history,
    once. Then I want to incrementally process a combination of the latest
    output plus the changes since it was last processed.

    I hit upon a way of handling multiple outputs. Perhaps if there was
    something in the data format that told you where the data came from, the
    same mapper could process both and the reducer could merge them?

    Michael Toback
    SMTS, VMWare
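
    The idea in the post above (records that carry their own source marker,
    one mapper for both inputs, and a reducer that merges them) might look
    roughly like the following plain-Java sketch. It simulates the map,
    shuffle, and reduce steps in memory and uses no Hadoop classes. The
    "S|key|payload" snapshot and "D|key|payload" delta record format, the
    class name, and the rule that a delta overrides the snapshot are all
    assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaggedMergeSketch {

    // "Map" step: the record itself says where it came from
    // (hypothetical format: tag|key|payload, tag is "S" or "D").
    static String[] map(String record) {
        String[] parts = record.split("\\|", 3);
        return new String[] { parts[1], parts[0] + ":" + parts[2] };
    }

    // "Reduce" step for one key: a delta value, if present, overrides the
    // snapshot value. With several deltas, the last one iterated wins; a real
    // job would need an explicit ordering, since value order is not guaranteed.
    static String reduce(List<String> taggedValues) {
        String snapshot = null;
        String delta = null;
        for (String v : taggedValues) {
            if (v.startsWith("S:")) {
                snapshot = v.substring(2);
            } else if (v.startsWith("D:")) {
                delta = v.substring(2);
            }
        }
        return delta != null ? delta : snapshot;
    }

    // Stand-in for the shuffle: group mapped values by key, then reduce each group.
    static Map<String, String> run(List<String> records) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String r : records) {
            String[] kv = map(r);
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "S|BUG-1|open", "S|BUG-2|open", "D|BUG-2|fixed", "D|BUG-3|open");
        System.out.println(run(records)); // {BUG-1=open, BUG-2=fixed, BUG-3=open}
    }
}
```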

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 14, '10 at 2:03p
active: Jan 1, '11 at 7:55p
posts: 5
users: 5
website: hadoop.apache.org...
irc: #hadoop