Hi,

This is my first email to the list, and I'm pretty new to Hadoop overall, so I'm hoping to find some help with a new task I have to do for work.
I need to do a join between two sets of files. One is a bunch of CSV files and the other is a set of sequence files.

I was told MultiFilterRecordReader could help me do the join, but I haven't been able to find a good example of where and how to use that class.
I have found a good example using CompositeInputFormat here: http://www.congiu.com/node/5
But it assumes that the input is sorted, and I can't guarantee that it will be, at least for the CSV files.

Does anyone know what I need to do with MultiFilterRecordReader? Inherit from it in the mapper? I'm a little confused... Please let me know if you have any pointers on that one.

Thanks.
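
To illustrate why the sortedness assumption mentioned above matters, here is a minimal plain-Java sketch of a merge join, the single-pass technique that joins over pre-sorted inputs generally rely on. It uses no Hadoop classes; `MergeJoinSketch`, `Row`, and `mergeJoin` are hypothetical names invented for this illustration, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; no Hadoop API is used here.
final class MergeJoinSketch {

    // Hypothetical stand-in for one keyed record from either input.
    static final class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    // Inner join in one forward pass. Correct ONLY if both lists are sorted by key.
    static List<String> mergeJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++;                      // left key is smaller, so skip it
            } else if (cmp > 0) {
                j++;                      // right key is smaller, so skip it
            } else {
                String key = left.get(i).key;
                int runStart = j;         // start of the run of equal keys on the right
                while (i < left.size() && left.get(i).key.equals(key)) {
                    for (j = runStart; j < right.size() && right.get(j).key.equals(key); j++) {
                        out.add(key + "," + left.get(i).value + "," + right.get(j).value);
                    }
                    i++;
                }
                // j now points just past the run of equal keys on the right
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> right = Arrays.asList(new Row("a", "x"), new Row("c", "z"));
        List<Row> unsorted = Arrays.asList(new Row("c", "3"), new Row("a", "1"), new Row("b", "2"));

        // Unsorted left input: the match for key "a" is silently missed.
        System.out.println(mergeJoin(unsorted, right));

        // Sorting a copy first restores the full result.
        List<Row> sorted = new ArrayList<>(unsorted);
        sorted.sort(Comparator.comparing((Row r) -> r.key));
        System.out.println(mergeJoin(sorted, right));
    }
}
```

With the unsorted left input the pass skips ahead to key "c" on the right and never revisits "a", so that match is lost without any error. That silent loss is the reason unsorted CSV input cannot be fed to a join of this kind as-is.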


  • Lance Norskog at Aug 19, 2010 at 5:19 am
    Hadoop has a toolkit called 'map-side joins' which requires sorted
    input tables. org.apache.hadoop.examples.Join.java shows how. Good
    luck decoding it!

    Could you use chained mapper tasks to sort each input set before using
    the join framework?

    --
    Lance Norskog
  • PeterAtReunion at Aug 19, 2010 at 6:42 pm
    Lance -

    Fun to see you on a mailing list.
    How are things?

    ;;peter

  • Y l at Aug 19, 2010 at 5:36 pm
    Yeah, I'm taking the chained-mappers path at this point. If nothing else, it will give me sorted output in the end. It's just too bad to have to run 3 jobs when I'm sure Hadoop provides an elegant way to do it in one.

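
On the wish for a single job: one common pattern that does not need pre-sorted input is a reduce-side (repartition) join. Each record is tagged with the input it came from, the records are grouped by join key, and the two sides are paired within each group. The sketch below simulates only that grouping-and-pairing logic in plain Java. It is not Hadoop code, and `TaggedJoinSketch`, `Group`, and `join` are hypothetical names. In a real job the shuffle would do the grouping and a reducer would do the pairing, and wiring two different input formats into one job is an assumption this sketch does not cover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the grouping idea; no Hadoop API is used here.
final class TaggedJoinSketch {

    // Values seen for one join key, kept separately per source (the "tag").
    static final class Group {
        final List<String> csv = new ArrayList<>();
        final List<String> seq = new ArrayList<>();
    }

    // Each row is {key, value}. Input order does not matter.
    static List<String> join(List<String[]> csvRows, List<String[]> seqRows) {
        // Stand-in for the shuffle: collect records by key, in key order.
        Map<String, Group> groups = new TreeMap<>();
        for (String[] row : csvRows) {
            groups.computeIfAbsent(row[0], k -> new Group()).csv.add(row[1]);
        }
        for (String[] row : seqRows) {
            groups.computeIfAbsent(row[0], k -> new Group()).seq.add(row[1]);
        }

        // Stand-in for the reducer: pair up both sides within each key.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Group> entry : groups.entrySet()) {
            for (String c : entry.getValue().csv) {
                for (String s : entry.getValue().seq) {
                    out.add(entry.getKey() + "," + c + "," + s);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> csv = Arrays.asList(
            new String[] {"c", "3"}, new String[] {"a", "1"}, new String[] {"b", "2"});
        List<String[]> seq = Arrays.asList(
            new String[] {"c", "z"}, new String[] {"a", "x"});
        System.out.println(join(csv, seq)); // keys a and c match; b has no partner
    }
}
```

The trade-off is that every record from both inputs goes through the grouping step, whereas a map-side join over already sorted inputs avoids that step entirely.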

Discussion Overview
group: common-user @
categories: hadoop
posted: Aug 18, '10 at 5:14p
active: Aug 19, '10 at 6:42p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop