Hi all,

I am attempting to implement MultipleOutputFormat to write data to multiple
files dependent on the output keys and values. Can somebody provide a
working example with how to implement this in Hadoop 0.20.2?

Thanks!

--
Roger Chen
UC Davis Genome Center


  • Ayon Sinha at Jul 26, 2011 at 4:24 pm
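    package com.shopkick.util;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Old-API (org.apache.hadoop.mapred) output format that writes each record
    // into a subdirectory named after its key.
    public class MultiFileOutput extends MultipleTextOutputFormat<Text, Text> {

        @Override
        protected String generateFileNameForKeyValue(Text key, Text value,
                String name) {
            // "name" is the default leaf file name (for example part-00000);
            // prefixing it with the key yields <key>/<name> under the output dir.
            return key.toString() + "/" + name;
        }

    }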
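
To make the effect of the override above easier to see, here is a minimal, Hadoop-free sketch of the same routing rule. It is only an illustration: `KeyRoutingSketch`, `route`, the sample records, and the leaf name `part-00000` are hypothetical and are not part of Ayon's post or of Hadoop. In a real old-API job the output format class would instead be registered on the job, for example with `JobConf.setOutputFormat(MultiFileOutput.class)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyRoutingSketch {

    // Same rule as the override in the post: the key becomes a subdirectory
    // and the default leaf name is kept as the file name.
    static String generateFileNameForKeyValue(String key, String value, String name) {
        return key + "/" + name;
    }

    // Groups records by the relative path they would be routed to.
    // This only simulates the routing; nothing is written to disk.
    static Map<String, List<String>> route(String[][] records, String leafName) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (String[] kv : records) {
            String path = generateFileNameForKeyValue(kv[0], kv[1], leafName);
            files.computeIfAbsent(path, p -> new ArrayList<>()).add(kv[0] + "\t" + kv[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        // Hypothetical sample records: {key, value}.
        String[][] records = { {"apple", "1"}, {"banana", "2"}, {"apple", "3"} };
        route(records, "part-00000")
                .forEach((path, lines) -> System.out.println(path + " -> " + lines));
    }
}
```

With these sample records, both "apple" records end up under apple/part-00000 and the "banana" record under banana/part-00000, which is the one-directory-per-key layout the override produces.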



    -Ayon



  • Harsh J at Jul 26, 2011 at 7:00 pm
    Roger,

    Beyond Ayon's example, please note that the newer API will *not* carry
    a supported MultipleOutputFormat: it has been made obsolete in favor of
    MultipleOutputs, which is much easier to use, is thread-safe, and comes
    with an example in its Javadoc [1].

    [1] - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html


    --
    Harsh J
  • Roger Chen at Jul 26, 2011 at 8:31 pm
    The problem I'm facing right now is with the configuration needed for
    MultipleOutputs: JobConf is deprecated now, and I am unable to do the
    equivalent with Configuration. I set up the job's configuration with:

    Job job = new Job(getConf());

    but when I try to use this line in my config:

    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
    LongWritable.class, Text.class);

    I get an error saying that no suitable method was found.

    Roger


    --
    Roger Chen
    UC Davis Genome Center
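
A likely cause of the error above: in Hadoop 0.20.2 the old-API `org.apache.hadoop.mapred.lib.MultipleOutputs.addNamedOutput` takes a `JobConf` as its first argument, so passing a plain `Configuration` gives the compiler no matching overload. `JobConf` extends `Configuration` and can be constructed from one. The sketch below reproduces that type relationship with small stand-in classes so it runs without Hadoop. Everything in it is hypothetical: the nested `Configuration` and `JobConf` are NOT the real Hadoop classes, and the two-argument `addNamedOutput` is a shortened stand-in for the real five-argument method.

```java
import java.util.ArrayList;
import java.util.List;

public class NamedOutputSketch {

    // Stand-in for org.apache.hadoop.conf.Configuration (not the real class).
    static class Configuration { }

    // Stand-in for org.apache.hadoop.mapred.JobConf. As in Hadoop, it extends
    // Configuration and can be built from an existing Configuration.
    static class JobConf extends Configuration {
        final List<String> namedOutputs = new ArrayList<>();

        JobConf(Configuration base) {
            // The real JobConf copies the settings of "base"; omitted here.
        }
    }

    // Stand-in for the old-API addNamedOutput: the first parameter is a
    // JobConf, so a plain Configuration is not accepted.
    static void addNamedOutput(JobConf conf, String namedOutput) {
        conf.namedOutputs.add(namedOutput);
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // addNamedOutput(conf, "text");  // would not compile: no suitable method

        // Wrapping the Configuration in a JobConf satisfies the signature.
        JobConf jobConf = new JobConf(conf);
        addNamedOutput(jobConf, "text");
        System.out.println(jobConf.namedOutputs);
    }
}
```

If this is indeed the cause, the old-API call also expects an old-API output format (`org.apache.hadoop.mapred.TextOutputFormat`), so the rest of the job would need to stay on the `org.apache.hadoop.mapred` classes as well.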
  • Harsh J at Jul 26, 2011 at 8:41 pm
    Gotcha, my bad then. The Hadoop distribution I use provides a
    backported MultipleOutputs, so I overlooked this particular issue
    while replying.

    Still, the warning holds as versions roll ahead. But I believe the
    refactor would not be much of a pain, so perhaps it's nothing to
    worry about.


    --
    Harsh J
  • Luca Pireddu at Jul 27, 2011 at 6:31 am

    Hello,

    I have a working sample here:

    http://biodoop-seal.bzr.sourceforge.net/bzr/biodoop-seal/trunk/annotate/head%3A/src/it/crs4/seal/demux/DemuxOutputFormat.java

    It extends FileOutputFormat.

    --
    Luca Pireddu
    CRS4 - Distributed Computing Group
    Loc. Pixina Manna Edificio 1
    Pula 09010 (CA), Italy
    Tel: +39 0709250452
  • Alejandro Abdelnur at Jul 27, 2011 at 1:52 pm
    Roger,

    Or you can take a look at Hadoop's MultipleOutputs class.

    Thanks.

    Alejandro

Discussion Overview
group: common-user @
categories: hadoop
posted: Jul 26, '11 at 4:12p
active: Jul 27, '11 at 1:52p
posts: 7
users: 5
website: hadoop.apache.org...
irc: #hadoop
