Hi,
I want to give the output files of my map-red job (sequence files) a
specific name instead of the default part* format.

Has anyone overridden the default filename and set the output file
name per map-red job?

Thanks,
-JJ


  • Christoph Schmitz at Jun 20, 2011 at 6:17 am
    Hi JJ,

    You can do that by subclassing TextOutputFormat (or whichever output format you're using) and overriding the getDefaultWorkFile method:

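    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class MyOutputFormat<K, V> extends TextOutputFormat<K, V> {
        // ...
        @Override
        public Path getDefaultWorkFile(TaskAttemptContext context,
                String extension) throws IOException {
            FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
            // myOwnMethodToComputeTheFileName is a placeholder for your own naming logic.
            return new Path(committer.getWorkPath(), myOwnMethodToComputeTheFileName(context));
        }
    }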

    Regards,

    Christoph

    -----Original Message-----
    From: Mapred Learn
    Sent: Monday, June 20, 2011 06:59
    To: mapreduce-user@hadoop.apache.org; cdh-user@cloudera.org
    Subject: how to change default name of a sequence file

    Hi,
    I want to give the output files of my map-red job (sequence files) a specific name instead of the default part* format.

    Has anyone overridden the default filename and set the output file name per map-red job?

    Thanks,
    -JJ
  • Mapred Learn at Jun 20, 2011 at 6:20 am
    Thanks!
    I will try this!
  • Mapred Learn at Jun 20, 2011 at 6:50 am
    Another question about getDefaultWorkFile(): how can I find out the
    mapper number that is used in the output name? For example, if you have 30
    mappers, how can I name the output file (<OutputFile>) of the 30th mapper
    <OutputFile>_30?
  • Christoph Schmitz at Jun 20, 2011 at 8:01 am
    int partition = context.getTaskAttemptID().getTaskID().getId();

    That will get you the number that is usually appended to the part-r-<number> name.
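
    As a rough sketch of how that number could be turned into the <OutputFile>_<number> name asked about, here is a small helper with no Hadoop dependencies. The class name OutputFileNames and the method forTask are made up for illustration and are not part of Hadoop; the taskId argument is assumed to be the value returned by context.getTaskAttemptID().getTaskID().getId().

```java
/** Hypothetical helper (not a Hadoop class) for building custom output file names. */
final class OutputFileNames {
    private OutputFileNames() {
        // Static utility class; not meant to be instantiated.
    }

    /**
     * Builds "<baseName>_<taskId><extension>".
     *
     * @param baseName  the desired file name prefix, e.g. "OutputFile"
     * @param taskId    assumed to come from context.getTaskAttemptID().getTaskID().getId()
     * @param extension the extension passed to getDefaultWorkFile; may be null or empty
     */
    static String forTask(String baseName, int taskId, String extension) {
        if (baseName == null || baseName.isEmpty()) {
            throw new IllegalArgumentException("baseName must not be empty");
        }
        if (taskId < 0) {
            throw new IllegalArgumentException("taskId must not be negative");
        }
        String suffix = (extension == null) ? "" : extension;
        return baseName + "_" + taskId + suffix;
    }
}
```

    Inside getDefaultWorkFile, the result of OutputFileNames.forTask("OutputFile", partition, extension) could be passed as the second argument to new Path(...). Note that task IDs start at 0 (the first default file is part-r-00000), so the 30th task has ID 29; add 1 to the ID if you want 1-based names.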

    Regards,
    Christoph


Discussion Overview
group: mapreduce-user @
categories: hadoop
posted: Jun 20, '11 at 4:59a
active: Jun 20, '11 at 8:01a
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop
