Hi all,

I am trying to build a Hadoop/MR application in C++ using hadoop-pipes. I
have been able to work successfully with my own mappers and reducers, but
now I need to generate output (from the reducer) in a format other than the
default TextOutputFormat. I have a few questions:

(1) Similar to Hadoop streaming, is there an option to set the OutputFormat in
HadoopPipes (in order to use, say, org.apache.hadoop.io.SequenceFile.Writer)?
I am using Hadoop version 0.20.2.

(2) As a simple test of how to use a built-in non-default writer, I tried
the following:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D
hadoop.pipes.java.recordwriter=false -input input.seq -output output
-inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer
org.apache.hadoop.io.SequenceFile.Writer -program my_test_program

However, this fails with a ClassNotFound exception. If I remove the
-writer flag and use the default writer, it works just fine.

(3) Is there an example or discussion of how to write your own
RecordWriter and run it with Hadoop-pipes?

Thanks.

Best,
Vivek
--


  • Vivek K at Sep 20, 2011 at 9:57 pm
    It would be very helpful if someone could point me to where I might find a
    solution to this problem.

    Thanks.
    Vivek
    --
  • Brock Noland at Sep 20, 2011 at 10:25 pm
    Hi,
    On Tue, Sep 13, 2011 at 12:27 PM, Vivek K wrote:
    > hadoop pipes -D hadoop.pipes.java.recordreader=true -D
    > hadoop.pipes.java.recordwriter=false -input input.seq -output output
    > -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat -writer
    > org.apache.hadoop.io.SequenceFile.Writer -program my_test_program

    The -writer option expects an OutputFormat class, not a writer class:

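    if (results.hasOption("writer")) {
      // -writer switches on the Java record writer ...
      setIsJavaRecordWriter(job, true);
      // ... and the named class is loaded as the job's OutputFormat
      job.setOutputFormat(getClass(results, "writer", job,
                                   OutputFormat.class));
    }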



    As such I think you want:

    -writer org.apache.hadoop.mapred.SequenceFileOutputFormat

    SequenceFile.Writer simply writes sequence files; it has nothing to do
    with MapReduce.

    This is also wrong, since the output is to be written by a Java
    OutputFormat; it should be true:

    hadoop.pipes.java.recordwriter=false
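
    Putting the two corrections together, the invocation would presumably
    look like the sketch below. It reuses the paths and program name from
    the original command (input.seq, output, my_test_program). The
    pipes_cmd function and the DRY_RUN variable are hypothetical helpers
    added only so the command can be printed and inspected on a machine
    without Hadoop; they are not part of Hadoop.

    ```shell
    #!/bin/sh
    # Sketch of the corrected command. Only two things differ from the original:
    #   1. hadoop.pipes.java.recordwriter is true instead of false
    #   2. -writer names an OutputFormat (SequenceFileOutputFormat)
    #      instead of org.apache.hadoop.io.SequenceFile.Writer
    pipes_cmd() {
      set -- hadoop pipes \
        -D hadoop.pipes.java.recordreader=true \
        -D hadoop.pipes.java.recordwriter=true \
        -input input.seq -output output \
        -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat \
        -writer org.apache.hadoop.mapred.SequenceFileOutputFormat \
        -program my_test_program

      # DRY_RUN (hypothetical, defaults to 1) prints the command instead of
      # running it; set DRY_RUN=0 to actually submit the job.
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
      else
        "$@"
      fi
    }

    pipes_cmd
    ```

    Run as is, the script only prints the command line; with DRY_RUN=0 it
    submits the job.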

    Brock
  • Vivek K at Sep 20, 2011 at 11:05 pm
    Hi Brock

    Thanks for the prompt and to-the-point response.
    It is working as you said.

    Best,
    Vivek
    --

Discussion Overview
group: common-user
category: hadoop
posted: Sep 13, '11 at 4:28p
active: Sep 20, '11 at 11:05p
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop
