I am trying to build a Hadoop MapReduce application in C++ using Hadoop Pipes. I
have been able to work successfully with my own mappers and reducers, but
now I need the reducer to write its output in a format other than the
default TextOutputFormat. I have a few questions:
(1) As with Hadoop Streaming, is there an option to set the OutputFormat in
Hadoop Pipes (in order to use, say, org.apache.hadoop.io.SequenceFile.Writer)?
I am using Hadoop version 0.20.2.
(2) As a simple test of how to use a built-in, non-default writer, I tried:
    hadoop pipes -D hadoop.pipes.java.recordreader=true \
        -D hadoop.pipes.java.recordwriter=false \
        -input input.seq -output output \
        -inputformat org.apache.hadoop.mapred.SequenceFileInputFormat \
        -writer org.apache.hadoop.io.SequenceFile.Writer \
        -program my_test_program
However, this fails with a ClassNotFoundException. If I remove the
-writer flag and use the default writer, it works fine.
(3) Is there an example or a discussion of how to write your own
RecordWriter and run it with Hadoop Pipes?
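To make question (3) concrete, here is a rough sketch of the kind of C++ writer I have in mind. So that it compiles on its own, it declares a stand-in for the HadoopPipes::RecordWriter interface (an emit(key, value) method plus close()) instead of including hadoop/Pipes.hh. The class name LengthPrefixedWriter and its binary record layout are hypothetical, chosen only to illustrate a non-text format; this is not the SequenceFile format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

namespace sketch {

// Stand-in for HadoopPipes::RecordWriter (hadoop/Pipes.hh), declared here
// only so this sketch compiles without the Hadoop headers.
class RecordWriter {
 public:
  virtual ~RecordWriter() {}
  virtual void emit(const std::string& key, const std::string& value) = 0;
  virtual void close() {}
};

// Hypothetical non-text output format (NOT SequenceFile): each record is
// [4-byte big-endian key length][key][4-byte big-endian value length][value].
class LengthPrefixedWriter : public RecordWriter {
 public:
  // Assumption: a real Pipes writer would take a HadoopPipes::ReduceContext&
  // and derive its output path from the job configuration; here the path is
  // passed in directly.
  explicit LengthPrefixedWriter(const std::string& path)
      : file_(std::fopen(path.c_str(), "wb")) {
    if (file_ == nullptr) throw std::runtime_error("cannot open " + path);
  }
  ~LengthPrefixedWriter() override { close(); }

  // The writer owns a FILE*, so copying is disabled to avoid a double fclose.
  LengthPrefixedWriter(const LengthPrefixedWriter&) = delete;
  LengthPrefixedWriter& operator=(const LengthPrefixedWriter&) = delete;

  // Called once per key/value pair the reducer emits.
  void emit(const std::string& key, const std::string& value) override {
    if (file_ == nullptr) throw std::logic_error("emit() called after close()");
    writeField(key);
    writeField(value);
  }

  // Flushes and closes the file; safe to call more than once.
  void close() override {
    if (file_ != nullptr) {
      std::fclose(file_);
      file_ = nullptr;
    }
  }

 private:
  // Writes the field length as 4 big-endian bytes, then the raw bytes.
  // Write errors are not checked in this sketch.
  void writeField(const std::string& s) {
    assert(s.size() <= 0xFFFFFFFFu && "field too long for a 32-bit length");
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    const unsigned char len[4] = {
        static_cast<unsigned char>(n >> 24), static_cast<unsigned char>(n >> 16),
        static_cast<unsigned char>(n >> 8), static_cast<unsigned char>(n)};
    std::fwrite(len, 1, sizeof len, file_);
    std::fwrite(s.data(), 1, s.size(), file_);
  }

  std::FILE* file_;
};

}  // namespace sketch
```

In a real Pipes job I would expect such a class to derive from HadoopPipes::RecordWriter instead of the stand-in, to be returned from the factory's createRecordWriter(), and to be run with -D hadoop.pipes.java.recordwriter=false. I have not confirmed the exact wiring against the 0.20.2 headers.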