I've defined a custom key class that implements Writable. I've noticed
that between the mapper and the reducer its write() and readFields()
methods are actually used. However, when I use an identity reducer,
toString() is called when I do something like output.collect(myClass, null).

Is there a way to output the write() instead?

Thank you.

  • Owen O'Malley at Dec 16, 2008 at 4:44 pm

    On Dec 16, 2008, at 8:28 AM, David Coe wrote:

    Is there a way to output the write() instead?

    Use SequenceFileOutputFormat. It writes binary files using write().
    The reverse is SequenceFileInputFormat, which reads the sequence files
    using readFields().

    -- Owen
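As a rough, JDK-only illustration of Owen's point (not Hadoop's actual classes), the sketch below contrasts a text-style writer, which renders a record with toString(), with a binary writer that serializes it with write() and reads it back with readFields(). PointKey and OutputModel are hypothetical names; a real key would implement org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical key with the same write/readFields shape as a Hadoop Writable.
class PointKey {
  int x;
  int y;

  PointKey() {}

  PointKey(int x, int y) {
    this.x = x;
    this.y = y;
  }

  void write(DataOutput out) throws IOException {
    out.writeInt(x);
    out.writeInt(y);
  }

  void readFields(DataInput in) throws IOException {
    x = in.readInt();
    y = in.readInt();
  }

  @Override
  public String toString() {
    return x + "," + y;
  }
}

class OutputModel {
  // Text-style output: the key is rendered with toString(), one record per line.
  static String asText(Object key) {
    return key.toString() + "\n";
  }

  // Binary-style output: the key is serialized with write().
  static byte[] asBinary(PointKey key) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      key.write(new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reading the binary form back goes through readFields().
  static PointKey fromBinary(byte[] data) {
    try {
      PointKey key = new PointKey();
      key.readFields(new DataInputStream(new ByteArrayInputStream(data)));
      return key;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The text form of PointKey(3, 4) is the line "3,4", while the binary form is eight bytes (two ints) that only readFields() can turn back into a key.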
  • David Coe at Dec 16, 2008 at 4:59 pm

    Thank you for your swift response. I am getting this error when I try
    your suggestion:

    java.lang.NullPointerException
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:70)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.collect(MapTask.java:385)


    *My configuration:*
    conf.setMapperClass(MyMap.class);
    conf.setReducerClass(IdentityReducer.class);

    conf.setOutputFormat(SequenceFileOutputFormat.class);
    conf.setOutputKeyClass(MyClass.class);

    *My mapper:*
    public static class MyMap extends MapReduceBase
        implements Mapper<LongWritable, Text, MyClass, NullWritable> {

      public void map(LongWritable key, Text value,
          OutputCollector<MyClass, NullWritable> output,
          Reporter reporter) throws IOException {
        // ...
      }
    }

    *My class:*
    public class MyClass implements Writable, Comparable<MyClass> {
      // ...
    }

    Which setting am I missing that results in the null pointer?

    Thank you!!
  • Owen O'Malley at Dec 16, 2008 at 5:23 pm

    On Dec 16, 2008, at 8:58 AM, David Coe wrote:
    Thank you for your swift response. I am getting this error when I try
    your suggestion:

    java.lang.NullPointerException
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:987)

    It means you are trying to write a null value. Your reduce is doing
    something like:

    output.collect(key, null);

    TextOutputFormat accepts that and simply skips the null value.
    SequenceFileOutputFormat doesn't accept nulls.

    -- Owen
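A simplified model of the difference Owen describes, written against the JDK only: the text-style writer drops a null value and writes the key alone, while the binary-style writer serializes the value unconditionally and therefore fails on null. BinaryValue and NullHandlingModel are hypothetical stand-ins, not Hadoop classes, and the exact point inside SequenceFile.Writer where the real NullPointerException is thrown may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the Writable contract (not org.apache.hadoop.io.Writable).
interface BinaryValue {
  void write(DataOutput out) throws IOException;
}

class NullHandlingModel {
  // Simplified text writer: a null value is skipped, so only the key and a newline remain.
  static String textRecord(Object key, Object value) {
    StringBuilder line = new StringBuilder(String.valueOf(key));
    if (value != null) {
      line.append('\t').append(value);
    }
    return line.append('\n').toString();
  }

  // Simplified binary writer: the value is always serialized,
  // so passing null fails with a NullPointerException.
  static byte[] binaryRecord(BinaryValue key, BinaryValue value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      key.write(out);
      value.write(out); // throws NullPointerException when value == null
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```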
  • David Coe at Dec 16, 2008 at 5:31 pm

    Since the SequenceFileOutputFormat doesn't like nulls, how would I use
    NullWritable? Obviously output.collect(key, null) isn't working. If I
    change it to output.collect(key, new IntWritable()) I get the result I
    want (plus an int that I don't), but output.collect(key, new
    NullWritable()) does not work.

    Thanks again.

    David
  • Aaron Kimball at Dec 18, 2008 at 2:45 am
    NullWritable has a static get() method that returns the singleton
    NullWritable instance.
    - Aaron
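Aaron's answer explains David's compile error: NullWritable is a singleton, so the instance comes from NullWritable.get() rather than from a constructor, and in a mapper the call becomes output.collect(key, NullWritable.get()). The JDK-only sketch below uses a hypothetical stand-in, NullValue (not Hadoop's class), that mirrors that shape: a private constructor, a static get(), and no-op serialization. It also shows why such a value avoids the "extra int" of the IntWritable workaround: it writes zero bytes, while an int always costs four.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in mirroring the shape of Hadoop's NullWritable.
final class NullValue {
  private static final NullValue INSTANCE = new NullValue();

  // Private, so `new NullValue()` does not compile outside this class.
  private NullValue() {}

  static NullValue get() {
    return INSTANCE;
  }

  void write(DataOutput out) {
    // nothing to write: the value carries no data
  }

  void readFields(DataInput in) {
    // nothing to read
  }
}

class NullValueDemo {
  // Number of bytes the value contributes when serialized.
  static int serializedSize(NullValue value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    value.write(new DataOutputStream(bytes));
    return bytes.size();
  }

  // For comparison: an int value always costs four bytes.
  static int serializedIntSize(int v) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new DataOutputStream(bytes).writeInt(v);
      return bytes.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```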
    On Tue, Dec 16, 2008 at 9:30 AM, David Coe wrote:

    Since the SequenceFileOutputFormat doesn't like nulls, how would I use
    NullWritable? Obviously output.collect(key, null) isn't working. If I
    change it to output.collect(key, new IntWritable()) I get the result I
    want (plus an int that I don't), but output.collect(key, new
    NullWritable()) does not work.

  • Owen O'Malley at Dec 18, 2008 at 7:07 am

    On Dec 16, 2008, at 9:30 AM, David Coe wrote:

    Since the SequenceFileOutputFormat doesn't like nulls, how would I use
    NullWritable? Obviously output.collect(key, null) isn't working. If I
    change it to output.collect(key, new IntWritable()) I get the result I
    want (plus an int that I don't), but output.collect(key, new
    NullWritable()) does not work.

    Sorry, I answered you literally. You can write a SequenceFile with
    NullWritables as the values, but what you really want are optional
    nulls. I'd probably define a wrapper class, similar to GenericWritable.
    It would look something like:

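    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    // To use it as a job's key/value class, subclass it with a no-arg
    // constructor (Hadoop creates instances reflectively).
    class NullableWritable<T extends Writable> implements Writable {
      private final T instance; // reused when deserializing, so never null
      private boolean isNull;

      NullableWritable(T instance) {
        this.instance = instance;
      }
      public void setNull(boolean isNull) {
        this.isNull = isNull;
      }
      public boolean isNull() { return isNull; }
      public T get() { return instance; }

      public void readFields(DataInput in) throws IOException {
        isNull = in.readBoolean();   // the null flag comes first
        if (!isNull) {
          instance.readFields(in);   // then the wrapped value, if present
        }
      }
      public void write(DataOutput out) throws IOException {
        out.writeBoolean(isNull);
        if (!isNull) {
          instance.write(out);
        }
      }
    }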

    -- Owen
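To make the cost of Owen's null flag concrete, here is a JDK-only version of the same idea, specialised to an int payload so it runs without Hadoop on the classpath (NullableInt is a hypothetical name, not part of any library). A null record costs one byte for the flag; a present value costs the flag plus its payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical JDK-only variant of the null-flag wrapper with an int payload.
class NullableInt {
  private int value;
  private boolean isNull = true;

  void set(int value) {
    this.value = value;
    this.isNull = false;
  }

  void setNull() {
    this.isNull = true;
  }

  boolean isNull() {
    return isNull;
  }

  int get() {
    return value;
  }

  byte[] toBytes() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeBoolean(isNull); // one-byte flag first
      if (!isNull) {
        out.writeInt(value);    // payload only when present
      }
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static NullableInt fromBytes(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      NullableInt result = new NullableInt();
      if (!in.readBoolean()) {  // flag false means a value follows
        result.set(in.readInt());
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```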
  • David Coe at Dec 16, 2008 at 5:15 pm

    Does the SequenceFileOutputFormat work with NullWritable as the value?
  • Owen O'Malley at Dec 16, 2008 at 5:24 pm

    On Dec 16, 2008, at 9:14 AM, David Coe wrote:

    Does the SequenceFileOutputFormat work with NullWritable as the value?
    Yes.
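Putting the thread's answers together, a job that writes binary output with no meaningful value might be configured roughly as follows, using the old org.apache.hadoop.mapred API seen in the thread. This is a sketch, not David's actual job: Text stands in for his MyClass, and LineAsKeyMapper and SequenceOutputSetup are hypothetical names. The value class is set explicitly because, as far as I know, SequenceFile.Writer checks each appended value against the value class it was created with, and JobConf's default output value class is Text.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

class SequenceOutputSetup {
  // Hypothetical mapper: emits each input line as the key, with the
  // NullWritable singleton (never a raw null) as the value.
  static class LineAsKeyMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, NullWritable> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, NullWritable> output,
                    Reporter reporter) throws IOException {
      output.collect(line, NullWritable.get());
    }
  }

  static JobConf configure(JobConf conf) {
    conf.setMapperClass(LineAsKeyMapper.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class); // binary output via write()
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(NullWritable.class); // must match what the mapper emits
    return conf;
  }
}
```

Reading the result back with SequenceFileInputFormat then goes through readFields(), as Owen noted in his first reply.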

Discussion Overview
group: common-user @
categories: hadoop
posted: Dec 16, '08 at 4:29p
active: Dec 18, '08 at 7:07a
posts: 9
users: 3
website: hadoop.apache.org...
irc: #hadoop
