Hello all

Is there a best practice for using my own classes as keys and values?

My first attempt at doing this was successful - I built a
BigIntegerWritable class using IntWritable as a template. It was easy
because BigInteger has methods converting to and from byte arrays,
which I could then write into the DataOutput or read from the
DataInput.
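
The BigIntegerWritable class itself is not shown in this thread. A minimal sketch of what such a class might look like follows, assuming a length-prefixed byte array layout. The local Writable interface is only a stand-in for org.apache.hadoop.io.Writable so that the example compiles without Hadoop on the classpath, and the roundTrip helper is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;

// Stand-in for org.apache.hadoop.io.Writable (same two method signatures),
// declared locally so the sketch is self-contained.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical reconstruction; the original class was not posted.
class BigIntegerWritable implements Writable {
    private BigInteger value = BigInteger.ZERO;

    public BigIntegerWritable() {}

    public BigIntegerWritable(BigInteger value) { this.value = value; }

    public BigInteger get() { return value; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Two's-complement bytes, prefixed with their length.
        byte[] bytes = value.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        value = new BigInteger(bytes);
    }

    // Demonstration helper: serialize to memory and read back into a fresh instance.
    static BigInteger roundTrip(BigInteger input) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new BigIntegerWritable(input).write(new DataOutputStream(buffer));
            BigIntegerWritable copy = new BigIntegerWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The length prefix is needed because BigInteger has no fixed width, so the reader must know how many bytes belong to the value.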

It seems like I should be able to use object serialization to write
to/read from the DataOutput/Input objects and make my own classes
implement the Writable interface. It seems like I should be able to
do something like this:

import java.io.*;

import org.apache.hadoop.io.*;

public class Sample implements Writable {

    Address address;
    SampleValue value; // sampled value at this point

    public Sample(Address a, SampleValue v) {
        address = a;
        value = v;
    }

    public SampleValue getValue() { return value; }
    public Address getAddress() { return address; }

    public String toString() {
        return (address.toString() + " " + value.toString());
    }

    [...]

    public void readFields(DataInput in) throws IOException {
        ObjectInputStream oin = new ObjectInputStream((DataInputBuffer)in);

        try {
            address = (Address)oin.readObject();
            value = (SampleValue)oin.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e.toString());
        }

    }

    public void write(DataOutput out) throws IOException {
        ObjectOutputStream oout = new ObjectOutputStream((DataOutputBuffer)out);

        oout.writeObject(address);
        oout.writeObject(value);
    }
}

This code compiles, but throws exceptions at runtime, complaining that
WritableComparator can not access a member of class Sample with
modifiers "". Can someone tell me what this exception is talking
about?

Do I need to implement a WritableComparator for each class that I want
to implement Writable?

Thanks again for the help.

-steve


  • Matt Kent at Oct 10, 2007 at 4:46 pm
    I believe in this case you'll want to make SampleValue and Address Writable as well.
  • Steve Schlosser at Oct 10, 2007 at 6:57 pm
    Is this true? The fact that SampleValue and Address implement
    Serializable should be sufficient to write them out to the stream.
    They are not ever written out as keys or values themselves.

    -steve
  • Matt Kent at Oct 10, 2007 at 9:06 pm
    You're right, Serializable should be sufficient. I was thinking of a
    case where you'd sometimes want to write them out as values, but other
    times combine them inside Sample.
  • Steve Schlosser at Oct 10, 2007 at 10:18 pm
    For the time being, I've given up on using object serialization to do
    what I want. Instead, I'm going to just marshal and unmarshal the
    values of my class myself. I've implemented write() and readFields()
    methods in the classes that I want to read and write. (See my
    definition of Sample below.)

    Unfortunately, Hadoop throws the following exception when my program starts:

    Job started: Wed Oct 10 18:04:06 EDT 2007
    07/10/10 18:04:06 INFO mapred.InputFormatBase: Total input paths to process : 1
    07/10/10 18:04:06 INFO mapred.JobClient: Running job: job_nlx1k6
    07/10/10 18:04:06 WARN mapred.LocalJobRunner: job_nlx1k6
    java.lang.ExceptionInInitializerError
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:315)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:326)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:339)
    at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:411)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:115)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
    Caused by: java.lang.RuntimeException:
    java.lang.InstantiationException: net.intelresearch.cvmHadoop.Sample
    at org.apache.hadoop.io.WritableComparator.newKey(WritableComparator.java:74)
    at org.apache.hadoop.io.WritableComparator.<init>(Unknown Source)
    at net.intelresearch.cvmHadoop.Sample.<clinit>(Unknown Source)
    ... 9 more
    Caused by: java.lang.InstantiationException: net.intelresearch.cvmHadoop.Sample
    at java.lang.Class.newInstance0(Class.java:340)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.hadoop.io.WritableComparator.newKey(WritableComparator.java:72)
    ... 12 more
    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at net.intelresearch.cvmHadoop.KeyedByLocationalCode$Driver.main(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
    at net.intelresearch.cvmHadoop.Usage.main(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

    If I'm only trying to use the Writable interface (not
    WritableComparable), what is the purpose of a WritableComparator?
    Values are not sorted, only Keys, so it seems that there is no need to
    define a comparator for them. Just to be on the safe side, I did
    implement one and called WritableComparator.define() with it in
    Sample's initializer. What am I missing here?

    Thanks again for the help.

    -steve

    ---

    package net.intelresearch.cvmHadoop;

    import java.io.*;

    import org.apache.hadoop.io.*;

    public class Sample implements Writable {

        Address address;
        SampleValue value; // sampled value at this point

        public Sample(Address a, SampleValue v) {
            address = a;
            value = v;
        }

        public SampleValue getValue() { return value; }
        public Address getAddress() { return address; }

        public String toString() {
            return (address.toString() + " " + value.toString());
        }

        public void write(DataOutput out) throws IOException {
            address.write(out);
            value.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            address = new Address();
            address.readFields(in);

            value = new SampleValue();
            value.readFields(in);
        }

        public static class Comparator extends WritableComparator {
            public Comparator() {
                super(Sample.class);
            }

            // Just order by Address for now
            public int compare(Sample a, Sample b) {
                return a.getAddress().compareTo(b.getAddress());
            }
        }

        // register this comparator
        static {
            WritableComparator.define(Sample.class, new Comparator());
        }
    }
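
The stack trace in this post shows the failure starting in Sample's static initializer (Sample.<clinit>): constructing the comparator leads to WritableComparator.newKey(), which tries to instantiate Sample reflectively and fails. The sketch below reproduces that shape with plain Java only. ReflectiveComparator, NoDefaultCtor and InitializerDemo are hypothetical names, and the sketch uses getDeclaredConstructor(), so the root cause here is NoSuchMethodException rather than the InstantiationException that Class.newInstance() reports in the trace.

```java
// Hypothetical stand-in for a comparator that creates a key instance
// reflectively inside its own constructor.
class ReflectiveComparator {
    final Object prototype;

    ReflectiveComparator(Class<?> keyClass) {
        try {
            // Requires a no-argument constructor on keyClass.
            prototype = keyClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

// Like Sample in the post: only a constructor with arguments, plus a static
// registration step that instantiates the class reflectively.
class NoDefaultCtor {
    final int x;

    NoDefaultCtor(int x) { this.x = x; }

    static final ReflectiveComparator COMPARATOR =
            new ReflectiveComparator(NoDefaultCtor.class);
}

class InitializerDemo {
    // Returns the simple class name of the root cause of the initialization failure.
    // Only the first use of NoDefaultCtor raises ExceptionInInitializerError;
    // later uses raise NoClassDefFoundError instead.
    static String describeFailure() {
        try {
            new NoDefaultCtor(1); // first use triggers the static initializer
            return "no failure";
        } catch (ExceptionInInitializerError e) {
            Throwable root = e;
            while (root.getCause() != null) {
                root = root.getCause();
            }
            return root.getClass().getSimpleName();
        }
    }
}
```

Under these assumptions, the error surfaces when the class is first loaded, before any map or reduce code runs, which matches the trace appearing at job start.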

  • Ted Dunning at Oct 10, 2007 at 10:30 pm
    Steve,

    You don't need to implement the comparator. I do think that you need a
    no-argument constructor. In general, when Hadoop is creating one of your
    objects, it will call the no-argument constructor and then call readFields.

    As a point of style, I would consider it very bad form to not mark fields as
    private without a very good reason.
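
The two points above (a no-argument constructor followed by readFields, and private fields) can be sketched as follows. This is only an illustration: the thread never shows Address or SampleValue, so they are assumed here to wrap a long and a double. The Writable interface is a local stand-in for org.apache.hadoop.io.Writable, and FrameworkSketch.roundTrip is a hypothetical helper that mimics the create-then-read sequence rather than Hadoop's actual code. In a real job the classes would be declared public.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable (same two method signatures).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Assumption: Address holds a single long.
class Address implements Writable {
    private long code;
    public Address() {}
    public Address(long code) { this.code = code; }
    public long getCode() { return code; }
    public void write(DataOutput out) throws IOException { out.writeLong(code); }
    public void readFields(DataInput in) throws IOException { code = in.readLong(); }
}

// Assumption: SampleValue wraps a double.
class SampleValue implements Writable {
    private double v;
    public SampleValue() {}
    public SampleValue(double v) { this.v = v; }
    public double get() { return v; }
    public void write(DataOutput out) throws IOException { out.writeDouble(v); }
    public void readFields(DataInput in) throws IOException { v = in.readDouble(); }
}

class Sample implements Writable {
    private Address address = new Address();
    private SampleValue value = new SampleValue();

    public Sample() {} // needed for reflective creation

    public Sample(Address a, SampleValue v) { address = a; value = v; }

    public Address getAddress() { return address; }
    public SampleValue getValue() { return value; }

    public void write(DataOutput out) throws IOException {
        address.write(out);
        value.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        // Fields are read in the same order they were written.
        address.readFields(in);
        value.readFields(in);
    }
}

class FrameworkSketch {
    // Serialize, then rebuild the object the way described above:
    // call the no-argument constructor, then readFields.
    static <T extends Writable> T roundTrip(T original, Class<T> type) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            T copy = type.getDeclaredConstructor().newInstance();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Removing the no-argument constructor from Sample makes roundTrip fail at the reflective newInstance() call, which is the same kind of failure reported earlier in the thread.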
  • Sagar Naik at Oct 10, 2007 at 4:48 pm
    Hey Steve,
    WritableComparable should be implemented by classes meant to be used as keys;
    Writable is sufficient for classes meant to be used only as values.
    Keys are sorted by the framework, so they need to be comparable.
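
As a rough illustration of the key/value distinction, here is a sketch of a key class that is both serializable and comparable. AddressKey and its single long field are hypothetical, and the two interfaces are local stand-ins that mirror the shape of org.apache.hadoop.io.Writable and WritableComparable so the example compiles without Hadoop.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-ins for the Hadoop interfaces, declared locally for self-containment.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical key type ordered by a numeric code.
class AddressKey implements WritableComparable<AddressKey> {
    private long code;

    public AddressKey() {}

    public AddressKey(long code) { this.code = code; }

    public long getCode() { return code; }

    @Override
    public void write(DataOutput out) throws IOException { out.writeLong(code); }

    @Override
    public void readFields(DataInput in) throws IOException { code = in.readLong(); }

    // The ordering used whenever keys are sorted.
    @Override
    public int compareTo(AddressKey other) { return Long.compare(code, other.code); }

    // equals and hashCode are kept consistent with compareTo; a hash-based
    // partitioner would rely on hashCode.
    @Override
    public boolean equals(Object o) {
        return o instanceof AddressKey && ((AddressKey) o).code == code;
    }

    @Override
    public int hashCode() { return Long.hashCode(code); }

    // Demonstration helper: sort keys by their natural ordering and return the codes.
    static List<Long> sortedCodes(long... codes) {
        List<AddressKey> keys = new ArrayList<>();
        for (long c : codes) {
            keys.add(new AddressKey(c));
        }
        Collections.sort(keys);
        List<Long> result = new ArrayList<>();
        for (AddressKey k : keys) {
            result.add(k.getCode());
        }
        return result;
    }
}
```

A class used only as a value could drop compareTo and implement just the two Writable methods.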


  • Mike Navarro at Oct 10, 2007 at 5:38 pm
    No longer working on Hadoop, need to unsubscribe.

    After sending numerous unsubscribe emails to both
    'hadoop-dev-unsubscribe@lucene.apache.org' and
    'hadoop-user-unsubscribe@lucene.apache.org', I regret that I have to
    bother you all to ask if anybody can help me stop receiving these
    emails. I was unable to find any other contact email addresses through
    the http://lucene.apache.org.

    Help, sorry, thanks, please.

    -Mike
  • Steve Schlosser at Oct 11, 2007 at 1:29 am
    Ah - this was the problem! Now that I have the constructor, I am able
    to serialize either way - using Java's serialization or my own. For
    this app, I'm happy either way, but I'll think about sticking with my
    own serialization in the future.

    Thanks for the help!

    -steve
    On 10/10/07, Christopher Douglas wrote:
    It looks like you're missing a *public*, default constructor; the
    framework can't create an instance to call readFields() on.

    That said: you might want to reconsider using Java serialization within
    Writables. I know it seems restrictive, but you're opening yourself up
    to a rancid cornucopia of ClassLoader, static, etc. issues. -C
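
For comparison, one way Java serialization could be combined with a Writable without casting the DataInput or DataOutput to a concrete stream class is to serialize into a byte array first and write it length-prefixed. This is a sketch under assumptions, not the code Steve ended up with: Address is a hypothetical Serializable class holding one long, and Writable is a local stand-in for the Hadoop interface. The caution above about Java serialization still applies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (same two method signatures).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Assumption: a Serializable payload with a single long field.
class Address implements Serializable {
    private static final long serialVersionUID = 1L;
    final long code;
    public Address(long code) { this.code = code; }
}

class SerializedAddressWritable implements Writable {
    private Address address;

    public SerializedAddressWritable() {} // public no-argument constructor

    public SerializedAddressWritable(Address address) { this.address = address; }

    public Address get() { return address; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize into memory, then emit the bytes with a length prefix.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oout = new ObjectOutputStream(buffer)) {
            oout.writeObject(address);
        }
        byte[] bytes = buffer.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        try (ObjectInputStream oin = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            address = (Address) oin.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    // Demonstration helper: the bytes a Writable produces.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demonstration helper: write an Address, read it back, return its code.
    static long roundTripCode(long code) {
        try {
            byte[] bytes = toBytes(new SerializedAddressWritable(new Address(code)));
            SerializedAddressWritable copy = new SerializedAddressWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return copy.get().code;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The serialized form carries a stream header and class descriptor for every record, so it is larger than the 8 bytes that writeLong would need for the same field. That overhead is one practical argument for hand-written marshalling.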

Discussion Overview
group: common-user
category: hadoop
posted: Oct 10, '07 at 3:03p
active: Oct 11, '07 at 1:29a
posts: 9
users: 5
website: hadoop.apache.org...
irc: #hadoop
