I am writing a secondary sort for a String key and a float value.  I am
following the example in
mapred/src/examples/org/apache/hadoop/examples/SecondarySort.java in the Hadoop
package, which handles a pair of integers.  I did a lot of research online, but
most of the examples I found still use the old API.  It seems that with the new
API I have to implement the RawComparator interface, which means I need to
write the byte-level compare function no matter what.

I have a problem with this code:

public static class FirstGroupingComparator
    implements RawComparator<IntPair> {
  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // Compare only the first int (4 bytes) of each serialized IntPair.
    return WritableComparator.compareBytes(b1, s1, Integer.SIZE/8,
                                           b2, s2, Integer.SIZE/8);
  }

  @Override
  public int compare(IntPair o1, IntPair o2) {
    int l = o1.getFirst();
    int r = o2.getFirst();
    return l == r ? 0 : (l < r ? -1 : 1);
  }
}

How do I write the code inside the first compare function?  What should I put as
the length of the String and float (primitive type) in the compareBytes
function?  Does anyone have any examples for a pair of String and float?

Thanks.  Merry Christmas.


  • Harsh J at Dec 26, 2010 at 8:38 am

    You can use WritableComparator for "Writable" serializations. Docs
    here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/WritableComparator.html

    The issue lies with how you're encoding your pair of <String, Float>.
    If you know sizes defined for each (or have a marker byte between,
    etc.), you can extract the bytes out of the required object alone
    (String or Float) and use the compareBytes function on it. "s1" and
    "s2" define the start points, and "l1" and "l2" define the lengths to
    read from those start points, in the byte[] arrays passed for the two
    "Writable" objects.
    You can also, perhaps, deserialize the whole byte stream (via your
    Writable.readFields()) and then compare object-wise, but this would
    make it slow, since byte-to-byte comparisons are faster.
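
    A minimal sketch of that slower deserialize-then-compare route, again
    without Hadoop classes. StringFloatPair and its methods are hypothetical
    names for this example, and the layout (DataOutput.writeUTF followed by
    writeFloat) is an assumption standing in for your own write() and
    readFields().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical (String, float) key that is compared after deserialization. */
final class StringFloatPair implements Comparable<StringFloatPair> {
    final String first;
    final float second;

    StringFloatPair(String first, float second) {
        this.first = first;
        this.second = second;
    }

    /** Assumed layout: writeUTF(first) followed by writeFloat(second). */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(first);
            out.writeFloat(second);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds the pair from the l bytes starting at offset s. */
    static StringFloatPair fromBytes(byte[] b, int s, int l) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(b, s, l));
            return new StringFloatPair(in.readUTF(), in.readFloat());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Orders by the String first, then by the float. */
    @Override
    public int compareTo(StringFloatPair other) {
        int byFirst = first.compareTo(other.first);
        return byFirst != 0 ? byFirst : Float.compare(second, other.second);
    }

    /** Same argument shape as RawComparator.compare, but allocates two objects per call. */
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return fromBytes(b1, s1, l1).compareTo(fromBytes(b2, s2, l2));
    }
}
```

    One reason to decode at least the float: a negative float has its sign
    bit set, so an unsigned byte-by-byte comparison of the raw four bytes
    would sort negative values after positive ones.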

    Avro has a neat serialization format; I prefer using it over plain
    Writables. Working with a "Schema" is much easier.

    Harsh J

Discussion Overview
group: common-user @
posted: Dec 26, '10 at 5:06a
active: Dec 26, '10 at 8:38a

2 users in discussion

Harsh J: 1 post, Savannah Beckett: 1 post