FAQ
I am writing a secondary sort for a String key and a float value. I am
following the example in
mapred/src/examples/org/apache/hadoop/examples/SecondarySort.java in the Hadoop
package, but that example is for a pair of integers. I did a lot of research
online, but most of what I found still uses the old API. It seems that with the
new API I have to implement the RawComparator interface, which means I need to
write the byte-level compare function no matter what.


I am having trouble with this code:
public static class FirstGroupingComparator
    implements RawComparator<IntPair> {

  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // Compare only the first int (4 bytes) of each serialized pair.
    return WritableComparator.compareBytes(b1, s1, Integer.SIZE / 8,
                                           b2, s2, Integer.SIZE / 8);
  }

  @Override
  public int compare(IntPair o1, IntPair o2) {
    int l = o1.getFirst();
    int r = o2.getFirst();
    return l == r ? 0 : (l < r ? -1 : 1);
  }
}


How do I write the body of the first compare function? What should I pass as
the lengths of the String and the float (a primitive type) in the compareBytes
call? Does anyone have an example for a pair of String and float?

Thanks.  Merry Christmas.


  • Harsh J at Dec 26, 2010 at 8:38 am
    Hi,

    You can use WritableComparator for "Writable" serializations. Docs
    here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/WritableComparator.html

    The issue lies in how you're encoding your pair of <String, Float>.
    If you know the size of each field (or have a marker byte between
    them, etc.), you can extract the bytes of just the required field
    (the String or the Float) and run compareBytes on those. The "s1 &
    s2" arguments are start offsets, and "l1 & l2" are the lengths to
    read from those offsets -- in the passed byte[] arrays for the two
    "Writable" objects.
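    For example, if the pair is written as DataOutput.writeUTF(first)
    followed by writeFloat(second), the String is prefixed by a 2-byte
    big-endian length, so a grouping comparator over the first field can
    read that prefix and compare only the String bytes. A minimal sketch
    under that assumption -- the class name and serialize() helper are
    made up for illustration, and compareBytes re-implements
    WritableComparator.compareBytes so the sketch runs without Hadoop on
    the classpath:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Groups records by the first (String) field only, comparing raw bytes.
// ASSUMPTION: the record was written as writeUTF(first) + writeFloat(second),
// i.e. a 2-byte big-endian length followed by the String's UTF-8 bytes.
public class FirstStringRawComparator {

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // Read the 2-byte length prefix that writeUTF() produced.
    int n1 = ((b1[s1] & 0xff) << 8) | (b1[s1 + 1] & 0xff);
    int n2 = ((b2[s2] & 0xff) << 8) | (b2[s2 + 1] & 0xff);
    // Compare only the String bytes; the trailing 4 float bytes are ignored.
    return compareBytes(b1, s1 + 2, n1, b2, s2 + 2, n2);
  }

  // Unsigned lexicographic byte comparison, same contract as
  // WritableComparator.compareBytes().
  static int compareBytes(byte[] a, int aOff, int aLen,
                          byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int d = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
      if (d != 0) return d;
    }
    return aLen - bLen;  // shorter string sorts first on a shared prefix
  }

  // Hypothetical helper: builds a serialized pair the way the Writable
  // is assumed to write itself.
  static byte[] serialize(String first, float second) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeUTF(first);
    out.writeFloat(second);
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    FirstStringRawComparator cmp = new FirstStringRawComparator();
    byte[] a  = serialize("apple", 3.0f);
    byte[] b  = serialize("banana", 1.0f);
    byte[] a2 = serialize("apple", 9.9f);
    System.out.println(cmp.compare(a, 0, a.length, b, 0, b.length) < 0);    // true
    System.out.println(cmp.compare(a, 0, a.length, a2, 0, a2.length) == 0); // true: float ignored
  }
}
```

    Note that the float bytes never need to be examined for grouping:
    the length prefix alone tells you where the String ends.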

    You can also, perhaps, de-serialize the whole byte stream (via your
    Writable.readFields()) and then compare object-wise -- but this
    would be slower, since byte-to-byte comparisons are faster; hence
    RawComparator.
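    A sketch of that slower object-wise route, under the same assumed
    writeUTF + writeFloat layout (readFields() in a real Writable would
    do the reading shown in readFirst(); the class and helper names are
    made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Object-wise comparison: fully deserialize both records, then compare.
// Correct but slower than a raw byte comparison, since every call parses
// and allocates both records.
public class DeserializingComparator {

  public int compare(byte[] b1, byte[] b2) throws IOException {
    return readFirst(b1).compareTo(readFirst(b2)); // group on the String field
  }

  static String readFirst(byte[] buf) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf));
    String first = in.readUTF();  // what Writable.readFields() would read
    in.readFloat();               // second field, parsed but unused here
    return first;
  }

  // Hypothetical helper matching the assumed serialization layout.
  static byte[] serialize(String first, float second) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeUTF(first);
    out.writeFloat(second);
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    DeserializingComparator cmp = new DeserializingComparator();
    byte[] a = serialize("apple", 3.0f);
    byte[] b = serialize("banana", 1.0f);
    System.out.println(cmp.compare(a, b) < 0); // true
  }
}
```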

    Avro has a neat serialization format; I prefer using it over plain
    Writables. Working with a "Schema" is much easier.

    --
    Harsh J
    www.harshj.com

Discussion Overview
group: common-user @ hadoop
posted: Dec 26, '10 at 5:06a
active: Dec 26, '10 at 8:38a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion
Harsh J: 1 post; Savannah Beckett: 1 post
