Hi. I had difficulty getting Reduce sorting to work - it took me a good part
of a day to figure out what was going wrong, so I'm sharing this in hopes of
learning something from the community or getting Hadoop improved to avoid this kind
of error for future users.

I have 2 key classes, one holds a String, the other one extends that, and adds a
boolean.

I implemented the first key class (let's call it Super)

public class Super implements WritableComparable<Super> {
    . . .
    public int compareTo(Super o) {
        // sort on string value
        . . .
    }
}

I implemented the 2nd key class (let's call it Sub)

public class Sub extends Super {
    . . .
    public int compareTo(Sub o) {
        // sort on boolean value
        . . .
        // if equal, use the super:
        ... else
            return super.compareTo(o);
    }
}


With this setup, I used the "Sub" class as a mapper output key, and
expected the sort on the boolean value to happen first, then for equal
values there, the sort on the string values.

What actually happened, was that the sort on the boolean value was
skipped completely, and only the sort on the string was done.

The reason for this is that (in the 0.19.1 release) the WritableComparator
instance that is created (using the defaults - no custom Comparator)
knows the class is "Sub", creates the key values from it, and
calls the compareTo method on one key, passing it the other key. Both of these
keys are of type Sub. However, they are passed via this code in
WritableComparator:

public int compare(WritableComparable a, WritableComparable b) {
return a.compareTo(b);
}

Java uses the interface spec for WritableComparable that was declared,
in this case WritableComparable<Super>, and infers that the arg type for
compareTo is Super. Sub's compareTo(Sub) has a different parameter type, so
it is a separate overload rather than an override. So Java "skips" calling
the compareTo in Sub, and just calls the one in Super.

The workaround is to change the signature of Sub's compareTo method to
match the spec in the interface, namely it has to take the Super as an
argument, and then cast it to Sub.
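
A minimal sketch of the problem and the workaround follows. It uses plain
java.lang.Comparable as a stand-in for WritableComparable so that it runs
without Hadoop, and the names BrokenSub, FixedSub, CompareToDemo and
rawCompare are made up for illustration. The write/readFields methods a real
key needs are left out.

```java
// Stand-in for the Hadoop key class (assumption: Comparable replaces
// WritableComparable so the sketch needs no Hadoop jar).
class Super implements Comparable<Super> {
    protected final String text;

    Super(String text) { this.text = text; }

    @Override
    public int compareTo(Super o) {
        return text.compareTo(o.text); // sort on string value
    }
}

// Broken variant: compareTo(BrokenSub) is an overload, not an override,
// so a call made through the interface never reaches it.
class BrokenSub extends Super {
    final boolean flag;

    BrokenSub(boolean flag, String text) { super(text); this.flag = flag; }

    public int compareTo(BrokenSub o) {
        if (flag != o.flag) return Boolean.compare(flag, o.flag);
        return super.compareTo(o);
    }
}

// Workaround: keep the parameter type from the interface and cast.
class FixedSub extends Super {
    final boolean flag;

    FixedSub(boolean flag, String text) { super(text); this.flag = flag; }

    @Override
    public int compareTo(Super o) {
        FixedSub other = (FixedSub) o; // throws ClassCastException for other key types
        if (flag != other.flag) return Boolean.compare(flag, other.flag);
        return super.compareTo(o);
    }
}

class CompareToDemo {
    // Mirrors the raw-typed call shape of WritableComparator.compare(a, b).
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int rawCompare(Comparable a, Comparable b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // Boolean is ignored: only "a" vs "b" is compared, result is negative.
        System.out.println(rawCompare(new BrokenSub(true, "a"), new BrokenSub(false, "b")));
        // Boolean is compared first: true sorts after false, result is positive.
        System.out.println(rawCompare(new FixedSub(true, "a"), new FixedSub(false, "b")));
    }
}
```

Putting @Override on Sub's original compareTo(Sub) would have turned the
mistake into a compile error, which is probably the cheapest guard.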

This seems like a very error-prone design. Am I doing something wrong,
or can this be improved so that this kind of error is avoided?

-Marshall Schor


  • Owen O'Malley at May 1, 2009 at 3:51 am
    If you use custom key types, you really should be defining a
    RawComparator. It will perform much much better.

    -- Owen
  • Marshall Schor at May 1, 2009 at 11:24 am
    Thanks for the tip. I'll look into it - it doesn't look too hard to do
    in my case. -Marshall

    Owen O'Malley wrote:
    If you use custom key types, you really should be defining a
    RawComparator. It will perform much much better.

    -- Owen
  • Sharad Agarwal at May 4, 2009 at 6:40 am

    Marshall Schor wrote:
    public class Super implements WritableComparable<Super> {
    . . .
    public int compareTo(Super o) {
    // sort on string value
    . . .
    }

    I implemented the 2nd key class (let's call it Sub)

    public class Sub extends Super {
    . . .
    public int compareTo(Sub o) {
    // sort on boolean value
    . . .
    // if equal, use the super:
    ... else
    return super.compareTo(o);
    }
    The overriding method must have the same parameter types as the parent
    class method. Otherwise it is just another method, not an override.
    In your case, if the current code looks error-prone, you can
    make Super generic as well. Then you can use the Sub class as the
    parameter of the compareTo method. However, you will have to cast in the
    Super class.

    class Super<T> implements WritableComparable<T> {
    public int compareTo(T o) {
    Super other = (Super) o;
    ....
    }
    }

    class Sub extends Super<Sub> {
    public int compareTo(Sub o) {
    ...
    }
    }

    -Sharad
  • Shevek at May 4, 2009 at 12:55 pm

    On Sun, 2009-05-03 at 23:38 -0700, Sharad Agarwal wrote:
    Marshall Schor wrote:
    public class Super implements WritableComparable<Super> {
    . . .
    public int compareTo(Super o) {
    // sort on string value
    . . .
    }

    I implemented the 2nd key class (let's call it Sub)

    public class Sub extends Super {
    . . .
    public int compareTo(Sub o) {
    // sort on boolean value
    . . .
    // if equal, use the super:
    ... else
    return super.compareTo(o);
    }
    The overriding method must have the same parameter types as the parent
    class method. Otherwise it is just another method, not an override.
    In your case, if the current code looks error-prone, you can
    make Super generic as well. Then you can use the Sub class as the
    parameter of the compareTo method. However, you will have to cast in the
    Super class.
    In this particular case, I _think_ making Sub implement Comparable<Sub>
    will be sufficient since then javac will also generate public volatile
    int compareTo(Object o) { return compareTo((Sub)o); } which overrides the
    volatile method in the superclass. Overriding compareTo(Super) is not
    required. See my post to general@hadoop for more details.

    S.
    class Super<T> implements WritableComparable<T> {
    public int compareTo(T o) {
    Super other = (Super) o;
    ....
    }
    }

    class Sub extends Super<Sub> {
    public int compareTo(Sub o) {
    ...
    }
    }

    -Sharad
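
The generic-Super suggestion from Sharad's reply can be sketched as below.
This is only an illustration under assumptions: plain Comparable stands in
for WritableComparable so it runs without Hadoop, the names GenericSuper,
GenericSub and GenericKeyDemo are hypothetical, and the serialization
methods a real key needs are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Super is generic in the type it compares against (stand-in for
// "class Super<T> implements WritableComparable<T>").
class GenericSuper<T> implements Comparable<T> {
    protected final String text;

    GenericSuper(String text) { this.text = text; }

    @Override
    public int compareTo(T o) {
        // T is unbounded here, so a cast is needed to reach the string.
        GenericSuper<?> other = (GenericSuper<?>) o;
        return text.compareTo(other.text);
    }
}

// With T bound to GenericSub, compareTo(GenericSub) is a true override.
class GenericSub extends GenericSuper<GenericSub> {
    final boolean flag;

    GenericSub(boolean flag, String text) { super(text); this.flag = flag; }

    @Override
    public int compareTo(GenericSub o) {
        if (flag != o.flag) return Boolean.compare(flag, o.flag); // boolean first
        return super.compareTo(o);                                // then string
    }

    @Override
    public String toString() { return flag + ":" + text; }
}

class GenericKeyDemo {
    // Sorts three sample keys and returns their string forms in sorted order.
    static List<String> sortedKeys() {
        List<GenericSub> keys = new ArrayList<>();
        keys.add(new GenericSub(true, "a"));
        keys.add(new GenericSub(false, "b"));
        keys.add(new GenericSub(false, "a"));
        Collections.sort(keys);
        List<String> out = new ArrayList<>();
        for (GenericSub k : keys) out.add(k.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys());
    }
}
```

A bound such as "T extends GenericSuper<T>" on the type parameter would
remove the cast in the superclass, at the cost of a slightly denser
declaration.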

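Owen's RawComparator tip, earlier in the thread, is about comparing keys
in their serialized form, without deserializing them first. The sketch below
shows only that idea and does not use Hadoop: the method has the same
parameter list as RawComparator.compare(byte[], int, int, byte[], int, int),
but the class name RawKeySketch and the byte layout (one flag byte followed
by the UTF-8 bytes of the string) are assumptions made for illustration, not
the layout any real key from this thread uses.

```java
import java.nio.charset.StandardCharsets;

class RawKeySketch {
    // Hypothetical layout: 1 flag byte (0 or 1), then the UTF-8 bytes of the string.
    static byte[] serialize(boolean flag, String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[utf8.length + 1];
        out[0] = (byte) (flag ? 1 : 0);
        System.arraycopy(utf8, 0, out, 1, utf8.length);
        return out;
    }

    // Compares two serialized keys in place; s = start offset, l = length.
    // Assumes each key is at least one byte long (the flag byte).
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Sort on the boolean flag first.
        int c = Integer.compare(b1[s1], b2[s2]);
        if (c != 0) return c;
        // Then compare the string bytes as unsigned values, lexicographically.
        int n = Math.min(l1, l2);
        for (int i = 1; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) return x - y;
        }
        // A key that is a prefix of the other sorts first.
        return l1 - l2;
    }

    public static void main(String[] args) {
        byte[] a = serialize(false, "zebra");
        byte[] b = serialize(true, "apple");
        // Negative: the false flag sorts before true regardless of the string.
        System.out.println(compare(a, 0, a.length, b, 0, b.length));
    }
}
```

In Hadoop itself this logic would normally live in a WritableComparator
subclass, and the byte-wise ordering of UTF-8 can differ from
String.compareTo for some characters, so the two orderings should be checked
against each other before relying on this.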
Discussion Overview
group: common-user @
categories: hadoop
posted: May 1, '09 at 2:26a
active: May 4, '09 at 12:55p
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop