Hi Folks,

After much poking around I am still unable to determine why I am seeing
'reduce' being called twice with the "same" key.
Recall from my previous email that "sameness" is determined by 'compareTo'
of my custom key type.

AFAIK, the default WritableComparator invokes 'compareTo' for any two keys
which are being ordered during sorting and merging.
Is it somehow possible that a bitwise comparator is used for the spilled map
output rather than the default WritableComparator?

I am out of clues, short of studying the "shuffling" code. If anyone can
suggest some further debugging steps, don't be shy. :)

Thanks!!!

stan


  • Joey Echeverria at Aug 15, 2011 at 1:33 am
    Does your compareTo() method test object pointer equality? If so, you could
    be getting burned by Hadoop reusing Writable objects.

    -Joey
    On Aug 14, 2011 9:20 PM, "Stan Rosenberg" wrote:
    Hi Folks,

    After much poking around I am still unable to determine why I am seeing
    'reduce' being called twice with the "same" key.
    Recall from my previous email that "sameness" is determined by 'compareTo'
    of my custom key type.

    AFAIK, the default WritableComparator invokes 'compareTo' for any two keys
    which are being ordered during sorting and merging.
    Is it somehow possible that a bitwise comparator is used for the spilled map
    output rather than the default WritableComparator?

    I am out of clues, short of studying the "shuffling" code. If anyone can
    suggest some further debugging steps, don't be shy. :)

    Thanks!!!

    stan
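Joey's point about object reuse can be shown with a minimal, self-contained sketch (hypothetical names, not the thread's actual code): Hadoop frequently deserializes each record into the same object instance, so code that stores references to keys or values, or compares them with `==`, sees stale or aliased data.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Writable-reuse pitfall: one instance is overwritten in
// place on each "deserialization", so stored references all alias it.
class ReuseDemo {
    static class Key {
        int id;                              // overwritten on each read
        void readFields(int newId) { this.id = newId; }
    }

    static List<Key> collectByReference(int[] ids) {
        Key reused = new Key();              // one instance, reused
        List<Key> seen = new ArrayList<>();
        for (int id : ids) {
            reused.readFields(id);           // simulate deserialization
            seen.add(reused);                // BUG: stores the same reference
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Key> keys = collectByReference(new int[] {1, 2, 3});
        // All three entries alias one object, so all show the last id:
        System.out.println(keys.get(0).id + " " + keys.get(1).id + " "
                + keys.get(2).id); // prints "3 3 3", not "1 2 3"
    }
}
```

The safe pattern is to copy the key (e.g. via `WritableUtils.clone`) before storing it.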
  • Stan Rosenberg at Aug 15, 2011 at 2:08 am

    On Sun, Aug 14, 2011 at 9:33 PM, Joey Echeverria wrote:

    Does your compareTo() method test object pointer equality? If so, you could
    be getting burned by Hadoop reusing Writable objects.

    Yes, but only for equality between enum values. Interestingly, when
    'reduce' is called there are three instances of the "same" key.
    Two instances are correctly merged and they both come from the same mapper.
    The other instance comes from a different mapper, and for
    some reason does not get merged. I see the key and the values
    (corresponding to the two merged instances) passed as arguments
    to 'reduce'; then in a subsequent 'reduce' call I see the key and the value
    corresponding to the third instance.

    For completeness, here is my 'Key.compareTo':

    public int compareTo(Key o) {
        if (this.type != o.type) {
            // Type.X < Type.Y
            return (this.type == Type.X ? -1 : 1);
        }
        // otherwise, delegate
        if (this.type == Type.X) {
            return this.key1.compareTo(o.key1);
        } else {
            return this.key2.compareTo(o.key2);
        }
    }

    The 'type' field is an enum with two possible values, say X and Y. Key is
    essentially a union type; i.e., at any given time
    it's the values in key1 or key2 that are being compared (depending on the
    'type' value).
  • Joey Echeverria at Aug 15, 2011 at 2:25 am
    What are the types of key1 and key2? What does the readFields() method
    look like?

    -Joey

    On Sun, Aug 14, 2011 at 10:07 PM, Stan Rosenberg
    wrote:
    On Sun, Aug 14, 2011 at 9:33 PM, Joey Echeverria wrote:

    Does your compareTo() method test object pointer equality? If so, you could
    be getting burned by Hadoop reusing Writable objects.

    Yes, but only for equality between enum values.  Interestingly, when
    'reduce' is called there are three instances of the "same" key.
    Two instances are correctly merged and they both come from the same mapper.
    The other instance comes from a different mapper, and for
    some reason does not get merged.  I see the key and the values
    (corresponding to the two merged instances) passed as arguments
    to 'reduce'; then in a subsequent 'reduce' call I see the key and the value
    corresponding to the third instance.

    For completeness, here is my 'Key.compareTo':

    public int compareTo(Key o) {
        if (this.type != o.type) {
            // Type.X < Type.Y
            return (this.type == Type.X ? -1 : 1);
        }
        // otherwise, delegate
        if (this.type == Type.X) {
            return this.key1.compareTo(o.key1);
        } else {
            return this.key2.compareTo(o.key2);
        }
    }

    The 'type' field is an enum with two possible values, say X and Y.  Key is
    essentially a union type; i.e., at any given time
    it's the values in key1 or key2 that are being compared (depending on the
    'type' value).


    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
  • Stan Rosenberg at Aug 15, 2011 at 2:39 am

    On Sun, Aug 14, 2011 at 10:25 PM, Joey Echeverria wrote:

    What are the types of key1 and key2? What does the readFields() method
    look like?

    The type of key1 is essentially a wrapper for java.util.UUID.
    Here is its readFields:

    public void readFields(DataInput in) throws IOException {
        id = new UUID(in.readLong(), in.readLong());
    }

    So, it reconstitutes the UUID by deserializing two longs. The 'compareTo'
    method of this key type delegates to java.util.UUID.compareTo.

    The type of key2 wraps a different id, one that fits into a long. In
    addition to an id, it also stores an enum which designates the "source" of
    this id.
    Here is its readFields:

    public void readFields(DataInput in) throws IOException {
        source = Source.values()[in.readByte() & 0xFF];
        id = in.readLong();
    }

    The source is an enum value which is serialized by writing its ordinal.
    (There are only two possible enum values, hence only one byte.)
    The 'compareTo' method of this key type orders by the source values if the
    id values are different, otherwise by the id values.
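    As later confirmed downthread, this is where the trouble lies. A minimal sketch of the ordering just described (class and field names assumed for illustration) shows it is not transitive: two pairs can each compare "equal" to a third while comparing unequal to each other.

```java
// Sketch of the key2 ordering as described: compare by source when the
// ids differ, otherwise by id. Names (Source, cmp) are illustrative.
class OrderingSketch {
    enum Source { A, B }

    static int cmp(Source s1, long id1, Source s2, long id2) {
        if (id1 != id2) {
            return s1.compareTo(s2);       // ids differ: order by source
        }
        return Long.compare(id1, id2);     // ids equal: always 0
    }

    public static void main(String[] args) {
        // a=(A,1), b=(B,1), c=(A,2)
        int ab = cmp(Source.A, 1, Source.B, 1); // ids equal   -> 0 ("equal")
        int ac = cmp(Source.A, 1, Source.A, 2); // same source -> 0 ("equal")
        int bc = cmp(Source.B, 1, Source.A, 2); // B > A       -> positive
        // a ~ b and a ~ c, yet b > c: the relation is not transitive.
        // A merge sort given such a comparator can legally leave "equal"
        // keys non-adjacent, which splits them into separate reduce groups.
        System.out.println(ab + " " + ac + " " + bc); // prints "0 0 1"
    }
}
```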
  • Chris White at Aug 16, 2011 at 12:12 pm
    Are you using a hash partitioner? If so, make sure the hash value of the
    writable is not calculated using the hashCode value of the enum; use the
    ordinal value instead. The hashCode value of an enum is different in each
    JVM.
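    A sketch of the difference Chris describes (hypothetical key fields; the partition formula follows the usual `(hash & Integer.MAX_VALUE) % numReducers` convention of Hadoop's HashPartitioner):

```java
// An enum's default hashCode() is identity-based and varies across JVM
// runs, so a Writable's hashCode should be built from ordinal() instead.
class PartitionSketch {
    enum Source { A, B }

    // Unstable: Source.hashCode() differs from one JVM run to the next.
    static int badHash(Source s, long id) {
        return 31 * s.hashCode() + Long.hashCode(id);
    }

    // Stable: ordinal() is fixed by declaration order.
    static int goodHash(Source s, long id) {
        return 31 * s.ordinal() + Long.hashCode(id);
    }

    // Pick a reducer from the hash, HashPartitioner-style.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // With the stable hash, the chosen partition is reproducible
        // across JVMs; with badHash it could change between map tasks.
        System.out.println(partition(goodHash(Source.B, 42L), 10)); // prints 3
    }
}
```

An unstable hash would send the "same" key to different reducers from different map JVMs, which is why ruling this out matters.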
  • Stan Rosenberg at Aug 16, 2011 at 2:20 pm

    On Tue, Aug 16, 2011 at 6:14 AM, Chris White wrote:

    Are you using a hash partitioner? If so, make sure the hash value of the
    writable is not calculated using the hashCode value of the enum; use the
    ordinal value instead. The hashCode value of an enum is different in each
    JVM.
    Thanks for the tip. I am using a hash partitioner (it's the default), but
    my hash value is not based on an enum value. In any case, the keys in
    question get hashed to the same reducer.

    Best,

    stan
  • Chris White at Aug 16, 2011 at 8:48 pm
    Can you copy the contents of your parent Writable readFields and write
    methods (not the ones you've already posted)?

    Another thing you could try: if you know you have two identical keys, write
    a unit test that examines the result of compareTo for two instances to
    confirm the correct behavior (even going as far as serializing and
    deserializing before the comparison).

    Finally, just to confirm: you don't have any group or order comparators
    registered?
  • Stan Rosenberg at Aug 22, 2011 at 3:54 pm
    Thanks for all the help on this issue. It turned out to be a very simple
    problem with my 'compareTo' implementation.
    The ordering was symmetric but _not_ transitive.

    stan
    On Tue, Aug 16, 2011 at 4:47 PM, Chris White wrote:

    Can you copy the contents of your parent Writable readFields and write
    methods (not the ones you've already posted)?

    Another thing you could try: if you know you have two identical keys, write
    a unit test that examines the result of compareTo for two instances to
    confirm the correct behavior (even going as far as serializing and
    deserializing before the comparison).

    Finally, just to confirm: you don't have any group or order comparators
    registered?
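  One transitive alternative for the key2 ordering described earlier (a sketch with assumed names, not necessarily Stan's actual fix) is a plain lexicographic comparison: decide by source first, then tie-break on id. Lexicographic orderings built from transitive parts are themselves transitive, so the compareTo contract holds.

```java
// Sketch of a lawful (transitive, total) ordering for the key2 type:
// compare the source first, then fall back to the id.
class FixedOrdering {
    enum Source { A, B }

    static int cmp(Source s1, long id1, Source s2, long id2) {
        int bySource = s1.compareTo(s2);
        if (bySource != 0) {
            return bySource;               // sources differ: that decides it
        }
        return Long.compare(id1, id2);     // tie-break on id
    }

    public static void main(String[] args) {
        // (A,1) < (A,2) < (B,1): consistent, total, and transitive.
        System.out.println(cmp(Source.A, 1, Source.A, 2)); // negative
        System.out.println(cmp(Source.A, 2, Source.B, 1)); // negative
        System.out.println(cmp(Source.A, 1, Source.B, 1)); // negative
    }
}
```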

Discussion Overview
group: common-user
categories: hadoop
posted: Aug 15, '11 at 1:20a
active: Aug 22, '11 at 3:54p
posts: 9
users: 3
website: hadoop.apache.org...
irc: #hadoop
