Custom Key class not working correctly
Aaron Baff at Sep 10, 2010 at 6:20 pm
So I'm pretty new to Hadoop, just learning it for work, and starting to play with some of our data on a VM cluster to see it work and to make sure it can do what we need. By and large, very cool, and I think I'm getting the hang of it, but when I try to make a custom composite key class, it doesn't seem to group the data correctly.

The data is a bunch of phone numbers with various transactional data (timestamp, phone type, other call data). My Mapper is pretty much just taking the data, and splitting it out into a custom Key (or Text with just the phone number) and custom Value to hold the rest of the data.
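Roughly the shape of it, if it helps (this is just a sketch, not my actual job code -- the field positions are made up, and I'm using Text for the value here only to keep the sketch self-contained; the real job uses a custom Writable value class):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class AICallRecordMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, AIMdnTimeKey, Text> {

    public void map(LongWritable offset, Text line,
                    OutputCollector<AIMdnTimeKey, Text> output,
                    Reporter reporter) throws IOException {
        // Pretend the input is tab-separated: phone number, timestamp, rest of the call data
        String[] fields = line.toString().split("\t", 3);
        AIMdnTimeKey outKey = new AIMdnTimeKey(fields[0], Long.parseLong(fields[1]));
        // The real job emits a custom value class; Text stands in for it here
        output.collect(outKey, new Text(fields[2]));
    }
}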

In my reducer I'm counting, among other things, the number of unique phone numbers using a Reporter counter. Using my key class (code below), I get a total of 56,404 unique numbers, which is way too low. When I use just the phone number (as Text) as the key, it gives me 1,159,558, which is correct. In my custom class, hashCode() just returns String.hashCode() on the String holding the phone number.
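For what it's worth, the counting is basically this (again just a sketch -- the counter group/name and the output types are placeholders): reduce() is called once per key group, so bumping a counter once per call counts unique keys, which is why the grouping shows up directly in that number.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class AIUniqueMdnReducer extends MapReduceBase
        implements Reducer<AIMdnTimeKey, Text, Text, NullWritable> {

    public void reduce(AIMdnTimeKey key, Iterator<Text> values,
                       OutputCollector<Text, NullWritable> output,
                       Reporter reporter) throws IOException {
        // One reduce() call per key group, so this counts distinct keys
        reporter.incrCounter("AIStats", "UniqueMDNs", 1);
        while (values.hasNext()) {
            values.next();   // ... the real job processes the call records here ...
        }
        output.collect(new Text(key.getMdn()), NullWritable.get());
    }
}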

Using String.hashCode() seemed reasonable to me, since I want the values grouped by phone number and then ordered by timestamp, which is what I'm doing in the compareTo() function.


============================================================================================

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class AIMdnTimeKey implements WritableComparable {
    String mdn = "";
    long timestamp = -1L;
    private byte oli = 0;

    public AIMdnTimeKey() {
    }

    public AIMdnTimeKey(String initMdn, long initTimestamp) {
        mdn = initMdn;
        timestamp = initTimestamp;
    }

    public void setMdn(String newMdn) {
        mdn = newMdn;
    }

    public String getMdn() {
        return mdn;
    }

    public void setTimestamp(long newTimestamp) {
        timestamp = newTimestamp;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(mdn);
        out.writeByte(oli);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        mdn = in.readUTF();
        oli = in.readByte();
        timestamp = in.readLong();
    }

    public int compareTo(Object obj) throws ClassCastException {
        if (obj == null) {
            throw new ClassCastException("Object is NULL and so cannot be compared!");
        }
        if (getClass() != obj.getClass()) {
            throw new ClassCastException("Object is of type " + obj.getClass().getName() + " which cannot be compared to this class of type " + getClass().getName());
        }
        final AIMdnTimeKey other = (AIMdnTimeKey) obj;

        return (int)(this.timestamp - other.timestamp);
    }

    @Override
    public int hashCode() {
        return mdn.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final AIMdnTimeKey other = (AIMdnTimeKey) obj;
        if ((this.mdn == null) ? (other.mdn != null) : !this.mdn.equals(other.mdn)) {
            return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return mdn + " " + timestamp;
    }

    /**
     * @return the oli
     */
    public byte getOli() {
        return oli;
    }

    /**
     * @param oli the oli to set
     */
    public void setOli(byte oli) {
        this.oli = oli;
    }
}

============================================================================================
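For reference: as far as I understand it, hashCode() only affects which reducer a key is sent to (the default HashPartitioner), while the reduce-side grouping and sorting follow the key's compareTo()/comparator, so the "group by phone number, then order by timestamp" intent would normally be expressed in compareTo() itself. A sketch of what I mean, against the fields above (illustration only):

// Sketch: compare by phone number first, then by timestamp
public int compareTo(Object obj) {
    final AIMdnTimeKey other = (AIMdnTimeKey) obj;
    int byMdn = this.mdn.compareTo(other.mdn);
    if (byMdn != 0) {
        return byMdn;
    }
    // Explicit comparison instead of (int)(a - b), which can overflow
    // and flip the sign for widely separated timestamps
    if (this.timestamp < other.timestamp) {
        return -1;
    } else if (this.timestamp > other.timestamp) {
        return 1;
    }
    return 0;
}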



Aaron Baff | Developer | Telescope, Inc.

email: aaron.baff@telescope.tv | office: 424 270 2913 | www.telescope.tv

The information contained in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. Any views expressed in this message are those of the individual and may not necessarily reflect the views of Telescope Inc. or its associated companies.


  • Kaluskar, Sanjay at Sep 11, 2010 at 2:01 am
    Have you considered using something higher-level like PIG or Hive? Are
    there reasons why you need to process at this low level?

  • James Seigel at Sep 11, 2010 at 2:13 am
    Is the footer on this email a little rough for content that will be passed around and made indexable on the internets?

    Just saying :)

    Cheers
    James

    Sent from my mobile. Please excuse the typos.

Discussion Overview
group: common-user@hadoop.apache.org
category: hadoop
posted: Sep 10, 2010 at 6:20 pm
active: Sep 11, 2010 at 2:13 am
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
