I want to do a total sort on some data whose key type is Writable but not
Text. I wrote an InputSampler.RandomSampler object following the example in
the "Total Sort" section of *Hadoop: The Definitive Guide*. When I
call InputSampler.writePartitionFile() I get a class cast exception because
my key type cannot be cast to Text. Specifically, the issue seems to be the
following section of InputSampler.getSample():

    K key = reader.getCurrentKey();
    ....
    Text keyCopy = WritableUtils.<Text>clone((Text)key,
        job.getConfiguration());

From this source, it does appear that you can only use a RandomSampler on
data with Text keys. However, I'm confused, because I don't see this
mentioned in any documentation, and I would assume this isn't the case,
because InputSampler takes <Key, Value> generic type parameters.

1. Does InputSampler.RandomSampler only work on data with Text key
values?
2. If so, what is the easiest way to generate a random sample for data
with non-Text key values? Is there example code anywhere?
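The generic signature does not prevent this failure because of Java type erasure: inside a generic class, `K` is erased to `Object` (or to its bound), so an explicit cast such as `(Text) key` compiles and is only checked at run time, when the actual key object arrives. The self-contained sketch below reproduces that pattern without Hadoop; `ErasureDemo` and `TextOnlySampler` are hypothetical names, and `String` stands in for `Text`.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {

    /**
     * Hypothetical sampler: the type parameter K suggests any key type is
     * accepted, but the body hard-codes a cast to String (standing in for Text).
     */
    public static class TextOnlySampler<K> {
        private final List<String> samples = new ArrayList<>();

        public void add(K key) {
            // Compiles because K erases to Object; the cast is checked only
            // at run time, against the actual class of the key object.
            String copy = (String) key;
            samples.add(copy);
        }

        public int size() {
            return samples.size();
        }
    }

    public static void main(String[] args) {
        TextOnlySampler<String> textKeys = new TextOnlySampler<>();
        textKeys.add("apple");
        System.out.println("String keys sampled: " + textKeys.size());

        TextOnlySampler<Integer> intKeys = new TextOnlySampler<>();
        try {
            intKeys.add(42);
        } catch (ClassCastException e) {
            System.out.println("Integer key rejected: " + e.getClass().getSimpleName());
        }
    }
}
```

Running `main` reports one sampled `String` key and a `ClassCastException` for the `Integer` key, even though `TextOnlySampler<Integer>` type-checks without complaint.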


  • Joey Echeverria at May 19, 2011 at 1:09 am
    That sounds like a bug to me.

    I think the easiest way would be to modify InputSampler to handle non-Text keys.

    -Joey
    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
  • W.P. McNeill at May 19, 2011 at 4:29 pm
    Should I file a bug then? Do I do that here
    (https://issues.apache.org/jira/browse/HADOOP)?
  • Joey Echeverria at May 19, 2011 at 4:33 pm
    Filing a bug is a great idea. InputSampler is in the MapReduce Hadoop
    sub-project, which has its own Jira project:

    https://issues.apache.org/jira/browse/MAPREDUCE

    -Joey
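A minimal sketch of what "handling non-Text keys" could look like, assuming the only requirement is to copy each kept key without casting it to a concrete type. It swaps in plain reservoir sampling (Algorithm R) rather than reproducing RandomSampler's own selection logic, and takes the key-copying step as a caller-supplied function. The copy matters because many Hadoop record readers reuse a single key object across records, which is presumably why InputSampler clones keys at all. `GenericKeySampler`, `LongKey`, and `reusingReader` are hypothetical names; in real Hadoop code the copy function might be the `WritableUtils.clone` call quoted in the question, parameterized on `K` instead of `Text` (assuming `K extends Writable`).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;
import java.util.function.UnaryOperator;

public class GenericKeySampler {

    /**
     * Reservoir sampling over an arbitrary key type K. Keys are never cast
     * to a concrete type; each kept key is passed through {@code copy} so
     * later mutation of the source object cannot change the stored sample.
     */
    public static <K> List<K> sample(Iterator<K> keys, int numSamples,
                                     UnaryOperator<K> copy, Random rnd) {
        List<K> reservoir = new ArrayList<>(numSamples);
        long seen = 0;
        while (keys.hasNext()) {
            K key = keys.next();
            seen++;
            if (reservoir.size() < numSamples) {
                // Fill the reservoir with the first numSamples keys.
                reservoir.add(copy.apply(key));
            } else {
                // Keep this key with probability numSamples / seen,
                // overwriting a uniformly chosen existing slot.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < numSamples) {
                    reservoir.set((int) j, copy.apply(key));
                }
            }
        }
        return reservoir;
    }

    /** Hypothetical mutable key standing in for a non-Text Writable. */
    public static final class LongKey {
        public long value;

        public LongKey(long value) { this.value = value; }

        public LongKey copy() { return new LongKey(value); }
    }

    /** Simulates a record reader that reuses one key object for every record. */
    public static Iterator<LongKey> reusingReader(long n) {
        LongKey shared = new LongKey(0);
        return new Iterator<LongKey>() {
            private long i = 0;

            @Override
            public boolean hasNext() { return i < n; }

            @Override
            public LongKey next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.value = i++;
                return shared;
            }
        };
    }

    public static void main(String[] args) {
        List<LongKey> samples =
            sample(reusingReader(1000), 10, LongKey::copy, new Random(42));
        for (LongKey k : samples) {
            System.out.print(k.value + " ");
        }
        System.out.println();
    }
}
```

With the reusing reader, storing `key` directly instead of `copy.apply(key)` would leave every slot pointing at the same object, so all ten samples would report the last record's value (999).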

Discussion Overview
group: common-user @
categories: hadoop
posted: May 18, '11 at 11:24p
active: May 19, '11 at 4:33p
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

W.P. McNeill: 2 posts; Joey Echeverria: 2 posts
