Hi there,
I am trying to make the output of the word-count example totally
ordered. After specifying the input sampler and TotalOrderPartitioner in the
main function, I always get this IOException:

"main" java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:1112)
    at org.apache.hadoop.mapred.lib.InputSampler.writePartitionFile(InputSampler.java:338)

After checking the source code, I found that in the method
InputSampler.writePartitionFile, the sampler reads its keys from the
InputFormat (in my code it's o.a.h.mapred.TextInputFormat), but when it writes
the partition file it uses the map output key class as the key type of that
file. This explains the key type mismatch: <K, V> for TextInputFormat is
<LongWritable, Text>, while the map's output is <Text, IntWritable>.
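
To illustrate the mismatch described above, here is a minimal sketch in plain Java, without Hadoop on the classpath. CheckedWriter is a hypothetical stand-in for a writer that is created with a declared key class and rejects keys of any other class; it is not the real SequenceFile.Writer. Long plays the role of LongWritable (the key type produced by TextInputFormat) and String plays the role of Text (the map output key class).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a writer with a declared key class.
// It only models the class check, not any file format.
final class CheckedWriter {
    private final Class<?> keyClass;
    private final List<Object> written = new ArrayList<>();

    CheckedWriter(Class<?> keyClass) {
        this.keyClass = keyClass;
    }

    // Reject any key whose runtime class differs from the declared one.
    void append(Object key) throws IOException {
        if (key.getClass() != keyClass) {
            throw new IOException("wrong key class: " + key.getClass().getName()
                    + " is not " + keyClass);
        }
        written.add(key);
    }

    int size() {
        return written.size();
    }
}

final class KeyClassMismatchDemo {
    // Returns true if appending the key to a writer declared with
    // the given key class fails.
    static boolean appendFails(Class<?> declaredKeyClass, Object key) {
        try {
            new CheckedWriter(declaredKeyClass).append(key);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Writer declared with the "map output" key class, fed an "input" key.
        System.out.println(appendFails(String.class, 42L));
        // Writer declared with the "map output" key class, fed a matching key.
        System.out.println(appendFails(String.class, "hadoop"));
    }
}
```

In this model the first call fails because the sampled key comes from the input side while the writer was declared with the output-side key class, which is the same shape as the situation in the stack trace.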

    final InputFormat<K,V> inf = (InputFormat<K,V>) job.getInputFormat();
    int numPartitions = job.getNumReduceTasks();
    K[] samples = sampler.getSample(inf, job);
    ......
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, job, dst,
        job.getMapOutputKeyClass(), NullWritable.class);
    NullWritable nullValue = NullWritable.get();
    ......
    writer.append(samples[k], nullValue);

To me it seems more reasonable for the sampler to sample the mapper's
output rather than the mapper's input. Either way, I think the
writePartitionFile method should make sure that the class of the sampled keys
matches the key class it declares for the partition file.
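
As a rough illustration of what sampling the mapper's output would give, here is a sketch in plain Java. It assumes the sampled keys are already words (the map output keys of word-count), picks numPartitions - 1 split points from them, and assigns each key to a partition by binary search. TotalOrderSketch, splitPoints and partitionFor are hypothetical names for this example, and the selection rule is a simplification, not the actual InputSampler or TotalOrderPartitioner code.

```java
import java.util.Arrays;

final class TotalOrderSketch {
    // Pick numPartitions - 1 evenly spaced split points from the sampled keys.
    static String[] splitPoints(String[] samples, int numPartitions) {
        if (numPartitions < 1 || samples.length < numPartitions) {
            throw new IllegalArgumentException("need at least one sample per partition");
        }
        String[] sorted = samples.clone();
        Arrays.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        float step = sorted.length / (float) numPartitions;
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[Math.round(step * i)];
        }
        return splits;
    }

    // Keys below splits[0] go to partition 0; a key equal to splits[i]
    // goes to partition i + 1. Every key in partition p therefore sorts
    // before every key in partition p + 1.
    static int partitionFor(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] sampledWords = {"pear", "apple", "mango", "cherry", "banana", "grape"};
        String[] splits = splitPoints(sampledWords, 3);
        System.out.println(Arrays.toString(splits));
        System.out.println(partitionFor("date", splits));
    }
}
```

Because the split points and the keys being partitioned are of the same type here, the mismatch cannot arise; if each reducer then sorts its own partition, concatenating the reducer outputs in partition order gives a totally ordered result.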

Has anybody successfully done a total order sort?

--
Best wishes,
Steven

group: common-user @
categories: hadoop
posted: Jan 5, '10 at 7:39a
Steven zhuang: 1 post