You could run a map-only job, i.e. read your data and write it back out from the mapper without any reducer at all (set mapred.reduce.tasks=0 or, equivalently, call job.setNumReduceTasks(0)).
That way you parallelize over your inputs across a number of mappers and avoid the sort/shuffle/reduce overhead entirely. Note that with zero reducers you get one output file per map task rather than a single merged file.
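A map-only pass-through job of this kind might look roughly like the following sketch, assuming the newer org.apache.hadoop.mapreduce API and Hadoop's client libraries on the classpath; the class name MapOnlyCopy and the Text key/value types are illustrative assumptions and should be adjusted to the actual types stored in your SequenceFiles:

```java
// Hypothetical driver for a map-only pass-through job (requires Hadoop on the classpath).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class MapOnlyCopy {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only copy");
        job.setJarByClass(MapOnlyCopy.class);
        job.setMapperClass(Mapper.class);   // base Mapper is an identity: records pass straight through
        job.setNumReduceTasks(0);           // map-only: no sort, shuffle, or reduce phase
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);   // assumption: adjust to your files' key class
        job.setOutputValueClass(Text.class); // assumption: adjust to your files' value class
        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

On older 0.20/1.x releases (current when this thread was written), `new Job(conf, "map-only copy")` would be used instead of Job.getInstance.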
Sent: Thursday, 12 May 2011 13:16
Subject: Re: How to merge several SequenceFile into one?
If the order of the keys in the sequence files is not important to me, in
other words, if the sort phase is not necessary, how can I skip the
distributed sort to save resources?
Thanks for your suggestion.
2011/5/12 jason <firstname.lastname@example.org>:
An M/R job with a single reducer would do the job. This way you can
utilize the distributed sort and merge/combine/dedupe key/values as you
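The single-reducer suggestion amounts to the same pass-through job but with exactly one reducer, so every key/value pair is shuffled and sorted into one output file. A hedged sketch of the lines that differ from a map-only driver (base Mapper and Reducer classes act as identities in the newer mapreduce API):

```java
job.setMapperClass(Mapper.class);    // identity mapper: emits records unchanged
job.setReducerClass(Reducer.class);  // identity reducer: writes each value back out
job.setNumReduceTasks(1);            // single reducer: one sorted output file
```

The trade-off is that a single reducer serializes the sort and the final write, so for large inputs this is slower than the map-only variant but produces exactly one file.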
On 5/11/11, 丛林 wrote:
There are lots of SequenceFiles in HDFS; how can I merge them into one
Thanks for your suggestion.