Hi all,

I am working on a sort job, and it works perfectly well with a single map
task.

When I use 2 map tasks, the sorted output contains the entire data set
twice. With 4 map tasks, it contains the data four times, and so on.

I modified TeraSort for this.
Major modifications:
HashPartitioner instead of the TotalOrderPartitioner
No Sampler
IdentityMapper
IdentityReducer

I have been running the job on a single node.
I printed the lengths of the FileSplits when they are generated, and they
all make sense, but the final output still contains the data n times. How
can I debug this? Could someone please tell me what is wrong with my
FileSplits or map tasks?


Regards,

Matthew
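
One possible way to narrow this down (a sketch only, not a confirmed diagnosis) is to check the byte ranges of the generated splits rather than just their lengths. If each of the n splits covers the whole input, every map task reads all the records and the output contains n copies. The plain-Java example below uses a hypothetical Split class holding only a start offset and a length, as a stand-in for the values one would log from each FileSplit; it is not Hadoop API code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitCoverageCheck {

    /** Hypothetical stand-in for a FileSplit: byte offset and length only. */
    public static final class Split {
        public final long start;
        public final long length;

        public Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Sum of all split lengths; equals the file length when each byte is read once. */
    public static long totalBytes(List<Split> splits) {
        long total = 0;
        for (Split s : splits) {
            total += s.length;
        }
        return total;
    }

    /** Returns true if any two splits cover a common byte range. */
    public static boolean hasOverlap(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((Split s) -> s.start));
        long previousEnd = 0;
        for (Split s : sorted) {
            if (s.start < previousEnd) {
                return true;
            }
            previousEnd = Math.max(previousEnd, s.start + s.length);
        }
        return false;
    }

    public static void main(String[] args) {
        long fileLength = 1000; // assumed input size in bytes, for illustration

        // Two splits that partition the file: each byte is read exactly once.
        List<Split> disjoint = Arrays.asList(new Split(0, 500), new Split(500, 500));
        // Two splits that each cover the whole file: every record is read twice.
        List<Split> duplicated = Arrays.asList(new Split(0, 1000), new Split(0, 1000));

        System.out.println("disjoint:   total=" + totalBytes(disjoint)
                + " fileLength=" + fileLength + " overlap=" + hasOverlap(disjoint));
        System.out.println("duplicated: total=" + totalBytes(duplicated)
                + " fileLength=" + fileLength + " overlap=" + hasOverlap(duplicated));
    }
}
```

If the summed split lengths come out as n times the file length, or the ranges overlap, the duplication is probably introduced when the splits are generated. If the splits are disjoint, comparing the job's map input record counter with the number of records in the input would be the next thing to look at.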

Discussion Overview
group: common-user @
categories: hadoop
posted: Sep 24, '10 at 11:35a
active: Sep 24, '10 at 11:35a
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Matthew John: 1 post