Hi,
When I run the wordcount example, the Map phase reaches nearly 100% CPU
utilization, but the Reduce phase takes far longer and never exceeds
1-2% CPU utilization. Looking at the code, the Reduce step isn't very
complicated, so I'm not sure why it's so slow.
Here is my configuration:
Hadoop 0.12.3
13 combined TaskTracker/DataNode machines (6 slow, 7 fast)
(I see the same behavior with any number of TaskTracker/DataNode machines)
1 combined JobTracker/NameNode machine
2459 input files, 28 MB total (C source from the Linux kernel)
37 map tasks
1 reduce task
I've seen other posts on this list about similar problems with
wordcount, but they didn't seem to match my situation. Any ideas why
Map would be fast and Reduce so slow? Thanks!
-steve