Hi

When I run the wordcount example, I get nearly 100% CPU utilization
for the Map phase, but the Reduce phase takes forever, never breaking
more than 1-2% utilization. Looking at the code, the Reduce isn't
very complicated, so I'm not sure why it's so slow.

Here is my configuration:

Hadoop 0.12.3
13 tasktrackers and datanodes (6 are slow and 7 are fast)
(I get this behavior with any number of task/data nodes)
1 jobtracker and namenode
2459 input files, total of 28MB (a bunch of C code from the Linux kernel)
37 map tasks
1 reduce task
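
To make the "1 reduce task" setting concrete, here is a minimal sketch of hash-style partitioning, assuming keys are assigned to reduce tasks by a non-negative hash taken modulo the number of reduce tasks. The names partition_for and keys_per_reducer are hypothetical, and zlib.crc32 is only a stand-in for the key's real hash function. This is not Hadoop code and it does not explain the slowdown; it only shows that with a single reduce task every key ends up in the same place.

```python
import zlib


def partition_for(key: str, num_reduce_tasks: int) -> int:
    """Pick a reduce task index for a key (hypothetical helper)."""
    # Assumption: crc32 stands in for whatever hash the framework uses.
    h = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative 31-bit value, then take it modulo the task count.
    return (h & 0x7FFFFFFF) % num_reduce_tasks


def keys_per_reducer(keys, num_reduce_tasks):
    """Count how many keys each reduce task would receive."""
    counts = [0] * num_reduce_tasks
    for key in keys:
        counts[partition_for(key, num_reduce_tasks)] += 1
    return counts


# A few tokens of the kind a word count over C source would see.
words = ["int", "static", "struct", "return", "void", "char", "if", "else"]

print(keys_per_reducer(words, 1))  # [8]: the single reduce task gets every key
print(keys_per_reducer(words, 4))  # the same keys spread over four tasks
```

With one reduce task, the output of all 37 map tasks is handled by that one task, whatever the number of nodes.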

I've seen some other posts to this list about similar problems with
wordcount, but they didn't seem quite right. Any ideas why Map would
be fast and Reduce would be so slow? Thanks!

-steve

Discussion Overview
group: common-user @
categories: hadoop
posted: May 16, '07 at 7:34p
active: May 16, '07 at 7:34p
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop

Steve Schlosser: 1 post
