Hi Everyone:

I ran two experiments sorting 1 GB and 10 GB of data with Hadoop, on
(1) a single machine and (2) a 5-node cluster on a LAN.

The cmd is:

bin/hadoop jar hadoop-*-examples.jar sort [-m <#maps>] [-r <#reduces>] <in-dir> <out-dir>

The results are shown here:

[image: image.png]

The map phase scales well. The reduce phase, however, takes much longer
than expected on the cluster.
As far as I know, the Hadoop sort example uses an identity function for
reduce, which simply writes the map output to a file. I measured the LAN
bandwidth at about 100 Mbps, while the average LAN traffic during reduce
is only about 10 Mbps (both sending and receiving), so the network does
not look saturated.
As a result, this seems a bit strange to me.
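
To put rough numbers on the network side, here is a back-of-the-envelope
sketch, not a measurement. It assumes the data sizes are in gigabytes, that
map output is about the same size as the input (identity map), that keys are
partitioned evenly across the 5 nodes, and that each node has its own
full-duplex link on a switched LAN. The function name shuffle_seconds is just
an illustrative helper, not part of Hadoop.

```python
def shuffle_seconds(data_bytes, nodes, link_mbps):
    """Rough lower bound on shuffle time if the network were the only cost.

    Assumptions (not from Hadoop itself): uniform partitioning, one reducer
    host per node, and every node receiving in parallel on its own link.
    """
    if nodes < 2:
        # On a single machine nothing crosses the network.
        return 0.0
    # With even partitioning, (nodes - 1) / nodes of the map output
    # has to be fetched from another machine.
    remote_bytes = data_bytes * (nodes - 1) / nodes
    # Each node receives an equal share of that remote data.
    per_node_bytes = remote_bytes / nodes
    # Convert bytes to bits and divide by the link rate in bits per second.
    return per_node_bytes * 8 / (link_mbps * 1e6)


if __name__ == "__main__":
    for size_gb in (1, 10):
        for mbps in (100, 10):
            secs = shuffle_seconds(size_gb * 1e9, nodes=5, link_mbps=mbps)
            print(f"{size_gb} GB at {mbps} Mbps: about {secs:.1f} s")
```

Under these assumptions, the gap between the measured 100 Mbps and the
observed 10 Mbps would by itself stretch the transfer time tenfold, so it may
be worth checking why the shuffle is not using the available bandwidth
(disk I/O, merge passes, or the number of reduces are possible suspects).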

I am quite new to Hadoop, so please forgive me if this is a stupid question.

Thanks.

Best Regards
Yours Sincerely

Jingwei Lu
