The input data was about 5 GB, and the total map processing time was
about 10 minutes. On top of that, about 5 minutes of reduce time was
spent moving the files around.
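
As a rough sanity check on those figures, here is a minimal Python sketch of the arithmetic. The function names are only illustrative, and it assumes the data copied between map and reduce is about the same size as the 5 GB input, which may not hold for this job.

```python
# Rough figures taken from the message above; everything else is assumed.
input_gb = 5.0          # approximate input size
map_minutes = 10.0      # approximate total map time
shuffle_minutes = 5.0   # approximate extra time spent moving files

def shuffle_fraction(map_min, shuffle_min):
    """Fraction of total job time spent after the maps finished."""
    return shuffle_min / (map_min + shuffle_min)

def copy_throughput_mb_s(data_gb, minutes):
    """Aggregate MB/s if data_gb were copied in the given minutes (1 GB = 1024 MB)."""
    return data_gb * 1024 / (minutes * 60)

fraction = shuffle_fraction(map_minutes, shuffle_minutes)
# Assumption: map output is about as large as the input (not stated in the thread).
throughput = copy_throughput_mb_s(input_gb, shuffle_minutes)
print(f"{fraction:.0%} of job time after maps; ~{throughput:.1f} MB/s aggregate")
```

With these rounded numbers, about a third of the job is spent after the maps finish, which is in the same range as the 40% estimate quoted below. If the copied data really is near 5 GB, the implied aggregate rate across the cluster would be around 17 MB/s.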
On Sep 21, 2007, at 12:20 PM, Doug Cutting wrote:

Ross Boucher wrote:
My cluster has 4 machines on it, so based on the recommendations
on the wiki, I set my reduce count to 8. Unfortunately, the
performance was less than ideal. Specifically, when the map
functions had finished, I had to wait an additional 40% of the
total job time just for copying/sorting the files. I know for a
fact that the sort is very fast, so the only remaining question is
why moving the files around takes so long.

How much data was there to copy? How long was the total job time?
If there are only small amounts of data, and the total job time is
short, then copy scheduling overhead might be significant.


Discussion Overview
group: common-user @
posted: Sep 21, '07 at 7:02p
active: Sep 23, '07 at 1:18a


