Owen O'Malley at Sep 22, 2008 at 5:50 am
On Sep 21, 2008, at 9:08 PM, Sandy wrote:
> Just a quick clarification:
> The combiner function acts as an optimization between the map and the
> reduce phases. Is the output of the combiner phase stored in memory
> before being handed to reduce? Or is it written to disk and
> subsequently read from disk by the reduce phase?

The data path doesn't change with a combiner, except that the keys and
values are reinstantiated so they can be passed to the combiner. When
the map is done, the data is written to disk completely sorted.
However, since the sort buffer may fill up before the end of the map,
partial results may be sorted, optionally combined, and written to disk
before then. We call these "spills". If there is more than one spill,
the results are read back by a merge sort and written back as a single
file.
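
To make the spill path concrete, here is a minimal Python sketch of the
flow described above. It is a toy simulation, not Hadoop code: the names
run_map_side, combine and buffer_limit are hypothetical, Python lists
stand in for the sort buffer and for spill files on disk, and the
combiner is assumed to be a simple word-count style sum. The final merge
in this sketch only merges; it does not re-apply the combiner.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine(sorted_pairs):
    """Hypothetical combiner: sum the values of adjacent equal keys."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]


def run_map_side(records, buffer_limit, use_combiner=True):
    """Simulate the map-side data path; return (spills, merged)."""
    spills = []  # each entry stands in for one sorted spill file on disk
    buffer = []  # stands in for the in-memory sort buffer

    def spill():
        # Sort the buffered records, optionally combine, then "write" them.
        run = sorted(buffer)
        if use_combiner:
            run = combine(run)
        spills.append(run)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_limit:  # buffer filled before the map ended
            spill()
    if buffer:  # end of the map: write out whatever is left
        spill()

    # More than one spill: merge the sorted runs into one sorted output.
    merged = list(heapq.merge(*spills))
    return spills, merged


records = [("b", 1), ("a", 1), ("b", 1), ("a", 1), ("c", 1)]
spills, merged = run_map_side(records, buffer_limit=3)
print(spills)  # [[('a', 1), ('b', 2)], [('a', 1), ('c', 1)]]
print(merged)  # [('a', 1), ('a', 1), ('b', 2), ('c', 1)]
```

Note that in this sketch the combiner only sees one spill at a time, so
the merged output can still hold the same key more than once (here "a").
The reduce side still has to handle that, which fits the point that the
combiner does not change the data path.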
-- Owen