Just a quick clarification:

The combiner function acts as an optimization between the map and the reduce
phases. Is the output of the combiner phase stored in memory before being
handed to reduce? Or is it written to disk and subsequently read from disk
by the reduce phase?


Thanks in advance,

-SM


  • Qin Gao at Sep 22, 2008 at 4:16 am
    I think it will be written to disk. The output format and data flow are
    the same with or without a combiner.
  • Sandy at Sep 22, 2008 at 4:28 am
    Thank you for your swift response!

    -SM
  • Owen O'Malley at Sep 22, 2008 at 5:50 am

    The data path doesn't change with a combiner, except that the keys and
    values are reinstantiated to be given to the combiner. When the map is
    done, the data is written to disk completely sorted. However, since
    the sort buffer may fill up before the end of the map, partial
    results may be sorted, optionally combined, and written to disk
    earlier. We call these "spills". If there is more than one spill, the
    results will be read back by a merge sort and written back as a single
    file.

    -- Owen
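
To make the spill path Owen describes concrete, here is a minimal Python sketch. It is a toy model and not Hadoop code: the names `run_map_side`, `combine`, and `buffer_capacity` are made up for illustration, the lists in `spills` stand in for sorted files on local disk, and the combiner is assumed to be a word-count style sum. The buffer threshold is counted in records here only to keep the example small.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine(sorted_records):
    """Hypothetical combiner: sum the values for each run of equal keys."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_records, key=itemgetter(0))]


def run_map_side(records, buffer_capacity, combiner=None):
    """Toy model of the map-side path; returns (spills, merged)."""
    spills = []  # each inner list stands in for one sorted spill file on disk
    buffer = []  # stands in for the in-memory sort buffer

    def spill():
        # Sort the buffered records, optionally combine them, then "write" them.
        chunk = sorted(buffer, key=itemgetter(0))
        if combiner is not None:
            chunk = combiner(chunk)
        spills.append(chunk)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_capacity:
            spill()  # the buffer filled up before the map finished
    if buffer:
        spill()      # final spill when the map is done

    # Merge the sorted spills into a single sorted output.
    merged = list(heapq.merge(*spills, key=itemgetter(0)))
    return spills, merged


map_output = [("b", 1), ("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]
spills, merged = run_map_side(map_output, buffer_capacity=4, combiner=combine)
print(spills)
print(merged)
```

In this model the combiner only changes how many records each spill holds. The data still goes through the same sort, spill, and merge steps with `combiner=None`, which matches the point that the data path is the same either way. Since each spill is combined on its own, the same key can still appear more than once in the merged output, and the reduce side has to handle that.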

Discussion Overview
group: common-user @
category: hadoop
posted: Sep 22, '08 at 4:08a
active: Sep 22, '08 at 5:50a
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop