Wrong comparator used to merge files in Reduce phase
----------------------------------------------------

Key: HADOOP-1535
URL: https://issues.apache.org/jira/browse/HADOOP-1535
Project: Hadoop
Issue Type: Bug
Reporter: Vivek Ratan


As per the fix for HADOOP-485, we allow users to optionally provide a different comparator to group values when calling the user's Reduce function. Devaraj and I were looking at the code yesterday and we found that in ReduceTask.java, we use the user-supplied comparator to merge the output files from the Map tasks (we use the user-supplied comparator when creating a new SequenceFile.Sorter object). This is incorrect as the comparator used to merge Map output files should be the same as that used to create those files in the Map phase. The user-supplied comparator for grouping values should be used only in the iterator passed to the user's Reduce function (which is done correctly in the code).
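
To see why the merge comparator has to match the comparator that produced the sorted files, here is a minimal sketch in plain Java. It does not use SequenceFile.Sorter or any other Hadoop class: MergeComparatorDemo, merge and isSorted are hypothetical names, and the two integer lists merely stand in for sorted Map output files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class MergeComparatorDemo {

    // Standard two-way merge. The result is sorted only if both runs were
    // already sorted by the same comparator that is passed in here.
    static <T> List<T> merge(List<T> a, List<T> b, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (cmp.compare(a.get(i), b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // True if every adjacent pair is in order under cmp.
    static <T> boolean isSorted(List<T> xs, Comparator<? super T> cmp) {
        for (int k = 1; k < xs.size(); k++) {
            if (cmp.compare(xs.get(k - 1), xs.get(k)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the "map side" sorted its output in descending order.
        Comparator<Integer> sortCmp = Comparator.reverseOrder();
        // Any comparator that orders keys differently (illustration only).
        Comparator<Integer> otherCmp = Comparator.naturalOrder();

        List<Integer> run1 = Arrays.asList(9, 5, 1);
        List<Integer> run2 = Arrays.asList(8, 4, 2);

        List<Integer> good = merge(run1, run2, sortCmp);
        List<Integer> bad = merge(run1, run2, otherCmp);

        // prints "[9, 8, 5, 4, 2, 1] sorted=true"
        System.out.println(good + " sorted=" + isSorted(good, sortCmp));
        // prints "[8, 4, 2, 9, 5, 1] sorted=false"
        System.out.println(bad + " sorted=" + isSorted(bad, sortCmp));
    }
}
```

Merging with a different comparator does not raise any error; it silently produces a sequence that is no longer ordered, which is what makes this kind of bug easy to miss.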

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Devaraj Das (JIRA) at Jun 27, 2007 at 5:43 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das updated HADOOP-1535:
    --------------------------------

    Fix Version/s: 0.14.0
    Affects Version/s: 0.12.3, 0.13.0

    bq. Devaraj and I were looking at the code yesterday and we found that in ReduceTask.java, we use the user-supplied comparator to merge the output files from the Map tasks (we use the user-supplied comparator when creating a new SequenceFile.Sorter object). This is incorrect as the comparator used to merge Map output files should be the same as that used to create those files in the Map phase.

    A small clarification - we use the *map output key comparator* for sorting map outputs and the same comparator must be used for merging them (on the reducer side). Also, we should continue to use the map output key comparator for iterating through the key/value records from the (possibly merged) map outputs; we should use the value-grouping comparator to only decide whether the current key we are looking at is "equal" to the last key that was looked at, while the user's reducer method is iterating through the values for a key.
  • Devaraj Das (JIRA) at Jun 27, 2007 at 5:47 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Devaraj Das reassigned HADOOP-1535:
    -----------------------------------

    Assignee: Vivek Ratan
  • Vivek Ratan (JIRA) at Jun 27, 2007 at 12:13 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508522 ]

    Vivek Ratan commented on HADOOP-1535:
    -------------------------------------

    A related issue: During the Map phase, when we sort key-value pairs, we use the comparator returned by JobConf.getOutputKeyComparator() (in BasicTypeSorterBase::configure()). When we merge files (in MapOutputBuffer::mergeParts()), we use the comparator returned by 'new WritableComparator(keyClass)' (in SequenceFile::Sorter::Sorter()). This is not right, as the exact same comparator should be used both during sort and during merge (as well as during merge in the Reduce phase). There can be situations when JobConf.getOutputKeyComparator() and WritableComparator(keyClass) return different comparators.
  • Sameer Paranjpye (JIRA) at Jun 27, 2007 at 5:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sameer Paranjpye updated HADOOP-1535:
    -------------------------------------

    Component/s: mapred
  • Vivek Ratan (JIRA) at Jun 28, 2007 at 9:40 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Vivek Ratan updated HADOOP-1535:
    --------------------------------

    Attachment: 1535_01.patch

    We use the comparator returned by JobConf.getOutputKeyComparator() for the sort/merge phases of Map and Reduce. We use the comparator returned by JobConf.getOutputValueGroupingComparator() for the iterator across values for a given key. See 1535_01.patch.
  • Nigel Daley at Jun 28, 2007 at 4:16 pm
    Hi Vivek,

    Can you include a unit test for this fix?
  • Vivek Ratan at Jul 5, 2007 at 6:28 pm
    Sure, but can you give me a quick overview of writing unit tests for Hadoop?
    I've used jUnit before, so just want specifics related to Hadoop. I'm
    actually in the MC campus, sitting in the visitor cubes on the 8th floor.
    Ping me when you have a little bit of time, or let me know who the right
    person for this is. Thx.

  • Devaraj Das (JIRA) at Jun 28, 2007 at 10:16 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508784 ]

    Devaraj Das commented on HADOOP-1535:
    -------------------------------------

    Pls update the patch to stick to the 80 column boundary for the lines there.
  • Vivek Ratan at Jul 5, 2007 at 6:30 pm
    Sorry. I didn't realize it went to the whole list.

  • Vivek Ratan (JIRA) at Jul 10, 2007 at 9:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Vivek Ratan updated HADOOP-1535:
    --------------------------------

    Attachment: 1535_02.patch

    The attached patch (1535_02.patch) has the code changes, formatted to the 80-char column limit. I also added a set of new unit test cases. There is a new test file TestComparators.java, under src/test/org/apache/hadoop/mapred. This file has 4 tests to check various combinations of default and user-supplied comparators. One of the tests is the same as that in TestUserValueGrouping.java. That file needs to be deleted and TestComparators.java needs to be added to the repository.
  • Vivek Ratan (JIRA) at Jul 11, 2007 at 12:14 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Vivek Ratan resolved HADOOP-1535.
    ---------------------------------

    Resolution: Won't Fix

    As per Runping's comments, we don't need this functionality right away.
  • Vivek Ratan (JIRA) at Jul 11, 2007 at 12:14 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Vivek Ratan reopened HADOOP-1535:
    ---------------------------------


    Sorry, resolved the wrong bug. This one's still open.
  • Devaraj Das (JIRA) at Jul 11, 2007 at 9:02 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511695 ]

    Devaraj Das commented on HADOOP-1535:
    -------------------------------------

    +1
  • Vivek Ratan (JIRA) at Jul 11, 2007 at 5:20 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Vivek Ratan updated HADOOP-1535:
    --------------------------------

    Status: Patch Available (was: Reopened)
  • Owen O'Malley (JIRA) at Jul 12, 2007 at 4:41 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Owen O'Malley updated HADOOP-1535:
    ----------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks Vivek!
  • Hudson (JIRA) at Jul 13, 2007 at 11:44 am
    [ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512443 ]

    Hudson commented on HADOOP-1535:
    --------------------------------

    Integrated in Hadoop-Nightly #154 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/154/])
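
Devaraj's clarification earlier in the thread (keep iterating in map output key comparator order, and use the value-grouping comparator only to decide whether the current key is "equal" to the last key seen) can be sketched roughly as follows. This is an illustration in plain Java, not the ReduceTask code: ValueGroupingDemo, primary and group are hypothetical names, the "primary#secondary" key layout is an assumption made for the example, and only keys are shown (in a real job each key would carry a value).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ValueGroupingDemo {

    // Hypothetical composite key of the form "primary#secondary".
    static String primary(String key) {
        return key.substring(0, key.indexOf('#'));
    }

    // Splits an already sorted key sequence into reduce groups. A new group
    // starts whenever groupCmp says the current key differs from the last
    // key seen. The order of the input is never changed here.
    static List<List<String>> group(List<String> sortedKeys,
                                    Comparator<String> groupCmp) {
        List<List<String>> groups = new ArrayList<>();
        String last = null;
        for (String key : sortedKeys) {
            if (last == null || groupCmp.compare(last, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            last = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Stands in for the map output key comparator: orders the full key.
        Comparator<String> sortCmp = Comparator.naturalOrder();
        // Stands in for the value-grouping comparator: primary part only.
        Comparator<String> groupCmp =
            Comparator.comparing(ValueGroupingDemo::primary);

        List<String> keys =
            new ArrayList<>(Arrays.asList("b#1", "a#2", "a#1", "b#0"));
        keys.sort(sortCmp); // sorting (and merging) uses sortCmp only

        // prints "[[a#1, a#2], [b#0, b#1]]"
        System.out.println(group(keys, groupCmp));
    }
}
```

The grouping comparator never reorders anything in this sketch; it only marks where one reduce call would end and the next would begin.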
