Compression for intermediate map output is broken
-------------------------------------------------

Key: HADOOP-2943
URL: https://issues.apache.org/jira/browse/HADOOP-2943
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Reporter: Chris Douglas


It looks like SequenceFile::RecordCompressWriter and SequenceFile::BlockCompressWriter weren't updated to use the new serialization added in HADOOP-1986. This causes failures in the merge when mapred.compress.map.output is true and mapred.map.output.compression.type=BLOCK:

{noformat}
java.io.IOException: File is corrupt!
    at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:1656)
    at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1969)
    at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2985)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2785)
    at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2494)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:654)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:740)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:212)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2077)
{noformat}

mapred.map.output.compression.type=RECORD works for Writables, but it should also be updated to use the new serialization.
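
A minimal sketch of a job configuration that exercises this code path (the two property names are quoted from the report above; the class and method names are illustrative, not part of the issue):

{noformat}
import org.apache.hadoop.mapred.JobConf;

public class BlockCompressRepro {
  // Returns a JobConf set up to compress intermediate map output with
  // BLOCK compression -- the combination that corrupts the merge.
  public static JobConf configure(JobConf conf) {
    conf.set("mapred.compress.map.output", "true");
    conf.set("mapred.map.output.compression.type", "BLOCK");
    return conf;
  }
}
{noformat}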


  • Chris Douglas (JIRA) at Mar 6, 2008 at 2:36 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Attachment: 2943.patch

    This patch updates the Block/RecordCompressWriters to use the (now protected) Serializer objects in SequenceFile::Writer.
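
    Not the patch itself, but for context, a self-contained sketch of the HADOOP-1986 serialization pattern the writers are being moved onto (the class and method names here are illustrative): values go through a pluggable Serializer instead of being cast to Writable and written directly.

    {noformat}
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.serializer.SerializationFactory;
    import org.apache.hadoop.io.serializer.Serializer;

    public class SerializerSketch {
      // Serializes a value through the serialization framework rather than
      // via ((Writable) value).write(out).
      public static byte[] toBytes(Configuration conf, Text value) throws IOException {
        SerializationFactory factory = new SerializationFactory(conf);
        Serializer<Text> serializer = factory.getSerializer(Text.class);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializer.open(out);
        serializer.serialize(value);
        serializer.close();
        return out.toByteArray();
      }
    }
    {noformat}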
  • Chris Douglas (JIRA) at Mar 6, 2008 at 2:36 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Assignee: Chris Douglas
    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 6, 2008 at 2:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Attachment: 2943.patch

    Suppressed unchecked warnings in append
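
    (Illustrative only, with hypothetical field names: the kind of method-level suppression the note describes. Calls on a raw Serializer are unchecked, so the annotation keeps javac quiet.)

    {noformat}
    import java.io.IOException;
    import org.apache.hadoop.io.serializer.Serializer;

    class AppendSketch {
      private final Serializer keySerializer;  // raw type: its calls are unchecked

      AppendSketch(Serializer keySerializer) {
        this.keySerializer = keySerializer;
      }

      @SuppressWarnings("unchecked")  // silences the unchecked serialize() call
      public synchronized void append(Object key) throws IOException {
        keySerializer.serialize(key);
      }
    }
    {noformat}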
  • Chris Douglas (JIRA) at Mar 6, 2008 at 2:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 6, 2008 at 2:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Hadoop QA (JIRA) at Mar 6, 2008 at 5:22 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575566#action_12575566 ]

    Hadoop QA commented on HADOOP-2943:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12377219/2943.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included -1. The patch doesn't appear to include any new or modified tests.
    Please justify why no tests are needed for this patch.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1901/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1901/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1901/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1901/console

  • Chris Douglas (JIRA) at Mar 6, 2008 at 10:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Attachment: 2943.patch

    Added a test case
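
    The attached test isn't reproduced in the thread; a hedged sketch of a round-trip check in the same spirit (the file path and record count are arbitrary) could look like:

    {noformat}
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class BlockCompressRoundTrip {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        Path file = new Path("/tmp/block-compress-test.seq");  // illustrative path

        // Write a BLOCK-compressed SequenceFile...
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, file, IntWritable.class, Text.class,
            SequenceFile.CompressionType.BLOCK);
        for (int i = 0; i < 1000; ++i) {
          writer.append(new IntWritable(i), new Text("record " + i));
        }
        writer.close();

        // ...then read every record back; a broken writer surfaces as
        // "File is corrupt!" from the reader.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
        IntWritable key = new IntWritable();
        Text val = new Text();
        int count = 0;
        while (reader.next(key, val)) {
          ++count;
        }
        reader.close();
        if (count != 1000) {
          throw new AssertionError("expected 1000 records, read " + count);
        }
      }
    }
    {noformat}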
  • Chris Douglas (JIRA) at Mar 6, 2008 at 10:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Chris Douglas (JIRA) at Mar 6, 2008 at 10:06 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Mar 7, 2008 at 12:24 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575973#action_12575973 ]

    Hadoop QA commented on HADOOP-2943:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12377288/2943.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1908/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1908/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1908/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1908/console

  • Arun C Murthy (JIRA) at Mar 7, 2008 at 6:52 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Arun C Murthy updated HADOOP-2943:
    ----------------------------------

    Status: Open (was: Patch Available)

    Chris, please remove:

    {noformat}
    + public synchronized void append(Writable key, Writable val)
    + throws IOException {
    + append((Object)key, (Object)val);
    + }
    +
    {noformat}

    from both RecordCompressWriter & BlockCompressWriter, since the one in Writer is enough. Thanks!
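
    (A toy illustration, not Hadoop code, of why the forwarding overload is redundant: the append inherited from the base class already accepts the same arguments, so re-declaring a cast-and-forward overload in each subclass adds nothing.)

    {noformat}
    public class OverloadSketch {
      static class Writer {
        public void append(Object key, Object val) {
          System.out.println("base append: " + key + " -> " + val);
        }
      }

      // No append overload re-declared here: the inherited method already
      // accepts any Object, including Writables.
      static class BlockCompressWriter extends Writer {
      }

      public static void main(String[] args) {
        new BlockCompressWriter().append("key", "value");  // resolves to Writer.append
      }
    }
    {noformat}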
  • Arun C Murthy (JIRA) at Mar 7, 2008 at 6:52 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Arun C Murthy updated HADOOP-2943:
    ----------------------------------

    Fix Version/s: 0.17.0
    Affects Version/s: 0.17.0
  • Chris Douglas (JIRA) at Mar 7, 2008 at 6:59 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Attachment: 2943.patch
  • Chris Douglas (JIRA) at Mar 7, 2008 at 7:01 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Patch Available (was: Open)

    Updated patch
  • Chris Douglas (JIRA) at Mar 7, 2008 at 9:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Chris Douglas (JIRA) at Mar 7, 2008 at 9:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Chris Douglas (JIRA) at Mar 7, 2008 at 10:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Patch Available (was: Open)

    Trying to get Hudson to notice this...
  • Chris Douglas (JIRA) at Mar 7, 2008 at 10:45 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Chris Douglas (JIRA) at Mar 8, 2008 at 1:37 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Open (was: Patch Available)
  • Chris Douglas (JIRA) at Mar 8, 2008 at 1:37 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at Mar 8, 2008 at 9:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576616#action_12576616 ]

    Hadoop QA commented on HADOOP-2943:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12377380/2943.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 3 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1917/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1917/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1917/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1917/console

  • Chris Douglas (JIRA) at Mar 9, 2008 at 4:09 am
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas updated HADOOP-2943:
    ----------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just committed this.
  • Hudson (JIRA) at Mar 9, 2008 at 12:17 pm
    [ https://issues.apache.org/jira/browse/HADOOP-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576756#action_12576756 ]

    Hudson commented on HADOOP-2943:
    --------------------------------

    Integrated in Hadoop-trunk #424 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/424/])
