Data nodes should inform the name-node about block crc errors.
--------------------------------------------------------------

Key: HADOOP-3035
URL: https://issues.apache.org/jira/browse/HADOOP-3035
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.16.0
Reporter: Konstantin Shvachko


Currently, if a CRC error occurs when a data-node replicates a block to another node, it throws an exception and continues.
{code}
[junit] 2008-03-17 19:46:11,855 INFO dfs.DataNode (DataNode.java:transferBlocks(811)) - 127.0.0.1:3730 Starting thread to transfer block blk_-1962819020391742554 to 127.0.0.1:3740
[junit] 2008-03-17 19:46:11,855 INFO dfs.DataNode (DataNode.java:writeBlock(1067)) - Receiving block blk_-1962819020391742554 src: /127.0.0.1:3791 dest: /127.0.0.1:3740
[junit] 2008-03-17 19:46:11,855 INFO dfs.DataNode (DataNode.java:receiveBlock(2504)) - Exception in receiveBlock for block blk_-1962819020391742554 java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
[junit] 2008-03-17 19:46:11,871 INFO dfs.DataNode (DataNode.java:run(2626)) - 127.0.0.1:3730:Transmitted block blk_-1962819020391742554 to /127.0.0.1:3740
[junit] 2008-03-17 19:46:11,871 INFO dfs.DataNode (DataNode.java:writeBlock(1192)) - writeBlock blk_-1962819020391742554 received exception java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
[junit] 2008-03-17 19:46:11,871 ERROR dfs.DataNode (DataNode.java:run(979)) - 127.0.0.1:3740:DataXceiver: java.io.IOException: Unexpected checksum mismatch while writing blk_-1962819020391742554 from /127.0.0.1
[junit] at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2246)
[junit] at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2416)
[junit] at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2474)
[junit] at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1173)
[junit] at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:956)
[junit] at java.lang.Thread.run(Thread.java:595)
{code}
The data-node should report the error to the name-node so that the corrupted replica can be removed and the block re-replicated.
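
The requested behaviour can be illustrated with a minimal sketch: the receiver recomputes the checksum of each chunk and, on a mismatch, notifies a reporter instead of only throwing. This is not the Hadoop implementation. `BadBlockReporter` and `ChunkVerifier` are hypothetical names used for illustration, and the reporter stands in for whatever call the data-node would make to the name-node.

```java
import java.util.zip.CRC32;

// Hypothetical stand-in for the call to the name-node; not a Hadoop interface.
interface BadBlockReporter {
    void reportBadBlock(String blockId, String sourceNode);
}

// Hypothetical receiver-side check: verify a chunk and report on mismatch.
class ChunkVerifier {
    private final BadBlockReporter reporter;

    ChunkVerifier(BadBlockReporter reporter) {
        this.reporter = reporter;
    }

    // CRC-32 of the whole chunk, using the JDK implementation.
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Returns true when the chunk matches the expected checksum.
    // On a mismatch, reports the block and its source node, then returns false.
    boolean verify(String blockId, String sourceNode, byte[] chunk, long expectedCrc) {
        if (crcOf(chunk) == expectedCrc) {
            return true;
        }
        reporter.reportBadBlock(blockId, sourceNode);
        return false;
    }
}
```

In this sketch the caller decides what to do after a failed check (for example, abort the transfer); the only point shown is that the mismatch is reported rather than silently dropped.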

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • lohit vijayarenu (JIRA) at May 14, 2008 at 6:03 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Attachment: HADOOP-3035-1.patch

    This patch fixes the problem:
    - With OP_WRITE_BLOCK, we also send a boolean.
    - If this boolean is true, we also send the client's DatanodeInfo along with the client name string.
    - The receiving datanode uses this DatanodeInfo to report bad blocks to the namenode.
    - Added a test case that creates a file with a replication factor of 1, corrupts it, and requests a replication factor of 2. Upon replication, the receiving node detects the corruption and reports the block as bad.
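
The header change described in the list above can be sketched roughly as follows. This is an illustration only, not the contents of the patch: `WriteBlockHeader` is a hypothetical class, the source datanode is represented as a plain `host:port` string rather than a real DatanodeInfo, and the field order on the wire is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical header: client name, a boolean flag, and an optional source node.
class WriteBlockHeader {
    final String clientName;
    final String sourceNode; // null when no source node is sent

    WriteBlockHeader(String clientName, String sourceNode) {
        this.clientName = clientName;
        this.sourceNode = sourceNode;
    }

    // Write the client name, then the flag, then the source node only if the flag is true.
    byte[] encode() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(clientName);
            out.writeBoolean(sourceNode != null);
            if (sourceNode != null) {
                out.writeUTF(sourceNode);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the fields back in the same order; the flag tells us whether a source node follows.
    static WriteBlockHeader decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String client = in.readUTF();
            String source = in.readBoolean() ? in.readUTF() : null;
            return new WriteBlockHeader(client, source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because an extra field is added to an existing operation, an old receiver would misread a new sender's header, which is consistent with the change being flagged as incompatible.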

  • lohit vijayarenu (JIRA) at May 14, 2008 at 6:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Attachment: HADOOP-3035-2.patch

    Updated the patch to use different variable names.
  • lohit vijayarenu (JIRA) at May 14, 2008 at 6:29 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Release Note: During block transfers between datanodes, the receiving datanode can now report corrupt replicas received from the source node to the namenode.
    Hadoop Flags: [Incompatible change]
    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at May 14, 2008 at 8:07 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596920#action_12596920 ]

    Hadoop QA commented on HADOOP-3035:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12382066/HADOOP-3035-2.patch
    against trunk revision 656270.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 7 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2468/console

    This message is automatically generated.
  • Chris Douglas (JIRA) at May 14, 2008 at 9:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Chris Douglas reassigned HADOOP-3035:
    -------------------------------------

    Assignee: lohit vijayarenu
  • Raghu Angadi (JIRA) at May 15, 2008 at 9:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597279#action_12597279 ]

    Raghu Angadi commented on HADOOP-3035:
    --------------------------------------

    +1. A few minor comments :

    - Unit test: our normal approach is to wait in a loop, sleeping for a short time (500 milliseconds) in each iteration. That way the test normally finishes faster and can still handle unexpected (and unavoidable) platform-related delays.
    - The test does not belong in TestDatanodeBlockScanner.
    - You could log before invoking reportBadBlocks().
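
The polling approach in the first comment can be sketched as a small helper. This is a generic illustration, not code from the Hadoop test suite: `Poll.waitFor` is a hypothetical name, and the interval and timeout are supplied by the caller.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition in short sleeps up to a deadline.
class Poll {
    // Returns true as soon as the condition holds, or false once the timeout elapses.
    static boolean waitFor(BooleanSupplier condition, long intervalMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call something like `Poll.waitFor(() -> replicaCount() == 2, 500, 30000)`, where `replicaCount()` is a placeholder for whatever check the test performs. The test returns as soon as the condition holds, while the longer timeout still tolerates slow platforms.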

  • lohit vijayarenu (JIRA) at May 21, 2008 at 7:56 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Status: Open (was: Patch Available)
  • lohit vijayarenu (JIRA) at May 21, 2008 at 7:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Status: Patch Available (was: Open)
  • lohit vijayarenu (JIRA) at May 21, 2008 at 7:58 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Attachment: HADOOP-3035-3.patch

    The attached patch makes the changes suggested by Raghu.
  • lohit vijayarenu (JIRA) at May 22, 2008 at 1:44 am
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Status: Open (was: Patch Available)
  • lohit vijayarenu (JIRA) at May 22, 2008 at 1:44 am
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    lohit vijayarenu updated HADOOP-3035:
    -------------------------------------

    Status: Patch Available (was: Open)
  • Hadoop QA (JIRA) at May 22, 2008 at 8:56 am
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598947#action_12598947 ]

    Hadoop QA commented on HADOOP-3035:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12382506/HADOOP-3035-3.patch
    against trunk revision 659005.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 10 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2515/console

    This message is automatically generated.
  • Raghu Angadi (JIRA) at May 22, 2008 at 8:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599169#action_12599169 ]

    Raghu Angadi commented on HADOOP-3035:
    --------------------------------------

    I just committed this. Thanks Lohit!
  • Robert Chansler (JIRA) at May 23, 2008 at 12:12 am
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Chansler updated HADOOP-3035:
    ------------------------------------

    Resolution: Fixed
    Fix Version/s: 0.18.0
    Status: Resolved (was: Patch Available)
  • Hudson (JIRA) at May 23, 2008 at 12:28 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599347#action_12599347 ]

    Hudson commented on HADOOP-3035:
    --------------------------------

    Integrated in Hadoop-trunk #500 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/500/])
  • Robert Chansler (JIRA) at Jun 27, 2008 at 9:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Robert Chansler updated HADOOP-3035:
    ------------------------------------

    Release Note: Changed protocol for transferring blocks between data nodes to report corrupt blocks to data node for re-replication from a good replica. (was: During block transfers between datanodes, the receiving datanode, now can report corrupt replicas received from src node to the namenode)

Discussion Overview
group: common-dev @
categories: hadoop
posted: Mar 18, '08 at 3:06a
active: Jun 27, '08 at 9:04p
posts: 17
users: 1
website: hadoop.apache.org...
irc: #hadoop
