NameNode should give client the first node in the pipeline from different rack other than that of excludedNodes list in the same rack.
---------------------------------------------------------------------------------------------------------------------------------------

Key: HDFS-1384
URL: https://issues.apache.org/jira/browse/HDFS-1384
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Thanh Do


We saw a case where the NN keeps giving the client nodes from the same rack, which causes
an exception on the client when it tries to set up the pipeline. The client retries 5 times and then fails.

Here are more details. Suppose we have 2 racks:
- Rack 0: from dn1 to dn7
- Rack 1: from dn8 to dn14

The client asks for 3 DNs and the NN replies with dn1, dn8 and dn9, for example.
Because of a network partition, the client cannot see any node in Rack 0.
Hence, the client adds dn1 to its excludedNodes list and asks the NN again.
Interestingly, the NN picks another node in Rack 0 (one that is not in excludedNodes)
and gives it back to the client, and so on. The client keeps retrying, and after 5 retries
the write fails.
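
The retry loop described above can be sketched as a small simulation. This is only an illustrative model, not HDFS code: the names `naive_choose_targets` and `client_write`, the fixed dn1..dn14 selection order, and the limit of 5 attempts are assumptions made for the example.

```python
# Toy model of the scenario: two racks, the client cannot reach rack0.
RACKS = {
    "rack0": [f"dn{i}" for i in range(1, 8)],   # dn1..dn7
    "rack1": [f"dn{i}" for i in range(8, 15)],  # dn8..dn14
}
RACK_OF = {dn: rack for rack, dns in RACKS.items() for dn in dns}
ALL_NODES = RACKS["rack0"] + RACKS["rack1"]


def naive_choose_targets(excluded, replicas=3):
    """Hypothetical NN policy: skip excluded nodes, but ignore which
    rack the excluded nodes belong to when picking the first node."""
    candidates = [dn for dn in ALL_NODES if dn not in excluded]
    first = candidates[0]
    # Remaining replicas come from a different rack than the first node.
    others = [dn for dn in candidates if RACK_OF[dn] != RACK_OF[first]]
    return [first] + others[: replicas - 1]


def client_write(choose_targets, unreachable_rack="rack0", max_attempts=5):
    """Hypothetical client: excludes the first node of a failed pipeline
    and asks again, giving up after max_attempts."""
    excluded = set()
    for attempt in range(1, max_attempts + 1):
        pipeline = choose_targets(excluded)
        first = pipeline[0]
        if RACK_OF[first] != unreachable_rack:
            return True, attempt, pipeline
        excluded.add(first)  # first node unreachable: exclude it and retry
    return False, max_attempts, sorted(excluded, key=ALL_NODES.index)


if __name__ == "__main__":
    print(naive_choose_targets(set()))
    print(client_write(naive_choose_targets))
```

In this model the first reply is dn1, dn8, dn9, as in the example above, and the client ends up excluding dn1 through dn5 one at a time before giving up, because every new first node still comes from Rack 0.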

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
Haryadi Gunawi (haryadi@eecs.berkeley.edu)


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • dhruba borthakur (JIRA) at Sep 10, 2010 at 9:23 am
    [ https://issues.apache.org/jira/browse/HDFS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur resolved HDFS-1384.
    ------------------------------------

    Resolution: Duplicate

    This bug has been fixed in trunk: the client sends the excluded list to the NameNode with the addBlock RPC, and the NN ensures that it does not return a datanode from the excluded list.

    This bug is still present in the 0.20-append branch.
  • Thanh Do (JIRA) at Sep 12, 2010 at 1:00 am
    [ https://issues.apache.org/jira/browse/HDFS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Thanh Do reopened HDFS-1384:
    ----------------------------


    Dhruba,

    I think I described the bug badly.
    The excludedList does its job.
    But in this case the excludedList contains only nodes from rack 0,
    and when the client retries, the NN gives it a first DN in the pipeline that is
    also in rack 0. Hence, when the client tries to create the pipeline,
    it contacts the first DN and fails (because of the network partition).
    So the problem here is that the NN keeps giving the client nodes
    from the same rack (in this case, rack 0). And because the
    client cannot see any node in rack 0, it retries 5 times and fails.
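
One way to picture the behavior requested here is a first-node choice that avoids racks already represented in the excluded list. The sketch below is a hypothetical illustration only: `choose_first_node`, the two-rack topology and the fixed node order are assumptions for the example and do not reflect the actual NameNode placement code.

```python
# Toy topology matching the issue description.
RACKS = {
    "rack0": [f"dn{i}" for i in range(1, 8)],   # dn1..dn7
    "rack1": [f"dn{i}" for i in range(8, 15)],  # dn8..dn14
}
RACK_OF = {dn: rack for rack, dns in RACKS.items() for dn in dns}
ALL_NODES = RACKS["rack0"] + RACKS["rack1"]


def choose_first_node(excluded):
    """Hypothetical policy: prefer a first node on a rack that has no
    excluded nodes; fall back to any non-excluded node otherwise."""
    excluded_racks = {RACK_OF[dn] for dn in excluded}
    candidates = [dn for dn in ALL_NODES if dn not in excluded]
    if not candidates:
        return None  # every node has been excluded
    preferred = [dn for dn in candidates if RACK_OF[dn] not in excluded_racks]
    return (preferred or candidates)[0]


if __name__ == "__main__":
    print(choose_first_node(set()))     # no exclusions yet
    print(choose_first_node({"dn1"}))   # rack0 already failed once
```

With this rule, a single failure on dn1 is enough for the next pipeline to start in Rack 1, so the client in the scenario would succeed on its second attempt instead of exhausting its retries on Rack 0. When every rack already has an excluded node, the sketch falls back to the plain excluded-list behavior.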

Discussion Overview
group: hdfs-dev
categories: hadoop
posted: Sep 8, '10 at 2:02a
active: Sep 12, '10 at 1:00a
posts: 3
users: 1
website: hadoop.apache.org...
irc: #hadoop