Exceptions in DataXceiver#run can result in a zombie datanode

Key: HDFS-2182
URL: https://issues.apache.org/jira/browse/HDFS-2182
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Reporter: Eli Collins
Fix For: 0.23.0

DataXceiver#run currently swallows all exceptions. It should instead propagate them up to DataXceiverServer#run, which can then decide whether the exception should be tolerated or the daemon should exit. An IOException should be tolerated, as it is today (it is likely just an issue with a particular thread, or an intermittent failure), but e.g. a java.lang.Error should not be.

This came up in a bug I'm seeing on a test cluster: if e.g. a NoClassDefFoundError is thrown in DataXceiver#run (because the host's jars were replaced out from underneath it, it ran out of file descriptors, etc.), we end up with a datanode that is alive but always fails because it can't create any DataXceiver threads. In this case the datanode should shut itself down rather than continue to run.
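
To make the proposed policy concrete, here is a minimal, self-contained sketch. It is not the actual DataXceiverServer code: the names XceiverErrorPolicy, Action, Worker, classify and serve are hypothetical, and the loop only simulates a server dispatching workers. The treatment of exceptions other than IOException and Error is an assumption, since the description above only covers those two cases.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical illustration only; not the real DataXceiverServer.
class XceiverErrorPolicy {

    /** What the server loop should do after a worker fails. */
    enum Action { TOLERATE, SHUT_DOWN }

    /** Stand-in for DataXceiver#run: may fail with an IOException or an Error. */
    interface Worker {
        void run() throws IOException;
    }

    /**
     * Classify a failure propagated up from a worker.
     * java.lang.Error (e.g. NoClassDefFoundError) means the daemon should exit;
     * IOException is tolerated. Tolerating every other exception is an
     * assumption made for this sketch.
     */
    static Action classify(Throwable t) {
        if (t instanceof Error) {
            return Action.SHUT_DOWN;
        }
        return Action.TOLERATE;
    }

    /**
     * Simulated server loop: runs workers in order and stops at the first
     * failure classified as SHUT_DOWN. Returns how many workers were attempted.
     */
    static int serve(List<Worker> workers) {
        int attempted = 0;
        for (Worker w : workers) {
            attempted++;
            try {
                w.run();
            } catch (Throwable t) {
                if (classify(t) == Action.SHUT_DOWN) {
                    break; // the real daemon would shut itself down here
                }
                // Tolerated failure: keep serving.
            }
        }
        return attempted;
    }
}
```

In this sketch a worker that throws an IOException does not stop the loop, while a worker that throws a NoClassDefFoundError ends it, so the simulated server never lingers in a state where every new worker fails.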


Posted to hdfs-dev on Jul 21, '11 at 8:15p