We've been seeing problems with our HDFS since updating to CDH4. We were
running 4.1.2 at first, but I just updated to 4.1.3 and am still seeing the
same issues. The Hadoop and server configurations are largely the same
between CDH3 and CDH4, except for the new CDH4-specific settings (such
as HA).

Problem 1: Randomly dying DataNodes. I set the log level to DEBUG and am
still not getting any useful information about why a node died. See [1]
below for the log output.

Problem 2: DataNodes not consistently reporting in to all NameNodes. Right
now I have 2 NameNodes, using the HA configuration. Both show 2 dead
nodes, but not the same 2 nodes. A recently dead node (from problem 1)
shows up in both lists, but each NameNode also thinks a different,
perfectly healthy node is dead. If I check the logs on those 2 different
"dead" nodes, they both appear to be running, but I only see heartbeat
log lines to one NameNode. I don't see any exceptions; the heartbeats to
the other NameNode just aren't there.
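
To make the mismatch in problem 2 easier to see, the two dead-node lists can
be diffed. This is only a minimal sketch: it assumes the hostnames have been
copied by hand from each NameNode's dead-node list, and the function name and
the hostnames (dn3, dn5, dn7) are placeholders for illustration, not the real
cluster.

```python
def compare_dead_nodes(nn1_dead, nn2_dead):
    """Split two dead-node lists into shared and NameNode-specific entries."""
    nn1, nn2 = set(nn1_dead), set(nn2_dead)
    return {
        # Reported dead by both NameNodes: likely a genuinely dead DataNode.
        "dead_on_both": sorted(nn1 & nn2),
        # Reported dead by only one NameNode: candidates for missing heartbeats.
        "dead_only_on_nn1": sorted(nn1 - nn2),
        "dead_only_on_nn2": sorted(nn2 - nn1),
    }


if __name__ == "__main__":
    # Placeholder hostnames, entered by hand for illustration.
    report = compare_dead_nodes(["dn3", "dn7"], ["dn3", "dn5"])
    for label, hosts in report.items():
        print(label, hosts)
```

Nodes that appear under only one NameNode are the ones whose DataNode logs
are worth checking for heartbeats to that NameNode.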


Any info on either of these issues?

[1] I see a string of these exceptions, interspersed with plenty of seemingly
successful log lines:

http://pastebin.com/ZvigjXvT

Then the datanode just stops:

http://pastebin.com/HLHe4CsX


  • Bryan Beaudreault at Feb 14, 2013 at 7:29 pm
    For problem 2, I should mention that after a full cluster restart all
    DataNodes check in to both NameNodes. It is only later that they stop
    checking in.


Discussion Overview
group: cdh-user
categories: hadoop
posted: Feb 14, '13 at 7:13p
active: Feb 14, '13 at 7:29p
posts: 2
users: 1
website: cloudera.com
irc: #hadoop