Hi,
I'm on CDH4 (4.0.3), which basically means the latest version of Hadoop
(CDH4 - 2.0.0+545), and we have a setup where the NN data directory is on
two different sets of disks (D1 and D2). Every now and then the NN status
health check shows that D2 has failed, because there isn't enough space
for the image and edits to be written. When we restart the NN, we always seem
to get the error "FATAL org.apache.hadoop.hdfs.server.namenode.NameNode:
Exception in namenode join", which relates to some invalid pointer in the
images. When we removed D2 from the NN config, the NN came up without any
issues. So my question is this: just because D2 had failures due to space
issues, shouldn't the NN still come up without those join errors? If that's
not the case, what is the purpose of configuring multiple data directories
for the NN?


  • Todd Lipcon at Oct 22, 2012 at 3:54 pm
    Hi Anson

    Can you please paste the exact message you get when starting the NN, in
    addition to the output of 'ls -l /path/to/namedir/that/causes/the/problem' ?

    If it runs out of space while saving the image it should have left the
    image as a '.ckpt' file, which wouldn't impact startup, but there may be
    some bug here.

    Thanks
    -Todd
    --
    Todd Lipcon
    Software Engineer, Cloudera
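
The directory listing requested above can also be gathered with a short script. The following is only a sketch: it assumes the name directory keeps its files under a `current` subdirectory and that an image still being written carries `.ckpt` in its name (as described in the reply); the function names are hypothetical and not part of Hadoop.

```python
import os


def is_unfinished_checkpoint(filename):
    """True if the name looks like an fsimage that was still being written.

    Assumption: in-progress images are named like 'fsimage.ckpt_<txid>'.
    """
    return filename.startswith("fsimage") and ".ckpt" in filename


def summarize_name_dir(name_dir):
    """Return (filename, size_in_bytes, unfinished) for each file in name_dir/current."""
    current = os.path.join(name_dir, "current")  # assumed layout
    rows = []
    with os.scandir(current) as entries:
        for entry in entries:
            if entry.is_file():
                rows.append(
                    (entry.name,
                     entry.stat().st_size,
                     is_unfinished_checkpoint(entry.name))
                )
    return sorted(rows)
```

Calling `summarize_name_dir` on each configured name directory and comparing the results side by side should make leftover `.ckpt` files or zero-length files on the full disk easy to spot.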

    --
  • Ansonism at Oct 22, 2012 at 5:19 pm
    The checkpoint files are there on D2, but they're outdated: the last
    one is from October 4th. We didn't see an error referring to that second
    directory. We just happened to change the config to use only the one
    directory (the good one), and it worked. I'm still looking for the log
    error from when we started up the NN.
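
A stale directory like the one described here (files last written weeks before the other copy) can be detected before a restart. The sketch below assumes the same `current` subdirectory layout; the function names and the lag threshold are hypothetical and chosen only for illustration.

```python
import os
import shutil


def newest_mtime(name_dir):
    """Newest modification time (epoch seconds) of any file in name_dir/current, or None."""
    current = os.path.join(name_dir, "current")  # assumed layout
    with os.scandir(current) as entries:
        times = [e.stat().st_mtime for e in entries if e.is_file()]
    return max(times) if times else None


def lagging_dirs(mtimes, max_lag_seconds):
    """Given {dir: newest_mtime}, return the dirs that trail the freshest one by more than the limit.

    Directories whose value is None (no files found) are ignored here.
    """
    known = {d: t for d, t in mtimes.items() if t is not None}
    if not known:
        return []
    freshest = max(known.values())
    return sorted(d for d, t in known.items() if freshest - t > max_lag_seconds)


def free_bytes(path):
    """Free space in bytes on the filesystem that holds path."""
    return shutil.disk_usage(path).free
```

For example, building `{d: newest_mtime(d) for d in (d1, d2)}` and passing it to `lagging_dirs` with a limit of a few hours would flag a directory whose newest file dates from October 4th, and `free_bytes` shows whether that disk is still full.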

Discussion Overview
group: cdh-user
categories: hadoop
posted: Oct 22, '12 at 3:43p
active: Oct 22, '12 at 5:19p
posts: 3
users: 2 (Ansonism: 2 posts, Todd Lipcon: 1 post)
website: cloudera.com
irc: #hadoop
