FAQ
I loaded data into HDFS last week, and this morning I was greeted with this
on the web interface: "WARNING : There are about 32 missing blocks. Please
check the log or run fsck."

I ran fsck and see several missing and corrupt blocks. The output is
verbose, so here's a small sample:

/tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar:
CORRUPT block blk_-5745991833770623132
/tmp/hadoop-mapred/mapred/staging/hdfs/.staging/job_201104081532_0507/job.jar:
MISSING 1 blocks of total size 2945889 B........
/user/hive/warehouse/player_game_stat/2011-01-15/datafile: CORRUPT block
blk_1642129438978395720
/user/hive/warehouse/player_game_stat/2011-01-15/datafile: MISSING 1 blocks
of total size 67108864 B................

Sometimes the number of dots after the B is quite large (several lines
long). Some of these are tmp files, but many are important. If this cluster
were prod, I'd have some splaining to do. I need to determine what caused
this corruption.

Questions:

1. What are the dots after the B? What is the significance of their
number?
2. Does anyone have suggestions where to start?
3. Are there typical misconfigurations or issues that cause corruption &
missing files?
4. What is "the log" that the NameNode web interface is refers to?

Thanks for any info! I'm... nervous. :)
--
Tim Ellis
Riot Games
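
For anyone investigating the same warning: fsck can report which files and
blocks are affected in much more detail. A minimal sketch, assuming a
0.20-era installation with the hadoop command on the PATH (the path below is
just the one from the output above):

# list every file under the path, its blocks, and which datanodes
# (if any) still hold replicas of them
hadoop fsck /user/hive/warehouse/player_game_stat -files -blocks -locations

# whole-filesystem summary; -move relocates corrupt files to /lost+found
# and -delete removes them outright, so use those only after diagnosing
hadoop fsck /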


  • Jean-Daniel Cryans at May 18, 2011 at 12:17 am
    Hey Tim,

    It looks like you are running with only 1 replica so my first guess is
    that you only have 1 datanode and it's writing to /tmp, which was
    cleaned at some point.

    J-D
  • Time Less at May 18, 2011 at 1:22 am
    It looks like you are running with only 1 replica so my first guess is
    that you only have 1 datanode and it's writing to /tmp, which was
    cleaned at some point.

    Hi, J-D!

    On the last iteration of setting up this cluster, I used 2 replicas and saw
    similar corruption. Thus I set up this cluster with the default 3 replicas
    (based on the assumption that unusual replica values might expose unusual
    bugs). I can't find the command-line interface to get replica information
    for the file, but I was able to browse to it through the web interface, and
    here's what I see:
    Contents of directory /user/hive/warehouse/player_game_stat/2011-01-15
    (via http://hadooptest5:50075/browseDirectory.jsp?dir=/user/hive/warehouse/player_game_stat&namenodeInfoPort=50070&delegation=null)

    Name: datafile    Type: file    Size: 231.12 MB    Replication: 3
    Block Size: 64 MB    Modification Time: 2011-05-06 21:13
    Permission: rw-r--r--    Owner: hdfs    Group: supergroup
    I'm assuming that means the 1-replica hypothesis is incorrect.
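
    For later readers with the same question: the per-file replication factor
    is also visible from the command line. A sketch, using the same path as
    above:

    # the second column of the listing is the replication factor for files
    hadoop fs -ls /user/hive/warehouse/player_game_stat/2011-01-15

    # fsck also reports it per file/block as repl=N
    hadoop fsck /user/hive/warehouse/player_game_stat/2011-01-15 -files -blocks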

    I'll follow up on the suggestion about the datanodes writing into /tmp. I
    had a similar problem with the prior iteration of this cluster (dfs.name.dir
    wasn't defined, and so NameNode metadata(?) was going into /tmp).

    I now have a metaquestion: is there a default Hadoop configuration out there
    somewhere that has all critical parameters at least listed, if not filled
    out with some sane defaults? I keep discovering undefined parameters via
    unusual and difficult-to-troubleshoot cluster behaviour.

    --
    Tim Ellis
    Riot Games
  • Thanh Do at May 18, 2011 at 1:32 am
    What version are you using?
  • Will Maier at May 18, 2011 at 10:36 am

    On Tue, May 17, 2011 at 06:22:00PM -0700, Time Less wrote:
    I now have a metaquestion: is there a default Hadoop configuration out there
    somewhere that has all critical parameters at least listed, if not filled out
    with some sane defaults? I keep discovering undefined parameters via unusual
    and difficult-to-troubleshoot cluster behaviour.
    It's not quite what you're asking for, but your NameNode's web interface should
    provide a merged dump of all the relevant config settings, including comments
    indicating the name of the config file where the setting was defined, at the
    /conf path.
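
    A quick way to pull that dump, using the NameNode host and web port that
    appear earlier in this thread (the output is the merged configuration as
    XML):

    curl http://hadooptest5:50070/conf

    # e.g. to look for the dfs.data.dir setting as the NameNode sees it
    curl -s http://hadooptest5:50070/conf | grep -C2 'dfs.data.dir'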

    --

    Will Maier - UW High Energy Physics
    cel: 608.438.6162
    tel: 608.263.9692
    web: http://www.hep.wisc.edu/~wcmaier/
  • Time Less at May 18, 2011 at 11:42 pm
    Can anyone enlighten me? Why is having dfs.*.dir default to /tmp a good
    idea? I'd rather, in order of preference, have the following behaviours
    if the dfs.*.dir properties are undefined:

    1. Daemons log errors and fail to start at all,
    2. Daemons start but default to /var/db/hadoop (or any persistent
    location), meanwhile logging in huge screaming all-caps letters that
    they've picked a default which may not be optimal,
    3. Daemons start a botnet and DDoS random government websites, wait 36
    hours, then phone the FBI and blame the administrator for it*,
    4. Daemons write "persistent" data into /tmp without any great fanfare,
    allowing a sense of complacency in their victims, only to report at a
    random time in the future that everything is corrupted beyond repair,
    i.e. the current behaviour.

    I submitted a JIRA (which appears to have been resolved, yay!) to at least
    add verbiage to the WARNING letting you know why you've irreversibly
    corrupted your cluster, but it does feel somewhat dissatisfying, since by
    the time you see the WARNING your cluster is already useless/dead.

    It's not quite what you're asking for, but your NameNode's web interface
    should provide a merged dump of all the relevant config settings,
    including comments indicating the name of the config file where the
    setting was defined, at the /conf path.
    Cool, though it looks like that's just the NameNode's config, right? Not the
    DataNode's config, which is the component corrupting data due to this
    default?

    --
    Tim Ellis
    Riot Games
    * Hello, FBI, #3 was a joke. I wish #4 was a joke, too.
  • Aaron Eng at May 18, 2011 at 11:54 pm
    Hey Tim,

    Hope everything is good with you. Looks like you're having some fun with
    hadoop.
    Can anyone enlighten me? Why is having dfs.*.dir default to /tmp a good idea?
    It's not a good idea, it's just how it defaults. You'll find hundreds or
    probably thousands of these quirks as you work with Apache/Cloudera Hadoop
    distributions. Never trust the defaults.
    submitted a JIRA
    That's the way to do it.
    which appears to have been resolved ... but it does feel somewhat
    dissatisfying, since by the time you see the WARNING your cluster is already
    useless/dead.
    And that's why, if it's relevant to you, your best bet is to resolve the
    JIRA yourself. Most of the contributors are big-picture types who would
    look at "small" usability issues like this and scoff about "newbies". Of
    course, by the time you're familiar enough with Hadoop and comfortable
    enough to fix your own JIRAs, you might also join the ranks of jaded
    contributors who scoff at usability issues logged by newbies.

    Case in point: I noted a while ago that when you run the namenode -format
    command, it only accepts a capital Y (or lower case, can't remember), and
    it fails silently if you give the wrong case. I didn't particularly care
    enough to fix it, having already learned my lesson. You'll find lots of
    these rough edges throughout Hadoop; it is not a user-friendly,
    out-of-the-box, enterprise-ready product.
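
    For readers who run into the same thing: the format prompt is
    case-sensitive in this era (later addressed in HDFS-1958, linked further
    down this thread), so the answer has to be an uppercase Y. A sketch, with
    the prompt text approximate:

    # run as the user that owns dfs.name.dir; a lowercase "y" is
    # silently treated as "do not format"
    hadoop namenode -format
    # Re-format filesystem in <dfs.name.dir> ? (Y or N)   -> answer: Y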


  • Aaron Eng at May 18, 2011 at 11:55 pm
    Most of the contributors are big picture types who would look at "small"
    usability issues like this and scoff about "newbies".
    P.S. This is speaking from the newbie perspective, it was not meant as a
    slight to contributors in any way. Just a comment on the steep learning
    curve of picking up Hadoop.

  • Todd Lipcon at May 19, 2011 at 12:09 am

    On Wed, May 18, 2011 at 4:55 PM, Aaron Eng wrote:
    Most of the contributors are big picture types who would look at "small"
    usability issues like this and scoff about "newbies".
    P.S. This is speaking from the newbie perspective, it was not meant as a
    slight to contributors in any way.  Just a comment on the steep learning
    curve of picking up Hadoop.
    Hi Aaron,

    I'm sorry you feel this way about the Hadoop contributors. It's
    definitely a mistake we've made in the past, but we're trying to do our
    best to improve things. The last two Wednesdays we have held hackathons
    at the Cloudera offices and gotten lots of new people on board, working
    mostly on small fixes like this.

    If you have some specific issues you'd like to point out, please file
    JIRAs. I'll be sure to take a look.

    -Todd


    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Time Less at May 19, 2011 at 1:30 am

    If you have some specific issues you'd like to point out, please file
    JIRAs. I'll be sure to take a look.
    If others would like to comment, see HDFS-1960: dfs.*.dir should not default
    to /tmp (or other typically volatile storage).

    I won't speak to other usability issues like NameNode format failing
    silently (which bit me, too), because they merely slow me down. This one has
    enormous negative consequences, and "fails unsafe," so I'm following it up.

    When I told everyone why the dev cluster died, there were a lot of chins on
    the floor, so keep in mind that about 5 veteran database guys are agreeing
    with me on this particular JIRA.

    --
    Tim Ellis
    Riot Games
  • Jonathan Disher at May 19, 2011 at 2:46 am

    On May 18, 2011, at 4:54 PM, Aaron Eng wrote:
    Case in point: I noted a while ago that when you run the namenode -format
    command, it only accepts a capital Y (or lower case, can't remember), and
    it fails silently if you give the wrong case. I didn't particularly care
    enough to fix it, having already learned my lesson. You'll find lots of
    these rough edges throughout Hadoop; it is not a user-friendly,
    out-of-the-box, enterprise-ready product.

    It's capital.

    You know, when I built my Archive cluster, I fought this for two hours, figuring something was hosed in my configuration, before I finally figured it out.

    It's kind of an embarrassing bug; first-semester Java students know how
    to work around it (believe me, I've only taken one semester of Java!).

    -j
  • Todd Lipcon at May 19, 2011 at 2:52 am
    I've filed a JIRA and patch for this issue:
    https://issues.apache.org/jira/browse/HDFS-1958

    Thanks for the feedback, all.

    -Todd


    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Todd Lipcon at May 19, 2011 at 7:37 am

    On Wed, May 18, 2011 at 7:51 PM, Todd Lipcon wrote:
    I've filed a JIRA and patch for this issue:
    https://issues.apache.org/jira/browse/HDFS-1958
    And it's now committed.

    -Todd


    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Time Less at May 18, 2011 at 1:45 am
    The answer is dfs.data.dir wasn't defined, and indeed the data was being
    stored in /tmp. Corruption ensues. I've found a page:
    http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html that seems
    to have a good number of the parameters that should be defined.
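
    In case it helps someone else who lands here, a minimal hdfs-site.xml
    sketch that pins both directories to persistent storage; the /data/...
    paths are placeholders, not a recommendation from this thread:

    <?xml version="1.0"?>
    <!-- hdfs-site.xml -->
    <configuration>
      <!-- NameNode metadata (fsimage + edits). The default derives from
           hadoop.tmp.dir, which itself defaults to /tmp/hadoop-${user.name}. -->
      <property>
        <name>dfs.name.dir</name>
        <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
      </property>
      <!-- DataNode block storage; the same /tmp-derived default applies. -->
      <property>
        <name>dfs.data.dir</name>
        <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
      </property>
    </configuration>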



    --
    Tim
  • Jain, Prem at May 27, 2011 at 5:02 am
    One of my datanodes hung and I had to reboot it and restart the
    datanode & tasktracker. However, the datanode fails to start with the
    following error message.

    Thanks in advance.



    [root@hadoop20 init.d]# ./hadoop-0.20-datanode start
    Starting Hadoop datanode daemon (hadoop-datanode): starting datanode,
    logging to /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-hadoop20.out
    datanode dead but pid file exists [ OK ]
    [root@hadoop20 init.d]#
  • Harsh J at May 27, 2011 at 5:45 am
    That error does not seem helpful enough for us to figure out why the
    DN does not start up. What do the DN's logs say?


    --
    Harsh J
  • Stuti Awasthi at May 27, 2011 at 6:19 am
    Hi Prem,
    Try to remove the pid file and then try to start the datanode and tasktracker again. If that does not work, please post the logs here.

  • Stuti Awasthi at May 30, 2011 at 5:45 am
    Keeping alias in the loop

    -----Original Message-----
    From: Stuti Awasthi
    Sent: Monday, May 30, 2011 10:56 AM
    To: 'Jain, Prem'
    Subject: RE: Can't start datanode?

    Hi Prem,

    The datanode pid file is named "hadoop-[USERNAME]-datanode.pid" and by
    default it lives in the /tmp directory. Here USERNAME is the user that
    starts the daemon process.
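
    A sketch of the recovery steps being described, assuming the default pid
    location and the paths shown earlier in this thread (adjust the username
    and paths to your installation):

    # remove the stale pid file left behind by the hang/reboot
    rm -f /tmp/hadoop-hadoop-datanode.pid

    # start the datanode again and check its log (the .log file next to
    # the .out file shown above), not just the init script's output
    /etc/init.d/hadoop-0.20-datanode start
    tail -n 100 /usr/lib/hadoop-0.20/logs/hadoop-hadoop-datanode-hadoop20.log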


    -----Original Message-----
    From: Jain, Prem
    Sent: Friday, May 27, 2011 7:29 PM
    To: Stuti Awasthi
    Subject: RE: Can't start datanode?

    Sorry, I am a novice in this Hadoop land. Where do I find the pid file?


