Looking at the codebase, it seems that the NameNode ignores an edit log
storage directory if it encounters an error:

http://www.google.com/codesearch/p?hl=en#GLh8vwsjDqs/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java&q=namenode%20editlog&sa=N&cd=20&ct=rc

Check lines:
Code at line 334
Comments: lines 387-390
Comments: lines 411-414
Comments: lines 433-436

The processIOError method is called throughout the code whenever an
IOException is encountered.

A fatal error is thrown only if none of the storage directories is
accessible (lines 394, 420).
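
For anyone following along without the source open, here is a minimal sketch of the
behaviour described above. It is not the actual FSEditLog code; the class, method
bodies, and file layout below are invented for illustration, and only the control
flow (warn and drop the failing directory, abort only when none remain) mirrors
what the lines referenced above do.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only, NOT the real FSEditLog: a failing edit log
// directory is dropped and writing continues on the rest; the process aborts
// only when no writable directory remains ("All storage directories are
// inaccessible").
public class EditLogSketch {

    private final List<String> editDirs;

    public EditLogSketch(List<String> dirs) {
        this.editDirs = new ArrayList<>(dirs);
    }

    // Invented stand-in for processIOError: remove the failing directory,
    // and die only if it was the last one left.
    private void processIOError(String failedDir, IOException cause) {
        System.err.println("WARN: removing failed edit log directory "
                + failedDir + ": " + cause.getMessage());
        editDirs.remove(failedDir);
        if (editDirs.isEmpty()) {
            throw new RuntimeException(
                    "Fatal Error: All storage directories are inaccessible.");
        }
    }

    // Append one edit record to every remaining directory.
    public void logEdit(String record) {
        // Iterate over a copy so processIOError can remove entries safely.
        for (String dir : new ArrayList<>(editDirs)) {
            try {
                Path p = Paths.get(dir, "edits");
                Files.write(p, (record + "\n").getBytes(),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                processIOError(dir, e);
            }
        }
    }
}

(As far as I can tell, the real class works with edit log output streams rather
than raw paths, but the control flow is the relevant part here.)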

- Sudhir



On Aug 23, 2010, at 2:21 PM, "common-user-digest-help@hadoop.apache.org"
wrote:
From: Michael Segel <michael_segel@hotmail.com>
Date: Mon, 23 Aug 2010 14:05:05 -0500
To: <common-user@hadoop.apache.org>
Subject: RE: what will happen if a backup name node folder becomes
inaccessible?


Ok...

Now you have me confused.
Everything we've seen says that writing to both a local disk and to an NFS
mounted disk would be the best way to prevent a problem.

Now you and Harsh J say that this could actually be problematic.

Which is it?
Is this now a defect that should be addressed, or should we just not use an
NFS mounted drive?

Thx

-Mike

Date: Mon, 23 Aug 2010 11:42:59 -0700
From: licht_jiang@yahoo.com
Subject: Re: what will happen if a backup name node folder becomes
inaccessible?
To: common-user@hadoop.apache.org

This makes a good argument. Actually, after seeing the previous reply, I am
somewhat convinced that I should go back to "syncing" the metadata to a backup
location instead of using this feature, which, as David mentioned, introduces a
second single point of failure to Hadoop and degrades its availability. BTW, we
are using the Cloudera package hadoop-0.20.2+228. Can someone confirm whether a
name node will shut down if a backup folder listed in "dfs.name.dir" becomes
unavailable in this version?

Thanks,

Michael

--- On Sun, 8/22/10, David B. Ritch wrote:

From: David B. Ritch <david.ritch@gmail.com>
Subject: Re: what will happen if a backup name node folder becomes
inaccessible?
To: common-user@hadoop.apache.org
Date: Sunday, August 22, 2010, 11:34 PM

Which version of Hadoop was this? The folks at Cloudera have assured
me that the namenode in CDH2 will continue as long as one of the
directories is still writable.

It *does* seem a bit of a waste if an availability feature - the ability
to write to multiple directories - actually reduces availability by
providing an additional single point of failure.

Thanks!

dbr
On 8/20/2010 5:27 PM, Harsh J wrote:
Whee, let's try it out:

Start with both paths available. ... Starts fine.
Store some files. ... Works.
rm -r the second path. ... Ouch.
Store some more files. ... Still Works. [Cuz the SNN hasn't sent us
stuff back yet]
Wait for checkpoint to hit.
And ...
Boom!

2010-08-21 02:42:00,385 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
from 127.0.0.1
2010-08-21 02:42:00,385 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 37 Total time for transactions(ms): 6Number of
transactions batched in Syncs: 0 Number of syncs: 26 SyncTimes(ms):
307 277
2010-08-21 02:42:00,439 FATAL
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All
storage directories are inaccessible.
2010-08-21 02:42:00,440 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

So yes, as Edward says - never let this happen!
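
As an aside, a simple external probe can catch a dead dfs.name.dir entry before
the next checkpoint does. The sketch below is hypothetical and not part of
Hadoop; the class name, the hard-coded paths, and the probe-file approach are
all invented. It just checks that every configured directory is still present
and writable and exits non-zero otherwise, so cron or a monitoring tool can
alert on it.

import java.io.File;
import java.io.IOException;

// Hypothetical pre-flight check, not part of Hadoop: verify every path
// listed in dfs.name.dir is still a writable directory, so a dead NFS
// mount is noticed before the next checkpoint brings the NameNode down.
public class NameDirCheck {
    public static void main(String[] args) {
        // These would normally be read from the dfs.name.dir property.
        String[] nameDirs = {"/hadoop/dfs/name", "/hadoop-backup/dfs/name"};
        boolean allOk = true;
        for (String dir : nameDirs) {
            boolean ok;
            try {
                File d = new File(dir);
                // Creating and deleting a small file proves the mount is writable.
                File probe = File.createTempFile("namedir-probe", ".tmp", d);
                ok = d.isDirectory() && probe.delete();
            } catch (IOException | SecurityException e) {
                ok = false;
            }
            System.out.println((ok ? "OK   " : "FAIL ") + dir);
            allOk &= ok;
        }
        System.exit(allOk ? 0 : 1); // non-zero exit lets cron/alerting notice
    }
}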
On Sat, Aug 21, 2010 at 2:26 AM, jiang licht wrote:
Using an NFS folder to back up dfs meta information as follows,

<property>
<name>dfs.name.dir</name>
<value>/hadoop/dfs/name,/hadoop-backup/dfs/name</value>
</property>

where /hadoop-backup is on a backup machine and mounted on the master node.

I have a question: if somehow the backup folder becomes unavailable, will
it freeze the master node? That is, will write operations simply hang on the
master node under this condition? Or will the master node log the problem and
continue to work?

Thanks,

Michael






  • Harsh J at Aug 24, 2010 at 6:12 am
    Hello Sudhir,

    You're right about this, but I don't seem to be getting the warning for the
    edit log IOException at all in the first place. Here are my steps to get to
    what I described earlier (note that I am just using two directories on the
    same disk, not two different devices or NFS, etc.). It's my personal computer,
    so I don't mind doing this again for now (as the other directory remains
    untouched).

    hadoop 11:13:00 ~/.hadoop $ jps
    4954 SecondaryNameNode
    5911 Jps
    5158 TaskTracker
    4592 NameNode
    5650 JobTracker
    4768 DataNode

    hadoop 11:13:02 ~/.hadoop $ hadoop dfs -ls
    Found 2 items
    -rw-r--r--   1 hadoop supergroup  411536 2010-08-18 15:50 /user/hadoop/data
    drwxr-xr-x   - hadoop supergroup       0 2010-08-18 16:02 /user/hadoop/dataout

    hadoop 11:13:07 ~/.hadoop $ tail -n 10 conf/hdfs-site.xml

    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop/.dfs/name,/home/hadoop/.dfs/testdir</value>
      <final>true</final>
    </property>

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2047</value>
    </property>

    </configuration>

    hadoop 11:13:25 ~/.hadoop $ ls ~/.dfs/
    data  name  testdir

    hadoop 11:13:36 ~/.hadoop $ rm -r ~/.dfs/testdir

    hadoop 11:13:49 ~/.hadoop $ jps
    6135 Jps
    4954 SecondaryNameNode
    5158 TaskTracker
    4592 NameNode
    5650 JobTracker
    4768 DataNode

    hadoop 11:13:56 ~/.hadoop $ hadoop dfs -put /etc/profile profile1
    hadoop 11:14:10 ~/.hadoop $ hadoop dfs -put /etc/profile profile2
    hadoop 11:14:12 ~/.hadoop $ hadoop dfs -put /etc/profile profile3
    hadoop 11:14:15 ~/.hadoop $ hadoop dfs -put /etc/profile profile4

    hadoop 11:17:21 ~/.hadoop $ jps
    4954 SecondaryNameNode
    5158 TaskTracker
    4592 NameNode
    5650 JobTracker
    4768 DataNode
    6954 Jps

    hadoop 11:17:23 ~/.hadoop $ tail -f hadoop-0.20.2/logs/hadoop-hadoop-namenode-hadoop.log
    2010-08-24 11:14:17,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
    NameSystem.allocateBlock: /user/hadoop/profile4. blk_28644972299224370_1019

    2010-08-24 11:14:17,709 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
    NameSystem.addStoredBlock: blockMap updated: 192.168.1.8:50010 is added to
    blk_28644972299224370_1019 size 497
    2010-08-24 11:14:17,713 INFO org.apache.hadoop.hdfs.StateChange: DIR*
    NameSystem.completeFile: file /user/hadoop/profile4 is closed by
    DFSClient_-2054565417
    2010-08-24 11:17:31,187 INFO
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
    192.168.1.8

    2010-08-24 11:17:31,187 INFO
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
    19 Total time for transactions(ms): 4Number of transactions batched in
    Syncs: 0 Number of syncs: 14 SyncTimes(ms): 183 174

    2010-08-24 11:17:31,281 FATAL
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All
    storage directories are inaccessible.

    2010-08-24 11:17:31,283 INFO
    org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************

    SHUTDOWN_MSG: Shutting down NameNode at hadoop.cf.net/127.0.0.1

    ************************************************************/

    ^C
    hadoop 11:17:51 ~/.hadoop $ ls /home/hadoop/.dfs/
    data  name

    hadoop 11:21:14 ~/.hadoop $ jps
    8259 Jps
    4954 SecondaryNameNode
    5158 TaskTracker
    5650 JobTracker
    4768 DataNode

    hadoop 11:36:03 ~/.hadoop $ mkdir ~/.dfs/testdir
    hadoop 11:36:04 ~/.hadoop $ stop-all.sh
    stopping jobtracker
    localhost: stopping tasktracker
    no namenode to stop
    localhost: stopping datanode
    localhost: stopping secondarynamenode

    hadoop 11:37:01 ~/.hadoop $ start-all.sh
    starting namenode, logging to
    /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-hadoop.out
    localhost: starting datanode, logging to
    /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-hadoop.out
    localhost: starting secondarynamenode, logging to
    /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop.out
    starting jobtracker, logging to
    /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-hadoop.out
    localhost: starting tasktracker, logging to
    /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-hadoop.out

    hadoop 11:39:30 ~/.hadoop $ hadoop dfs -ls
    Found 6 items
    -rw-r--r--   1 hadoop supergroup  411536 2010-08-18 15:50 /user/hadoop/data
    drwxr-xr-x   - hadoop supergroup       0 2010-08-18 16:02 /user/hadoop/dataout
    -rw-r--r--   1 hadoop supergroup     497 2010-08-24 11:14 /user/hadoop/profile1
    -rw-r--r--   1 hadoop supergroup     497 2010-08-24 11:14 /user/hadoop/profile2
    -rw-r--r--   1 hadoop supergroup     497 2010-08-24 11:14 /user/hadoop/profile3
    -rw-r--r--   1 hadoop supergroup     497 2010-08-24 11:14 /user/hadoop/profile4


    The above steps were performed using Apache Hadoop 0.20.2, not Cloudera's
    version of it, if that helps.

    --
    Harsh J
    www.harshj.com
