Datanode won't start with bad disk
We have a bad disk on one of our DataNode machines. We have
dfs.datanode.failed.volumes.tolerated set to 2 and saw no problems while
the DataNode process was running, but we hit a problem when we needed to
restart the DataNode process:

2011-03-24 16:50:20,071 WARN org.apache.hadoop.util.DiskChecker: Incorrect permissions were set on /var/lib/stats/hdfs/4, expected: rwxr-xr-x, while actual: ---------. Fixing...
2011-03-24 16:50:20,089 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2011-03-24 16:50:20,091 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: EPERM: Operation not permitted

In this case /var/lib/stats/hdfs/4 is the mount point for the bad disk.
It hits that permission error because we have the mount directory set to
be immutable (a minimal reproduction follows the listing):

root@s3:/var/log/hadoop# lsattr /var/lib/stats/hdfs/
------------------- /var/lib/stats/hdfs/2
----i------------e- /var/lib/stats/hdfs/4
------------------- /var/lib/stats/hdfs/3
------------------- /var/lib/stats/hdfs/1
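
A minimal reproduction, using a hypothetical /tmp/immutable-test
directory rather than our actual mount points: the immutable bit alone
causes exactly this kind of EPERM when anything tries to fix the
directory's permissions, which is what the DataNode's DiskChecker is
attempting in the log above:

mkdir /tmp/immutable-test
chattr +i /tmp/immutable-test   # set the immutable bit, as on /var/lib/stats/hdfs/4 (run as root)
lsattr -d /tmp/immutable-test   # the directory should now show the 'i' flag
chmod 755 /tmp/immutable-test   # expected to fail with "Operation not permitted" (EPERM)
chattr -i /tmp/immutable-test   # clear the bit again so the test directory can be removed
rmdir /tmp/immutable-test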

We set the mount points immutable because we had previously seen HDFS
just write to the local disk when a disk couldn't be mounted.

HDFS is supposed to be able to handle a failed disk, but it doesn't seem
to be doing the right thing in this case. Is this a known problem, or
is there some other way we should be configuring things to allow the
DataNode to come up in this situation?

(clearly we can remove the mount point from hdfs-site.xml, but that
doesn't feel like the correct solution)
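
For reference, a sketch of roughly what the corresponding hdfs-site.xml
entries look like in a setup like ours (the property names are the
standard 0.20 ones; the paths and the tolerated-failures value simply
mirror what is described above):

<!-- illustrative sketch only; mirrors the configuration described in this message -->
<property>
  <name>dfs.data.dir</name>
  <value>/var/lib/stats/hdfs/1,/var/lib/stats/hdfs/2,/var/lib/stats/hdfs/3,/var/lib/stats/hdfs/4</value>
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>

The interim workaround mentioned above would amount to dropping the
fourth path from dfs.data.dir and restarting the DataNode.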

Thanks
- Adam

  • Adam Phelps at Mar 24, 2011 at 5:48 pm
    For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

    - Adam
  • Allen Wittenauer at Mar 25, 2011 at 4:44 pm

    On Mar 24, 2011, at 10:47 AM, Adam Phelps wrote:

    For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

    Given that this isn't a standard Apache release, you'll likely be
    better served by asking Cloudera.
  • Aaron T. Myers at Mar 25, 2011 at 4:49 pm
    bcc: hdfs-user@hadoop.apache.org
    + cdh-user@cloudera.org

    Hey Adam,

    Thanks a lot for the bug report. I've added cdh-user@ to this email, which
    may be a more appropriate list for this question.

    Best,
    Aaron

    --
    Aaron T. Myers
    Software Engineer, Cloudera
  • Bharath Mundlapudi at Mar 24, 2011 at 11:00 pm
    Hi Adam,

    I have posted a patch for this problem for Hadoop version 0.20. Please refer to the following JIRA:

    https://issues.apache.org/jira/browse/HDFS-1592

    -Bharath
  • Bharath Mundlapudi at Mar 24, 2011 at 11:08 pm
    Also, you will need this patch.
    https://issues.apache.org/jira/browse/HADOOP-7040
  • Adam Phelps at Mar 25, 2011 at 12:22 am
    Thanks for the info. We may apply this patch if this continues to
    be a problem.

    - Adam
    On 3/24/11 4:08 PM, Bharath Mundlapudi wrote:
    Also, you will need this patch.
    https://issues.apache.org/jira/browse/HADOOP-7040

