Rapid growth in Non DFS Used disk space
We have an 11-node Hadoop cluster running 20.2 that has been in production for 15 months now. The system is used to process log files that are ingested daily, and the oldest files in HDFS are deleted to free up space as needed, typically when the free space is less than 10% (the delete is done using 'hadoop fs -rmr' on the parent directory of the files to be deleted). When the HDFS was originally built it had 1TB of 'Non DFS' space out of the 20TB total. This 1TB stayed constant for at least the first year the system was in use.

However, over the last few weeks I have seen the 'Non DFS Used' reported by the NameNode dfshealth.jsp page grow to 2G and rising. The total number of files/directories and blocks in use has remained fairly constant over this time. I am concerned that the Non DFS Used is going to consume more and more of the HDFS if left unchecked. Running fsck gave "The filesystem under path '/' is HEALTHY".

Questions:

1) What exactly is Hadoop reporting as 'Non DFS Used', and how is it calculated? Are these files on the same partition(s) as the HDFS files, but not actually part of HDFS?

2) Any ideas on what is driving the growth in Non DFS Used space? I looked for things like growing log files on the datanodes but didn't find anything.

Thanks,
Scott

  • Todd Lipcon at May 13, 2011 at 5:48 pm

    On Fri, May 13, 2011 at 10:40 AM, Kester, Scott wrote:

    We have an 11 node Hadoop cluster running 20.2 that has been in
    production for 15 months now. The system is used to process log files that
    are ingested daily, and the oldest files in the HDFS are deleted to free up
    space as needed, typically when the free space is less than 10% (the delete
    is done using 'hadoop fs -rmr' on the parent directory of the files to be
    deleted). When the HDFS was originally built it had 1TB of 'Non DFS' space
    out of the 20TB total. This 1TB stayed constant for at least the first year
    the system has been in use.

    However over the last few weeks I have seen the 'Non DFS Used' as
    reported by the NameNode dfshealth.jsp page grow to 2G and rising. The
    total number of files/directories and blocks in use has remained fairly
    constant over this time. I am concerned that the Non DFS Used is going to
    consume more and more of the HDFS if left unchecked. Running fsck gave "The
    filesystem under path '/' is HEALTHY".

    Questions:

    1) What exactly is Hadoop reporting as 'Non DFS Used', and how is it
    calculated? Are these files on the same partition(s) as the HDFS files, but
    are not actually part of the HDFS?
    Yes - it's usage reported by "df" that isn't coming from HDFS blocks.
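    The arithmetic behind the number can be sketched as follows (a minimal sketch assuming the commonly described formula, where configured capacity is the raw capacity `df` reports minus dfs.datanode.du.reserved; the values below are illustrative, not taken from this cluster):

```shell
# Hedged sketch of the 'Non DFS Used' arithmetic (illustrative values):
#   configured capacity = raw capacity (df) - dfs.datanode.du.reserved
#   Non DFS Used        = configured capacity - DFS Used - DFS Remaining
CAPACITY_TB=20; RESERVED_TB=0; DFS_USED_TB=15; DFS_REMAINING_TB=4
NON_DFS_TB=$(( CAPACITY_TB - RESERVED_TB - DFS_USED_TB - DFS_REMAINING_TB ))
echo "Non DFS Used: ${NON_DFS_TB} TB"   # -> Non DFS Used: 1 TB
```

    Anything a local process writes onto the same partitions (logs, temp files, leaked task output) shows up in `df` but not in DFS Used, so it lands in this bucket.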

    2) Any ideas on what is driving the growth in Non DFS Used space? I
    looked for things like growing log files on the datanodes but didn't find
    anything.
    Logs are one possible culprit. Another is to look for old files that might
    be orphaned in your mapred.local.dir - there have been bugs in the past
    where we've leaked files. If you shut down the TaskTrackers, you can safely
    delete everything from within mapred.local.dirs.
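    That check can be sketched as below (paths are illustrative stand-ins, not real cluster paths; substitute your actual mapred.local.dir, and only delete with the TaskTrackers stopped):

```shell
# Hedged sketch: size up a mapred.local.dir and list files old enough to be
# orphaned task leftovers. LOCAL_DIR is a throwaway demo path, not a real
# Hadoop default - point it at your configured mapred.local.dir.
LOCAL_DIR="${LOCAL_DIR:-/tmp/mapred-local-demo}"
mkdir -p "$LOCAL_DIR"
touch "$LOCAL_DIR/attempt_201105130001_0001_m_000000_0"  # simulated leftover
du -sk "$LOCAL_DIR"                  # total space the directory is holding
find "$LOCAL_DIR" -type f -mtime +7  # files untouched for over a week
# Only with the TaskTracker stopped is it safe to: rm -rf "$LOCAL_DIR"/*
```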

    -Todd
    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Allen Wittenauer at May 13, 2011 at 7:12 pm

    On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:

    2) Any ideas on what is driving the growth in Non DFS Used space? I
    looked for things like growing log files on the datanodes but didn't find
    anything.
    Logs are one possible culprit. Another is to look for old files that might
    be orphaned in your mapred.local.dir - there have been bugs in the past
    where we've leaked files. If you shut down the TaskTrackers, you can safely
    delete everything from within mapred.local.dirs.
    Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out. The TT doesn't properly clean up after itself.
  • Kester, Scott at May 13, 2011 at 8:41 pm
    We have a job that cleans up the mapred.local directory, so that's not it.
    I have done some further looking at data usage on the datanodes and 99%
    of the space used is under the dfs.data.dir/current directory. What would
    be under 'current' that wasn't part of HDFS?
  • Suresh srinivas at May 15, 2011 at 4:21 am
    dfs.data.dir/current is used by datanodes to store block data. This directory
    should only contain files whose names start with blk_ (block files and their
    .meta checksum files).

    Things to check:
    - Are there other files that are not blk related?
    - Did you manually copy the content of one storage dir to another? (some
    folks did this when they added new disks)
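    The first check can be sketched as below (a hedged sketch: DATA_DIR is a throwaway demo path, and the VERSION and dncp_* exclusions are assumptions based on 0.20-era DataNode layouts; point DATA_DIR at your real dfs.data.dir):

```shell
# Hedged sketch: flag anything under dfs.data.dir/current that is not an
# HDFS block file. Block files start with blk_; VERSION and the dncp_*
# block-verification logs are assumed to be normal for this layout.
DATA_DIR="${DATA_DIR:-/tmp/dfs-data-demo/current}"
mkdir -p "$DATA_DIR"
touch "$DATA_DIR/blk_1234567890" "$DATA_DIR/blk_1234567890_1001.meta"
touch "$DATA_DIR/stray.tmp"          # simulated non-HDFS file
# Prints only the stray file; anything listed is space HDFS doesn't own:
find "$DATA_DIR" -type f ! -name 'blk_*' ! -name 'VERSION' ! -name 'dncp_*'
```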

    On Fri, May 13, 2011 at 1:41 PM, Kester, Scott wrote:

    We have a job that cleans up the mapred.local directory, so that's not it.
    I have done some further looking at data usage on the datanodes and 99%
    of the space used is under the dfs.data.dir/current directory. What would
    be under 'current' that wasn't part of HDFS?

    --
    Regards,
    Suresh
  • Kester, Scott at May 16, 2011 at 3:51 pm
    I was able to track this down this morning. The process that ingests the log files into the HDFS cluster was not closing file handles after it deleted temp files created during ingest. That causes df and du to report different usage values. Restarting the ingest process cleared the file handles and the Non DFS space is now back to normal. Thanks for the help, guys.
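    That failure mode is easy to reproduce outside Hadoop (a minimal sketch: any process that unlinks a file while still holding it open keeps the disk blocks allocated, so df counts space du cannot see until the handle closes):

```shell
# Hedged sketch: a deleted-but-still-open file keeps consuming disk space.
TMP=$(mktemp)
dd if=/dev/zero of="$TMP" bs=1024 count=1024 2>/dev/null  # 1 MiB temp file
exec 3<"$TMP"   # a process (here: this shell, via fd 3) holds it open
rm "$TMP"       # "deleted": du no longer counts it, but df still does
# 'lsof +L1' would list such deleted-but-open files on most systems
exec 3<&-       # closing the last handle finally releases the space
```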

    From: suresh srinivas <srini30005@gmail.com>
    Reply-To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
    Date: Sat, 14 May 2011 21:20:44 -0700
    To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
    Subject: Re: Rapid growth in Non DFS Used disk space

    dfs.data.dir/current is used by datanodes to store blocks. This directory should only have files starting with blk-*

    Things to check:
    - Are there other files that are not blk related?
    - Did you manually copy the content of one storage dir to another? (some folks did this when they added new disks)

Discussion Overview
group: hdfs-user @ hadoop
posted: May 13, '11 at 5:41p
active: May 16, '11 at 3:51p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop