Hey HDFS gurus -

I searched around the list archives and jira, but didn't see an existing
discussion about this.

I'm having issues where HDFS in general has free space; however, certain
machines -- and certain disks -- become full. For example, below is the disk
usage for an average-looking node in this cluster, meaning the balancer
won't want to move data off this machine.

Originally, I wanted to alert when HDFS in general was getting full, but
that doesn't work in practice because certain machines fill up. And I can't
just look at the per-machine stats, because individual disks fill up. I really
don't want to care about individual disks in HDFS, but it seems they can
cause actual problems.

Does anyone else run into machines with overfull disks? Any tips on how to
avoid getting into this situation?


Configured capacity: 7.72 TB
Used: 6.43 TB

Filesystem Size Used Avail Use% Mounted on
/dev/cciss/c1d0p1 65G 15G 46G 25% /
tmpfs 31G 0 31G 0% /dev/shm
/dev/cciss/c0d0 275G 217G 45G 83% /data/disk000
/dev/cciss/c0d1 275G 219G 43G 84% /data/disk001
/dev/cciss/c0d2 275G 216G 46G 83% /data/disk002
/dev/cciss/c0d3 275G 220G 42G 85% /data/disk003
/dev/cciss/c0d4 275G 248G 14G 95% /data/disk004
/dev/cciss/c0d5 275G 219G 43G 84% /data/disk005
/dev/cciss/c0d6 275G 219G 43G 84% /data/disk006
/dev/cciss/c0d7 275G 213G 49G 82% /data/disk007
/dev/cciss/c0d8 275G 220G 42G 85% /data/disk008
/dev/cciss/c0d9 275G 208G 54G 80% /data/disk009
/dev/cciss/c0d10 275G 216G 46G 83% /data/disk010
/dev/cciss/c0d11 275G 218G 44G 84% /data/disk011
/dev/cciss/c0d12 275G 223G 39G 86% /data/disk012
/dev/cciss/c0d13 275G 221G 41G 85% /data/disk013
/dev/cciss/c0d14 275G 248G 14G 95% /data/disk014
/dev/cciss/c0d15 275G 219G 43G 84% /data/disk015
/dev/cciss/c0d16 275G 216G 46G 83% /data/disk016
/dev/cciss/c0d17 275G 216G 46G 83% /data/disk017
/dev/cciss/c0d18 275G 219G 43G 84% /data/disk018
/dev/cciss/c0d19 275G 220G 42G 84% /data/disk019
/dev/cciss/c0d20 275G 213G 49G 82% /data/disk020
/dev/cciss/c0d21 275G 215G 47G 83% /data/disk021
/dev/cciss/c0d22 275G 247G 15G 95% /data/disk022
/dev/cciss/c0d23 275G 218G 44G 84% /data/disk023
/dev/cciss/c0d24 275G 222G 40G 86% /data/disk024
/dev/cciss/c1d1p1 275G 184G 78G 71% /data/disk025
/dev/cciss/c1d2p1 275G 176G 86G 68% /data/disk026
/dev/cciss/c1d3p1 275G 178G 84G 68% /data/disk027
/dev/cciss/c1d4p1 275G 177G 85G 68% /data/disk028
/dev/cciss/c1d5p1 275G 179G 83G 69% /data/disk029
/dev/cciss/c1d6p1 275G 181G 81G 70% /data/disk030

--travis
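
Since cluster-level and even host-level numbers hide exactly this problem, per-disk alerting has to look at each data mount individually. Below is a minimal Python sketch of that idea; the /data/disk* mount pattern matches the df listing above, while the 90% threshold and the exit-code convention are assumptions of this sketch, not anything HDFS provides.

    #!/usr/bin/env python
    # Minimal per-disk usage check: warn when any individual data disk
    # crosses a threshold, instead of only watching cluster/host totals.
    # The /data/disk* pattern matches the df listing above; the threshold
    # and exit-code convention are assumptions of this sketch.
    import glob
    import os

    THRESHOLD = 0.90  # hypothetical alert level

    def usage_fraction(path):
        """Approximate df's Use%: used / (used + space available to non-root)."""
        st = os.statvfs(path)
        used = (st.f_blocks - st.f_bfree) * st.f_frsize
        avail = st.f_bavail * st.f_frsize
        return used / float(used + avail)

    full = [(m, usage_fraction(m)) for m in sorted(glob.glob("/data/disk*"))]
    full = [(m, u) for m, u in full if u >= THRESHOLD]
    for mount, u in full:
        print("ALERT: %s is %.0f%% full" % (mount, u * 100))
    raise SystemExit(1 if full else 0)  # non-zero exit for cron/monitoring wrappers

Run from cron or wrapped by whatever monitoring system is already in place; the point is only that the check iterates over mounts rather than summing them.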


  • Allen Wittenauer at Jul 21, 2010 at 9:03 pm

    On Jul 21, 2010, at 12:45 PM, Travis Crawford wrote:
    > Does anyone else run into machines with overfull disks?
    It was a common problem when I was at Yahoo!. As the drives get more full, the NN starts getting slower and slower, since it is going to have problems with block placement.
    > Any tips on how to avoid getting into this situation?
    What we started to do was two-fold:

    a) During every maintenance, we'd blow away the mapred temp dirs. The TaskTracker does a very bad job of cleaning up after jobs and there is usually a lot of cruft. If you have a 'flat' disk/fs structure such that MR temp and HDFS is shared, this is a huge problem.

    b) Blowing away /tmp on a regular basis. Here at LI, I've got a perl script that I wrote that reads the output of ls /tmp, finds files/dirs older than 3 days, and removes them. Since pig is a little piggy and leaves a ton of useless data in /tmp, I often see 15TB or more disappear just by doing this.
    > /dev/cciss/c0d0 275G 217G 45G 83% /data/disk000
    The bigger problem is that Hadoop just really doesn't work well with such small filesystems. You might want to check your fs reserved size. You might be able to squeak out a bit more space that way too.
    > /dev/cciss/c0d14 275G 248G 14G 95% /data/disk014
    I'd probably shut down this datanode and manually move blocks off of this drive onto ...
    > /dev/cciss/c1d1p1 275G 184G 78G 71% /data/disk025
    > /dev/cciss/c1d2p1 275G 176G 86G 68% /data/disk026
    > /dev/cciss/c1d3p1 275G 178G 84G 68% /data/disk027
    > /dev/cciss/c1d4p1 275G 177G 85G 68% /data/disk028
    > /dev/cciss/c1d5p1 275G 179G 83G 69% /data/disk029
    > /dev/cciss/c1d6p1 275G 181G 81G 70% /data/disk030
    ... one of these.
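
The /tmp cleanup in (b) above was a Perl script; a rough Python equivalent is sketched below. It parses 0.20-era hadoop fs -ls output (permissions, replication, owner, group, size, date, time, path) and removes anything older than the cutoff with hadoop fs -rmr. The 3-day cutoff and the dry-run flag are knobs of this sketch, and it assumes paths without embedded spaces.

    #!/usr/bin/env python
    # Rough equivalent of the /tmp cleanup described in (b): list HDFS /tmp,
    # parse modification times out of `hadoop fs -ls`, and remove entries
    # older than a cutoff. Assumes 0.20-era ls output and space-free paths.
    import datetime
    import subprocess

    CUTOFF_DAYS = 3   # matches the 3-day rule above; adjust to taste
    DRY_RUN = True    # flip to False once the candidate list looks right

    def old_tmp_entries():
        out = subprocess.Popen(["hadoop", "fs", "-ls", "/tmp"],
                               stdout=subprocess.PIPE).communicate()[0].decode()
        cutoff = datetime.datetime.now() - datetime.timedelta(days=CUTOFF_DAYS)
        for line in out.splitlines():
            fields = line.split()
            if len(fields) < 8:
                continue  # skips the "Found N items" header and blank lines
            # fields: perms, replication, owner, group, size, date, time, path
            mtime = datetime.datetime.strptime(fields[5] + " " + fields[6],
                                               "%Y-%m-%d %H:%M")
            if mtime < cutoff:
                yield fields[7]

    for path in old_tmp_entries():
        if DRY_RUN:
            print("would remove %s" % path)
        else:
            subprocess.call(["hadoop", "fs", "-rmr", path])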
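
For the manual move suggested at the end of the reply above, the usual trick (with the datanode stopped) is to relocate block files together with their .meta companions from the full volume's data directory into an emptier volume's; the datanode rediscovers them when it restarts. The sketch below assumes a 0.20-era layout of blk_<id> plus blk_<id>_<genstamp>.meta under <dfs.data.dir>/current; the source/destination paths and the batch size are placeholders.

    #!/usr/bin/env python
    # Sketch of moving blocks between disks on a *stopped* datanode: take
    # the largest blk_* files from the full volume and move each one, with
    # its .meta file, into the emptier volume's current/ directory.
    # Paths and batch size are placeholders; test on a scratch node first.
    import glob
    import os
    import shutil

    SRC = "/data/disk014/dfs/data/current"   # assumed dfs.data.dir layout
    DST = "/data/disk025/dfs/data/current"
    BLOCKS_TO_MOVE = 100                     # arbitrary batch size

    def block_files(data_dir):
        """Yield (block_path, meta_path, size) for every block under data_dir."""
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                if not name.startswith("blk_") or name.endswith(".meta"):
                    continue
                path = os.path.join(root, name)
                metas = glob.glob(path + "_*.meta")  # blk_<id>_<genstamp>.meta
                if metas:
                    yield path, metas[0], os.path.getsize(path)

    blocks = sorted(block_files(SRC), key=lambda b: b[2], reverse=True)
    for block, meta, size in blocks[:BLOCKS_TO_MOVE]:
        for f in (block, meta):
            shutil.move(f, os.path.join(DST, os.path.basename(f)))
        print("moved %s (%d bytes)" % (os.path.basename(block), size))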
  • Alex Loddengaard at Jul 21, 2010 at 9:10 pm

    On Wed, Jul 21, 2010 at 3:01 PM, Allen Wittenauer wrote:

    > On Jul 21, 2010, at 12:45 PM, Travis Crawford wrote:
    >> Any tips on how to avoid getting into this situation?
    > What we started to do was two-fold:
    >
    > a) During every maintenance, we'd blow away the mapred temp dirs. The
    > TaskTracker does a very bad job of cleaning up after jobs and there is
    > usually a lot of cruft. If you have a 'flat' disk/fs structure such that MR
    > temp and HDFS is shared, this is a huge problem.
    I set up a cron job to delete files older than 5 days in mapred.local.dir.
    I've also found that sometimes userlogs aren't cleaned up correctly, so
    setting up a cron job to delete old files in userlogs is also a good idea.

    Good luck, Travis!

    Alex
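
A rough Python version of the cron jobs described above: delete anything older than five days under the mapred local and userlogs directories, then prune directories left empty. The paths below are placeholders for whatever mapred.local.dir and hadoop.log.dir point at, and per the note further down about tasks getting cranky, it is safest to run this during a maintenance window rather than under live jobs.

    #!/usr/bin/env python
    # Rough version of the cleanup cron jobs described above: remove files
    # older than MAX_AGE_DAYS under the mapred local and userlogs dirs,
    # then prune directories left empty. The paths are placeholders for
    # whatever mapred.local.dir and hadoop.log.dir point at.
    import os
    import time

    MAX_AGE_DAYS = 5
    DIRS = [
        "/data/disk000/mapred/local",   # one entry per mapred.local.dir volume
        "/var/log/hadoop/userlogs",     # tasktracker userlogs
    ]

    cutoff = time.time() - MAX_AGE_DAYS * 86400

    for top in DIRS:
        for root, dirs, files in os.walk(top, topdown=False):
            for name in files:
                path = os.path.join(root, name)
                try:
                    if os.path.getmtime(path) < cutoff:
                        os.remove(path)
                except OSError:
                    pass  # vanished or permission problem; skip it
            for name in dirs:
                try:
                    os.rmdir(os.path.join(root, name))  # only removes empty dirs
                except OSError:
                    pass  # not empty (or already gone)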
  • Allen Wittenauer at Jul 21, 2010 at 9:16 pm

    On Jul 21, 2010, at 2:09 PM, Alex Loddengaard wrote:
    > I set up a cron job to delete files older than 5 days in mapred.local.dir. I've also found that sometimes userlogs aren't cleaned up correctly, so setting up a cron job to delete old files in userlogs is also a good idea.
    I wish the tasktracker was smarter about its cache; i.e., actually treating it like a cache. When I've removed stuff from it, the tasks got very very cranky. I should file a jira. But I think I'm over my quota.
  • Travis Crawford at Jul 21, 2010 at 9:48 pm

    On Wed, Jul 21, 2010 at 2:01 PM, Allen Wittenauer wrote:
    > On Jul 21, 2010, at 12:45 PM, Travis Crawford wrote:
    >> Does anyone else run into machines with overfull disks?
    > It was a common problem when I was at Yahoo!.  As the drives get more full, the NN starts getting slower and slower, since it is going to have problems with block placement.
    >> Any tips on how to avoid getting into this situation?
    > What we started to do was two-fold:
    >
    > a) During every maintenance, we'd blow away the mapred temp dirs.  The TaskTracker does a very bad job of cleaning up after jobs and there is usually a lot of cruft.  If you have a 'flat' disk/fs structure such that MR temp and HDFS is shared, this is a huge problem.
    >
    > b) Blowing away /tmp on a regular basis.  Here at LI, I've got a perl script that I wrote that reads the output of ls /tmp, finds files/dirs older than 3 days, and removes them.  Since pig is a little piggy and leaves a ton of useless data in /tmp, I often see 15TB or more disappear just by doing this.
    >> /dev/cciss/c0d0       275G  217G   45G  83% /data/disk000
    > The bigger problem is that Hadoop just really doesn't work well with such small filesystems.  You might want to check your fs reserved size.  You might be able to squeak out a bit more space that way too.
    >> /dev/cciss/c0d14      275G  248G   14G  95% /data/disk014
    > I'd probably shut down this datanode and manually move blocks off of this drive onto ...
    >> /dev/cciss/c1d1p1     275G  184G   78G  71% /data/disk025
    >> /dev/cciss/c1d2p1     275G  176G   86G  68% /data/disk026
    >> /dev/cciss/c1d3p1     275G  178G   84G  68% /data/disk027
    >> /dev/cciss/c1d4p1     275G  177G   85G  68% /data/disk028
    >> /dev/cciss/c1d5p1     275G  179G   83G  69% /data/disk029
    >> /dev/cciss/c1d6p1     275G  181G   81G  70% /data/disk030
    > ... one of these.

    Thanks for the tips! Interestingly, there is some cruft that's built
    up, but it's actually not that much total space.

    We've had to shut datanodes down once before to manually move blocks
    around; sounds like that's going to happen again this time too.

    I'll file a jira about this, since it's come up twice now. Last time,
    disks were added to existing cluster nodes. The balancer did not move data
    around as they were all "balanced" -- even though 25 disks eventually
    reached 100% usage, and 5 disks were pretty much empty.

    What would this feature look like? Datanodes already have some idea of
    how much space is available per disk -- would it be appropriate to
    weight less-full disks more heavily for writes? Of course, an empty
    disk shouldn't get hammered with writes, so this would need to be a
    preference for less-used disks.

    Thoughts?

    --travis
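
As one possible shape for the feature described above, the datanode could pick the volume for each new block replica with probability proportional to its free space, so emptier disks see more writes without taking every write. The Python sketch below only illustrates that policy; it is not datanode code, and the 64 MB block size and volume paths are placeholders.

    #!/usr/bin/env python
    # Illustration of one way to "weight less-full disks more heavily":
    # choose the volume for a new block replica with probability
    # proportional to its remaining free space, so an empty disk is
    # preferred but not hammered with every single write.
    import os
    import random

    def pick_volume(volumes, block_size=64 * 1024 * 1024):
        """Pick a data dir for a new block, weighted by free space."""
        weights = []
        for vol in volumes:
            st = os.statvfs(vol)
            free = st.f_bavail * st.f_frsize
            weights.append(max(free - block_size, 0))  # skip nearly-full volumes
        total = sum(weights)
        if total == 0:
            raise IOError("no volume has room for a %d-byte block" % block_size)
        r = random.random() * total
        upto = 0
        for vol, weight in zip(volumes, weights):
            upto += weight
            if r < upto:
                return vol
        return volumes[-1]  # floating-point fallback

    # e.g. a 95%-full disk like disk014 gets a small share of new blocks,
    # a 68%-full disk like disk026 a larger one, and they converge over time.
    # print(pick_volume(["/data/disk014", "/data/disk026"]))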
  • Allen Wittenauer at Jul 21, 2010 at 10:07 pm

    On Jul 21, 2010, at 2:47 PM, Travis Crawford wrote:

    > I'll file a jira about this, since it's come up twice now. Last time,
    > disks were added to existing cluster nodes. The balancer did not move data
    > around as they were all "balanced" -- even though 25 disks eventually
    > reached 100% usage, and 5 disks were pretty much empty.
    I'm 99% certain this JIRA already exists, but I can't seem to find it.
  • Travis Crawford at Jul 21, 2010 at 11:49 pm
    If you've ever manually rebalanced blocks on your datanodes, or never
    want to, please vote for:

    https://issues.apache.org/jira/browse/HDFS-1312

    --travis


    On Wed, Jul 21, 2010 at 3:06 PM, Allen Wittenauer wrote:
    > On Jul 21, 2010, at 2:47 PM, Travis Crawford wrote:
    >
    >> I'll file a jira about this, since it's come up twice now. Last time,
    >> disks were added to existing cluster nodes. The balancer did not move data
    >> around as they were all "balanced" -- even though 25 disks eventually
    >> reached 100% usage, and 5 disks were pretty much empty.
    > I'm 99% certain this JIRA already exists, but I can't seem to find it.
