Hi!

I'm having trouble figuring out the numbers reported by 'hadoop dfs
-dus' versus the numbers reported by the namenode web interface.

I have a 4-node cluster with 4TB of disk on each node.

hadoop dfs -dus /
hdfs://hdp01-01:9000/ 1691626356288

Numbers on the namenode web interface:

Capacity : 14.13 TB
DFS Remaining : 1.41 TB
DFS Used : 11.88 TB

My default replication level is 3, but the bulk of my files have their
replication level set to 2. So looking at the 'dfs -dus' number, in
the worst case I think I should be using 1691626356288*3=5074879068864
bytes, i.e. approximately 5 TB, not the 11.88 TB the web interface reports.
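
Spelled out, that worst-case arithmetic (assuming every block were at
replication 3, which isn't even the case here) is just:

$ echo $((1691626356288 * 3))
5074879068864

i.e. roughly 5 TB of raw usage, nowhere near 11.88 TB.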

fsck seems happy:

Status: HEALTHY
Total size: 1691626405661 B
Total dirs: 11780
Total files: 82137 (Files currently being written: 1)
Total blocks (validated): 84054 (avg. block size 20125471 B)
Minimally replicated blocks: 84054 (100.0 %)
Over-replicated blocks: 6 (0.007138268 %)
Under-replicated blocks: 1 (0.0011897114 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.731268
Corrupt blocks: 0
Missing replicas: 6 (0.0026135363 %)
Number of data-nodes: 4
Number of racks: 1
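
A slightly tighter estimate, using the average block replication that
fsck reports (plain bc arithmetic on the numbers above):

$ echo "1691626405661 * 2.731268" | bc
4620285069736.908148

i.e. roughly 4.6 TB of expected raw usage - even further from the
11.88 TB the web interface shows.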

This is on 0.18.3/Cloudera.

I've also verified that the bulk of the data on my disks is under the
hadoop/dfs/data/current directory on each disk.
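
(For the record, a du along these lines on each node shows where the
space is actually going - the /data* paths are just an example layout,
substitute your own dfs.data.dir:)

$ du -sh /data*/hadoop/dfs/data/current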

Clearly I'm misunderstanding something, or there's something weird
going on. Hints?

Thanks,
\EF
--
Erik Forsberg <forsberg@opera.com>
Developer, Opera Software - http://www.opera.com/

  • Eli Collins at Jan 14, 2010 at 9:11 am

    On Thu, Jan 14, 2010 at 12:43 AM, Erik Forsberg wrote:
    [...]
    Clearly I'm misunderstanding something, or there's something weird
    going on. Hints?
    Hey Erik,

    Are there a lot of files in the tmp directories in dfs.data.dir on
    each data node? What does du (on the host) report for these
    directories? This might be HDFS-821. dfsadmin -report output would
    be useful as well.
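
    Something along these lines on each datanode would show it (the
    /data*/hadoop/dfs/data paths are just an example - substitute your
    own dfs.data.dir):

    $ du -sh /data*/hadoop/dfs/data/tmp
    $ hadoop dfsadmin -report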

    Thanks,
    Eli
  • Erik Forsberg at Jan 14, 2010 at 9:22 am

    On Thu, 14 Jan 2010 01:11:29 -0800 Eli Collins wrote:

    Are there a lot of files in the tmp directories in dfs.data.dir on
    each data node? What does du (on the host) report for these
    directories? This might be HDFS-821. dfsadmin -report output would
    be useful as well.
    Hmm... it seems I was a bit quick in drawing conclusions - I deleted a
    lot of files before running the 'dfs -dus' command, and it seems it
    takes a while for the raw disk space to be reclaimed. My cluster is now
    at 6.72 TB remaining (compared to 1.41 TB ten minutes ago), and that
    figure keeps increasing as I write this mail.
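
    (An easy way to watch the space come back, if anyone else runs into
    this, is to just poll the report:

    $ watch -n 60 hadoop dfsadmin -report

    and wait for DFS Remaining to level off.)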

    No, there is not much in the tmp directories.

    Thanks for replying anyway - I did learn something from the
    experience :-).

    Thanks,
    \EF
    --
    Erik Forsberg <forsberg@opera.com>
    Developer, Opera Software - http://www.opera.com/
