I'm having trouble figuring out the numbers reported by 'hadoop dfs
-dus' versus the numbers reported by the namenode web interface.

I have a 4-node cluster with 4 TB of disk on each node.

hadoop dfs -dus /
hdfs://hdp01-01:9000/ 1691626356288

Numbers on the namenode web interface:

Capacity : 14.13 TB
DFS Remaining : 1.41 TB
DFS Used : 11.88 TB
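One cross-check on the three figures is to subtract them. Capacity minus DFS Remaining minus DFS Used should be the space on the data disks taken by something other than HDFS blocks. This is a minimal sketch that assumes the UI's "TB" values can be subtracted as plain numbers:

```python
# Figures copied from the namenode web interface, in the UI's "TB" units.
capacity_tb = 14.13
dfs_remaining_tb = 1.41
dfs_used_tb = 11.88

# Space that is neither free for HDFS nor counted as DFS Used
# is non-DFS usage on the same disks.
non_dfs_used_tb = capacity_tb - dfs_remaining_tb - dfs_used_tb
print(f"Non-DFS used: {non_dfs_used_tb:.2f} TB")
```

Only about 0.84 TB falls outside DFS Used, so the UI really does attribute roughly 11.88 TB to HDFS.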

My default replication factor is 3, but most of my files have their
replication factor set to 2. So, looking at the 'dfs -dus' number, in
the worst case I'd expect to be using 1691626356288 * 3 = 5074879068864
bytes, i.e. approximately 5 TB, not the 11.88 TB the web interface reports.
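The worst-case arithmetic can be checked in a few lines. One detail to watch is the unit. This sketch assumes, without confirming it, that the web UI's "TB" means binary terabytes (2**40 bytes). Under that assumption the worst case is closer to 4.6 than to 5:

```python
DUS_BYTES = 1691626356288  # output of 'hadoop dfs -dus /'
MAX_REPLICATION = 3        # default replication; most files actually use 2

worst_case_bytes = DUS_BYTES * MAX_REPLICATION

# Assumption: the web UI's "TB" is binary (2**40 bytes).
TIB = 1024 ** 4

print(worst_case_bytes)  # 5074879068864
print(f"{worst_case_bytes / 1e12:.2f} TB (decimal)")
print(f"{worst_case_bytes / TIB:.2f} TiB (binary)")
```

Under either unit, the result is well below the 11.88 TB shown as DFS Used.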

fsck seems happy:

Total size: 1691626405661 B
Total dirs: 11780
Total files: 82137 (Files currently being written: 1)
Total blocks (validated): 84054 (avg. block size 20125471 B)
Minimally replicated blocks: 84054 (100.0 %)
Over-replicated blocks: 6 (0.007138268 %)
Under-replicated blocks: 1 (0.0011897114 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.731268
Corrupt blocks: 0
Missing replicas: 6 (0.0026135363 %)
Number of data-nodes: 4
Number of racks: 1
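The fsck summary gives a tighter estimate than the 3x worst case: multiply "Total size" by "Average block replication". This is only a rough sketch. The average appears to be taken per block rather than per byte, so it is an approximation:

```python
total_size_bytes = 1691626405661  # "Total size" from fsck
avg_block_replication = 2.731268  # "Average block replication" from fsck

# Rough estimate: the replication average weights every block equally,
# regardless of block size.
estimated_raw_bytes = total_size_bytes * avg_block_replication

print(f"{estimated_raw_bytes / 1e12:.2f} TB (decimal)")
print(f"{estimated_raw_bytes / 1024**4:.2f} TiB (binary)")
```

The estimate comes out around 4.6 TB decimal (about 4.2 TiB), which is less than half of the reported DFS Used.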

This is on 0.18.3/Cloudera.

I've also verified that the bulk of the data on my disks is under the
hadoop/dfs/data/current directory on each disk.

Clearly I'm misunderstanding something, or there's something weird
going on. Hints?

Erik Forsberg <forsberg@opera.com>
Developer, Opera Software - http://www.opera.com/

Discussion Overview
group: common-user
posted: Jan 14, '10 at 8:43a
active: Jan 14, '10 at 9:22a

2 users in discussion: Erik Forsberg (2 posts), Eli Collins (1 post)