Reposting on cdh-user.
---------- Forwarded message ----------
From: Barry Becker <firstname.lastname@example.org>
Date: Wed, Apr 24, 2013 at 9:47 AM
Subject: What is the right Java heap size for HDFS DataNodes and NameNodes?
The default HDFS setting for the DataNode and NameNode (base) JVM heap
size is 1 GB, but if we run on nodes with 64 GB of memory, should we
make it bigger? If so, by how much?
The NameNode (base) recommendation shown in Cloudera Manager is 1 GB
of heap for every million HDFS blocks.
I'm not really sure how to determine the number of HDFS blocks, but
when I run fsck I see:
[pros_user @ 188.8.131.52] hadoop fsck /user/hive/warehouse
FSCK started by pros_user (auth:SIMPLE) from /10.174.89.207 for path
/user/hive/warehouse at Wed Apr 24 10:59:30 CDT 2013
Total size: 276063767817 B
Total dirs: 76
Total files: 1526
Total blocks (validated): 2881 (avg. block size 95822203 B)
Minimally replicated blocks: 2881 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 17
Number of racks: 1
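If it helps, the block count can be pulled out of a saved fsck report
programmatically. A minimal sketch (feeding it the report text above;
in practice you would capture the output of `hadoop fsck`):

```python
import re

# Sample line copied from the fsck report above.
fsck_output = "Total blocks (validated): 2881 (avg. block size 95822203 B)"

# Extract the validated block count from the "Total blocks" line.
match = re.search(r"Total blocks \(validated\):\s*(\d+)", fsck_output)
blocks = int(match.group(1))
print(blocks)  # → 2881
```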
That is 276,063,767,817 bytes (I think B means bytes, not blocks),
i.e. about 276 GB of disk, with an average block size of about 95 MB.
There are 2881 blocks, so according to the recommendation we should
need well under 1 GB. Yet we saw a significant speedup (24 minutes
down to 19 minutes for our performance run) when increasing the JVM
heap from 1 GB to 4 GB. Is that expected?
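For what it's worth, here is the back-of-the-envelope arithmetic behind
"we should need well under 1 GB", using the figures from the fsck
report (the 1-GB-per-million-blocks factor is the Cloudera Manager
rule of thumb quoted above, not a measured value):

```python
# Numbers taken from the fsck report above.
blocks = 2881
total_bytes = 276_063_767_817

# Rule of thumb: 1 GB of NameNode heap per million HDFS blocks.
heap_implied_gb = blocks / 1_000_000 * 1

avg_block_mib = total_bytes / blocks / 1024**2

print(f"average block size: {avg_block_mib:.0f} MiB")
print(f"heap implied by rule of thumb: {heap_implied_gb:.4f} GB")
```

So the metadata for ~2900 blocks implies only a few megabytes of
NameNode heap, which is why the 1 GB default already looks generous
for a cluster of this size.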