FAQ
Reposting on cdh-user.


---------- Forwarded message ----------
From: Barry Becker <barrybecker4@gmail.com>
Date: Wed, Apr 24, 2013 at 9:47 AM
Subject: What is the right java heap size for HDFS and Name nodes?
To: impala-user@cloudera.org


The default HDFS setting for the DataNode and NameNode (Base) JVM heap
size is 1 GB, but if we run on nodes with 64 GB of memory, should we make
it bigger? If so, how much bigger?
The NameNode (Base) guidance shown in Cloudera Manager is:
"The NameNode recommends 1 GB for every million HDFS blocks."
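Taken at face value, that rule is just a division with a 1 GB floor. An
illustrative bash calculation, assuming the rule exactly as quoted:

# 1 GB of NameNode heap per 1,000,000 HDFS blocks, never below the 1 GB default
blocks=10000000
echo "$(( blocks / 1000000 > 1 ? blocks / 1000000 : 1 )) GB"   # => 10 GB for 10 million blocks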

I'm not really sure how to determine the number of HDFS blocks, but
when I use fsck I see

[pros_user @ 54.215.104.218]hadoop fsck /user/hive/warehouse
FSCK started by pros_user (auth:SIMPLE) from /10.174.89.207 for path
/user/hive/warehouse at Wed Apr 24 10:59:30 CDT 2013
..........................Status: HEALTHY
Total size: 276063767817 B
Total dirs: 76
Total files: 1526
Total blocks (validated): 2881 (avg. block size 95822203 B)
Minimally replicated blocks: 2881 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 17
Number of racks: 1



There are 276,063,767,817 bytes (I think B means bytes, not blocks),
i.e. about 276 GB of disk.
The average block is about 95 MB.
There are 2881 blocks, so according to the recommendation we should
only need 1 GB (actually much less), but we did see a significant
speedup (24 minutes down to 19 minutes for our performance run) when
increasing the JVM heap from 1 GB to 4 GB. Is that expected?
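To make that concrete, the block count can be pulled out of the fsck summary
and run through the same rule; a sketch assuming the summary format shown
above (field labels can vary between Hadoop versions):

blocks=$(hadoop fsck /user/hive/warehouse 2>/dev/null | awk -F': *' '/Total blocks/ {print $2+0}')
echo "blocks=$blocks -> suggested NameNode heap: $(( blocks / 1000000 > 1 ? blocks / 1000000 : 1 )) GB"
# With the 2881 blocks reported above, 2881 / 1,000,000 is ~0.003, so the
# 1 GB default already covers this amount of metadata by a wide margin.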


  • Ram Krishnamurthy at Apr 24, 2013 at 5:19 pm
    How do you increase the JVM heap size, using Cloudera Manager or by other means?

    Thanks, Ram


    --
    Thanks,
    *Ram Krishnamurthy*
    rkrishnamurthy@greenway-solutions.com
    *Cell: 704-953-8125*
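On the follow-up question about where the heap is set: with Cloudera Manager
it is normally changed through the per-role Java heap size setting under the
HDFS service configuration (followed by a role restart); on a plain
CDH/Apache install it is typically set in hadoop-env.sh. A minimal sketch
with illustrative values, not a sizing recommendation:

# $HADOOP_CONF_DIR/hadoop-env.sh -- example values only
# HADOOP_HEAPSIZE (in MB) sets a common default heap for all Hadoop daemons;
# the per-daemon *_OPTS variables let a single role override it.
export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -Xmx4g"
export HADOOP_DATANODE_OPTS="${HADOOP_DATANODE_OPTS} -Xmx1g"
# Restart the NameNode/DataNode for the new heap to take effect.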
