On 12/4/13 2:34 PM, Mark Schnegelberger wrote:
Hi Rex,
Clicking on that health check will give you additional detail about
what it does and why we check it: "This is a DataNode
health check that checks for whether the DataNode has too many blocks.
Having too many blocks on a DataNode may affect the DataNode's
performance, and an increasing block count may require additional heap
space to prevent long garbage collection pauses. This test can be
configured using the *DataNode Block Count Thresholds* DataNode
monitoring setting."
In your case, you have at least one DataNode with 1.6M (!) blocks.
Cloudera Manager notifies you of this so you can take action. As for
why this node has such a high block count: perhaps you're writing a
lot of very tiny files that could be aggregated in some other fashion.
While you *could* simply raise the threshold to silence this health
alert, you may wish to dig deeper into why there are so many blocks.
--
Mark S.
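One way to follow up on Mark's suggestion about tiny files is to walk
the namespace over WebHDFS and count, per directory, how many files
fall below some small-size cutoff. A minimal sketch in Python,
assuming WebHDFS is enabled on the NameNode; the host, port, starting
path, and 1 MB cutoff here are placeholder assumptions, not values
from this thread:

import json
from urllib.request import urlopen

NAMENODE = "http://namenode.example.com:50070"  # placeholder NameNode web address
SMALL = 1 * 1024 * 1024                         # treat files under 1 MB as "small"

def list_status(path):
    # WebHDFS LISTSTATUS: list the immediate children of a directory.
    url = "%s/webhdfs/v1%s?op=LISTSTATUS" % (NAMENODE, path)
    with urlopen(url) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]

def count_small(path):
    # Recursively count (small, total) files under path, one line per directory.
    small = total = 0
    for status in list_status(path):
        child = path.rstrip("/") + "/" + status["pathSuffix"]
        if status["type"] == "DIRECTORY":
            s, t = count_small(child)
            small, total = small + s, total + t
        else:
            total += 1
            if status["length"] < SMALL:
                small += 1
    if total:
        print("%s: %d of %d files under 1 MB" % (path, small, total))
    return small, total

count_small("/user")  # start wherever the small files are suspected to live

Directories where most files sit far below the HDFS block size are
the usual candidates for consolidation, e.g. into HAR archives or
SequenceFiles.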
On Wed, Dec 4, 2013 at 5:07 AM, Rex Zhen wrote:
It is getting better after I increased the heap size on the NameNode
from 1 GB (the default) to 4 GB.
I still get the warning "The DataNode has 1,678,630 blocks. Warning
threshold: 200,000 block(s)"
<http://nn-01-sc.nim.com:7180/cmf/services/31/instances/126/advicePopup?timestamp=1386151485571&currentMode=true&healthTestName=DATA_NODE_BLOCK_COUNT>
and I can see the block map is still being updated.
Is that normal? Or can I just increase the warning threshold in the
config?
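As a rough sanity check on the heap increase above: a commonly cited
rule of thumb (an approximation, not exact guidance) is about 1 GB of
NameNode heap per million namespace objects, i.e. files plus blocks.
Plugging in the cluster-wide block total from the safe-mode message
quoted below:

# Back-of-the-envelope NameNode heap estimate using the rough rule of
# thumb of ~1 GB of heap per million namespace objects (files + blocks).
total_blocks = 3_135_151               # from the safe-mode log message below
total_files = total_blocks             # crude assumption: ~one block per file
objects = total_blocks + total_files   # ~6.3 million namespace objects

heap_gb = objects / 1_000_000          # ~1 GB per million objects
print("suggested NameNode heap: ~%.1f GB" % heap_gb)

By that estimate the 1 GB default was far too small, and even 4 GB
leaves limited headroom.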
On Wednesday, December 4, 2013 1:13:41 AM UTC-8, Rex Zhen wrote:
Hi,
We have been running CDH4.1 for a couple of months, with HA enabled
using quorum-based storage. Suddenly the cluster went into bad
health. Here is the initial log message:
"The reported blocks 3132015 has reached the threshold 0.9990
of total blocks 3135151. Safe mode will be turned off
automatically in 29 seconds."
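For reference, the 0.9990 in that message is just the ratio of
reported to total blocks hitting the default safe-mode exit threshold
(dfs.namenode.safemode.threshold-pct, 0.999 by default):

# The safe-mode message's 0.9990 is the reported/total block ratio:
print("%.4f" % (3_132_015 / 3_135_151))  # -> 0.9990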
After a couple of minutes, both NameNodes shut down. Here is part of
the log:
FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error:
flush failed for required journal (JournalAndStream(mgr=QJM to
[192.168.x.x:8485, 192.168.x.x:8485, 192.168.x.x:8485],
stream=QuorumOutputStream starting at txid 22877548))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got
too many exceptions to achieve quorum size 2/3. 3 exceptions
thrown:
192.168.x.x:8485: IPC's epoch 19 is less than the last
promised epoch 20
SHUTDOWN_MSG: Shutting down NameNode at x.x.x.x
Can anyone help?