high cpu usage in ReplicationMonitor thread
--------------------------------------------

Key: HADOOP-1221
URL: https://issues.apache.org/jira/browse/HADOOP-1221
Project: Hadoop
Issue Type: Bug
Components: dfs
Reporter: Koji Noguchi


We had a namenode stuck at 99% CPU, and it was responding slowly.
(dfs.namenode.handler.count was still set to 10.)

The ReplicationMonitor thread was using the most CPU time.
jstack showed:

"org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@1c7b0f4d" daemon prio=10 tid=0x0000002d90690800 nid=0x4855 runnable [0x0000000041941000..0x0000000041941b30]
java.lang.Thread.State: RUNNABLE
at java.util.AbstractList$Itr.remove(AbstractList.java:360)
at org.apache.hadoop.dfs.FSNamesystem.blocksToInvalidate(FSNamesystem.java:2475)
- locked <0x0000002a9f522038> (a org.apache.hadoop.dfs.FSNamesystem)
at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1775)
at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1713)
at java.lang.Thread.run(Thread.java:619)


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Raghu Angadi (JIRA) at Apr 6, 2007 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487360 ]

    Raghu Angadi commented on HADOOP-1221:
    --------------------------------------


    We were looking at the namenode code around the above trace. This is what it is doing:

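    max = 100; // in this case
    for (it = invalidateSet.iterator(); it.hasNext() && max > 0; max--) {
        it.next();   // advance first; remove() acts on the element last returned
        it.remove(); // on an ArrayList this shifts every later element left by one
    }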

    invalidateSet is not actually a Set but an ArrayList. So if it holds 500 blocks, the above loop removes 100 entries from the front, and each removal shifts the remaining entries (about 450 on average) down the array. This could be one of the things inflating CPU usage. We could use a LinkedList instead, and also stop calling it a 'Set', since that name suggests to readers that the container is a Set.
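
    To make the shifting cost concrete, here is a minimal sketch. It is not the namenode code: the class FrontRemovalCost and its methods are hypothetical names used only for illustration, and Integer stands in for the block type. It counts how many element moves an array-backed list performs for the 500-block, 100-removal case above, and runs the same iterator loop against both an ArrayList and a LinkedList.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
class FrontRemovalCost {

    // Number of element moves an array-backed list performs when the first
    // `removals` elements of a `size`-element list are removed one at a time.
    static long arrayShifts(int size, int removals) {
        long shifts = 0;
        for (int i = 0; i < removals && i < size; i++) {
            shifts += size - i - 1; // everything after the removed slot moves left
        }
        return shifts;
    }

    // Removes up to `max` elements from the front of `list` through its
    // iterator and returns how many were removed.
    static int drainFront(List<Integer> list, int max) {
        int removed = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext() && removed < max) {
            it.next();   // must advance before remove()
            it.remove(); // ArrayList: shifts the tail; LinkedList: unlinks one node
            removed++;
        }
        return removed;
    }

    // Fills `list` with 0..size-1 and returns it.
    static List<Integer> filled(List<Integer> list, int size) {
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println("element moves for ArrayList: " + arrayShifts(500, 100));

        List<Integer> array = filled(new ArrayList<>(), 500);
        List<Integer> linked = filled(new LinkedList<>(), 500);
        drainFront(array, 100);
        drainFront(linked, 100);
        System.out.println("remaining: " + array.size() + " / " + linked.size());
    }
}
```

    Both containers end up with the same contents; only the work per remove() differs. For the array-backed list the total is 499 + 498 + ... + 400 = 44950 element moves per pass, whereas the linked list unlinks one node per removal.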

    If each it.remove() resulted in a big memmove(), do you think we should have seen more Java stuff above remove() in the stack trace?

    Next, we should also capture a pstack of the JVM so that we can see what it is doing inside the JVM.

    Note that changing the container to a LinkedList might only reduce the CPU usage; it won't fix the underlying bug, if there is one.


Discussion Overview
group: common-dev @
category: hadoop
posted: Apr 6, '07 at 10:12p
active: Apr 6, '07 at 11:36p