Deadlocked Regionserver process (HBase user mailing list, July 2011, archived on Grokbase)
Hey everyone,

We periodically see a situation where the regionserver process still exists in the
process list and its ZooKeeper thread keeps sending keepalives, so the master won't
remove it from the active list, yet the regionserver will not serve data.

Hadoop (CDH3u0), HBase 0.90.3 (Apache release), under load from an internal
testing tool.


I've taken a jstack of the process and found this:

Found one Java-level deadlock:
=============================
"IPC Server handler 99 on 60020":
  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by "IPC Server handler 64 on 60020"
"IPC Server handler 64 on 60020":
  waiting for ownable synchronizer 0x00002aaab8eea130, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "regionserver60020.cacheFlusher"
"regionserver60020.cacheFlusher":
  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by "IPC Server handler 64 on 60020"

Java stack information for the threads listed above:
===================================================
"IPC Server handler 99 on 60020":
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
  - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
  at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
"IPC Server handler 64 on 60020":
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
  - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
  at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
"regionserver60020.cacheFlusher":
  at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
  - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
  at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
  at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
  at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
  at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
  at java.security.AccessController.doPrivileged(Native Method)
  at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
  at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
  at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
  at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
  at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
  at java.util.TimeZone.getDisplayName(TimeZone.java:350)
  at java.util.Date.toString(Date.java:1025)
  at java.lang.String.valueOf(String.java:2826)
  at java.lang.StringBuilder.append(StringBuilder.java:115)
  at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
  at java.lang.String.valueOf(String.java:2826)
  at java.lang.StringBuilder.append(StringBuilder.java:115)
  at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
  - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
  at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
  at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
  - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
  at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
  - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
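
If I'm reading the cycle right: handler 64 holds the MemStoreFlusher monitor and is
waiting on the flusher's ReentrantLock, the cacheFlusher thread holds that
ReentrantLock and is waiting on the same monitor, and handler 99 is simply queued up
behind handler 64. A stripped-down sketch of that kind of lock-ordering inversion
(class and method names here are made up for illustration, this is not the actual
HBase code):

import java.util.concurrent.locks.ReentrantLock;

// Hypothetical minimal reproduction of the inversion in the trace above:
// one thread takes the object monitor first and then the ReentrantLock,
// the other takes them in the opposite order.
public class LockInversionSketch {
    private final ReentrantLock flushLock = new ReentrantLock();

    // Path taken by the IPC handler: monitor -> ReentrantLock.
    synchronized void reclaimMemory() {   // holds 'this' monitor
        flushLock.lock();                 // blocks while the flusher holds it
        try {
            // ... decide whether to wait for a flush ...
        } finally {
            flushLock.unlock();
        }
    }

    // Path taken by the flusher thread: ReentrantLock -> monitor.
    void flushLoop() {
        flushLock.lock();                 // holds the flush lock
        try {
            logQueueState();              // ends up needing 'this' monitor
        } finally {
            flushLock.unlock();
        }
    }

    private synchronized void logQueueState() {
        // In the real trace this step is Date.toString() -> ResourceBundle
        // loading, which happens to block on the same monitor.
    }

    public static void main(String[] args) {
        final LockInversionSketch s = new LockInversionSketch();
        new Thread(new Runnable() { public void run() { s.reclaimMemory(); } },
                   "ipc-handler").start();
        new Thread(new Runnable() { public void run() { s.flushLoop(); } },
                   "cacheFlusher").start();
        // With unlucky timing the two threads deadlock, like "IPC Server
        // handler 64" and "regionserver60020.cacheFlusher" above.
    }
}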


Any ideas on how I could prevent this or let the master know about it? I've
written an app that will check all regionservers periodically for such a
lockup, but I can't run it constantly.
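
For reference, one way such a check can work without parsing jstack output is to ask
each regionserver JVM over JMX whether it has deadlocked threads. This is only a
rough sketch, assuming JMX remote is enabled on the regionservers; the host and port
below are placeholders:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical watchdog: connects to a regionserver's JMX port and asks the
// ThreadMXBean whether any threads are deadlocked.
public class DeadlockProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "regionserver1.example.com"; // placeholder
        String port = args.length > 1 ? args[1] : "10102";                     // placeholder
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ThreadMXBean threads = JMX.newMXBeanProxy(
                conn,
                new ObjectName(ManagementFactory.THREAD_MXBEAN_NAME),
                ThreadMXBean.class);
            long[] deadlocked = threads.findDeadlockedThreads();
            if (deadlocked != null && deadlocked.length > 0) {
                System.out.println(host + ": " + deadlocked.length
                        + " deadlocked threads, regionserver needs a restart");
                System.exit(1);
            }
            System.out.println(host + ": no deadlock detected");
        } finally {
            connector.close();
        }
    }
}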

I can provide more of the jstack if that is helpful.

-Matt


  • Lohit at Jul 14, 2011 at 9:45 pm
    Is it possible to open a JIRA with the full stack trace?
    Or, if you point us to the full stack trace, one of us can open the JIRA for you.
    0.90.4 will be out soon, and maybe we should see if there is a fix for this problem.


    --
    Have a Nice Day!
    Lohit
  • Matt Davies at Jul 14, 2011 at 9:56 pm
    Thanks. I've created HBASE-4101.


  • Stack at Jul 14, 2011 at 9:59 pm
    What Lohit says, but also: what JVM are you running and what options
    are you feeding it? The stack trace is a little crazy (especially the
    mix-in of resource bundle loading). We saw something similar over in
    HBASE-3830 when someone was running a profiler. Is that what is going
    on here?

    Thanks,
    St.Ack
  • Matt Davies at Jul 14, 2011 at 10:07 pm
    We aren't profiling right now. Here's what is in hbase-env.sh:

    export TZ="US/Mountain"
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
    -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
    export HBASE_MANAGES_ZK=false
    export HBASE_PID_DIR=/home/hadoop
    export HBASE_HEAPSIZE=10240

    Java is
    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

    We were planning an upgrade to 1.6.0_25 before we ran into this issue.


  • Stack at Jul 14, 2011 at 10:26 pm
    Thank you.

    I've added the below to the issue. Will take a looksee. If it is an
    issue, will include a fix in 0.90.4.

    St.Ack
    On Thu, Jul 14, 2011 at 3:07 PM, Matt Davies wrote:
    We aren't profiling right now.  Here's what is in the hbase-env.sh

    export TZ="US/Mountain"
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
    -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
    export HBASE_MANAGES_ZK=false
    export HBASE_PID_DIR=/home/hadoop
    export HBASE_HEAPSIZE=10240

    Java is
    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

    We were planning an upgrade to 1.6.0_25 before we ran into this issue.


  • Ramkrishna S Vasudevan at Jul 15, 2011 at 3:56 am
    Hi

    I think this, as Stack mentioned in HBASE-3830, could be due to a profiler.

    But the problem is in the use of the Data class. JD had once replied to the
    mailing list under the subject "Re: Possible dead lock".

    JD's reply
    =============================================================
    I see what you are saying, and I understand the deadlock, but what escapes
    me is why ResourceBundle has to go touch all the classes every time to find
    the locale as I see 2 threads doing the same. Maybe my understanding of what
    it does is just poor, but I also see that you are using the yourkit profiler
    so it's one more variable in the equation.

    In any case, using a Date strikes me as odd. Using a long representing
    System.currentTimeMillis is usually what we do.
    =======================================================================
    So here, as per HBASE-4101, even though the profiler was not running, the
    problem is the Date object created in the toString of the PriorityCompactionQueue.
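
    To illustrate JD's suggestion, a toString along these lines keeps the raw long
    and never touches Date (only a sketch, not the actual HBase change):

    // Hypothetical CompactionRequest-style class: printing the long timestamp
    // stays inside java.lang, whereas new Date(...).toString() walks through
    // TimeZone/ResourceBundle loading as seen in the trace.
    class CompactionRequestSketch {
        private final String regionName;
        private final long queuedAtNanos;   // e.g. System.nanoTime() at creation

        CompactionRequestSketch(String regionName) {
            this.regionName = regionName;
            this.queuedAtNanos = System.nanoTime();
        }

        @Override
        public String toString() {
            return "regionName=" + regionName + ", queuedAtNanos=" + queuedAtNanos;
        }
    }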

    Regards
    Ram


  • Ramkrishna S Vasudevan at Jul 15, 2011 at 4:34 am
    Sorry, it's not the Data class.

    But the problem is in the use of the Date class. JD had once replied to the
    mailing list under the subject "Re: Possible dead lock".
    :)

    Regards
    Ram

    -----Original Message-----
    From: Ramkrishna S Vasudevan
    Sent: Friday, July 15, 2011 9:26 AM
    To: user@hbase.apache.org
    Subject: RE: Deadlocked Regionserver process

    Hi

    I think this as stack mentioned in HBASE-3830 could be due to profiler.

    But the problem is in the use of Data class. JD had once replied to the
    mailing list with the heading Re: Possible dead lock

    JD's reply
    =============================================================
    I see what you are saying, and I understand the deadlock, but what escapes
    me is why ResourceBundle has to go touch all the classes every time to find
    the locale as I see 2 threads doing the same. Maybe my understanding of what
    it does is just poor, but I also see that you are using the yourkit profiler
    so it's one more variable in the equation.

    In any case, using a Date strikes me as odd. Using a long representing
    System.currentTimeMillis is usually what we do.
    =======================================================================
    So here as per HBASE-4101 though the profiler has not run then the problem
    is the Date object called from the toString of the PriorityCompactionQueue.

    Regards
    Ram


    -----Original Message-----
    From: saint.ack@gmail.com On Behalf Of Stack
    Sent: Friday, July 15, 2011 3:56 AM
    To: user@hbase.apache.org
    Subject: Re: Deadlocked Regionserver process

    Thank you.

    I've added below to issue. Will take a looksee. If issue, will
    include fix in 0.90.4.

    St.Ack
    On Thu, Jul 14, 2011 at 3:07 PM, Matt Davies wrote:
    We aren't profiling right now.  Here's what is in the hbase-env.sh

    export TZ="US/Mountain"
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
    -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
    export HBASE_MANAGES_ZK=false
    export HBASE_PID_DIR=/home/hadoop
    export HBASE_HEAPSIZE=10240

    Java is
    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

    We were planning an upgrade to 1.6.0_25 before we ran into this issue.


    On Thu, Jul 14, 2011 at 3:59 PM, Stack wrote:

    What Lohit says but also, what jvm are you running and what options
    are you feeding it?  The stack trace is a little crazy (especially the
    mix in of resource bundle loading).  We saw something similar over in
    HBASE-3830 when someone was running profiler.  Is that what is going
    on here?

    Thanks,
    St.Ack

    On Thu, Jul 14, 2011 at 11:36 AM, Matt Davies <matt.davies@tynt.com>
    wrote:
    Hey everyone,

    We periodically see a situation where the regionserver process exists
    in
    the
    process list, zookeeper thread sends the keepalive so the master won't
    remove it from the active list, yet the regionserver will not serve
    data.
    Hadoop(cdh3u0), HBase 0.90.3 (Apache version), under load from an internal
    testing tool.


    I've taken a jstack of the process and found this:

    Found one Java-level deadlock:
    =============================
    "IPC Server handler 99 on 60020":
    waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8,
    a
    org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
    which is held by "IPC Server handler 64 on 60020"
    "IPC Server handler 64 on 60020":
    waiting for ownable synchronizer 0x00002aaab8eea130, (a
    java.util.concurrent.locks.ReentrantLock$NonfairSync),
    which is held by "regionserver60020.cacheFlusher"
    "regionserver60020.cacheFlusher":
    waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
    which is held by "IPC Server handler 64 on 60020"

    Java stack information for the threads listed above:
    ===================================================
    "IPC Server handler 99 on 60020":
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
    - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
    at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
    "IPC Server handler 64 on 60020":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
    - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
    at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
    "regionserver60020.cacheFlusher":
    at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
    - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
    at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
    at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
    at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
    at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
    at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
    at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
    at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
    at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
    at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
    at java.util.TimeZone.getDisplayName(TimeZone.java:350)
    at java.util.Date.toString(Date.java:1025)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
    - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
    at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
    - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
    - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)

    Any ideas on how I could prevent this or let the master know about it? I've
    written an app that will check all regionservers periodically for such a
    lockup, but I can't run it constantly.
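
    A minimal sketch of one way to automate such a check, assuming JMX remote is
    enabled on each regionserver JVM (the host:port arguments below are
    placeholders), is to poll each JVM's ThreadMXBean for deadlocked threads,
    which is essentially what jstack reports as a Java-level deadlock:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RegionServerDeadlockProbe {
      public static void main(String[] args) throws Exception {
        // args: one "host:jmxPort" per regionserver, e.g. rs1.example.com:10102
        for (String hostPort : args) {
          JMXServiceURL url = new JMXServiceURL(
              "service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi");
          JMXConnector connector = JMXConnectorFactory.connect(url);
          try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Proxy to the remote JVM's ThreadMXBean.
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            // findDeadlockedThreads() covers both monitors and ownable
            // synchronizers (ReentrantLock), which this deadlock involves.
            long[] deadlocked = threads.findDeadlockedThreads();
            if (deadlocked != null && deadlocked.length > 0) {
              System.out.println(hostPort + ": " + deadlocked.length
                  + " deadlocked threads -- alert or restart here");
            } else {
              System.out.println(hostPort + ": no deadlock detected");
            }
          } finally {
            connector.close();
          }
        }
      }
    }

    Run from cron (or similar) against the list of regionservers, and alert or
    restart on a non-empty result.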

    I can provide more of the jstack if that is helpful.

    -Matt
  • Stack at Jul 15, 2011 at 6:03 am
    Add a comment to the issue, Ram. Use of the heavy-weight Date class seems odd for sure.
    St.Ack

    On Thu, Jul 14, 2011 at 9:33 PM, Ramkrishna S Vasudevan wrote:
    Sorry, it's not the Data class.

    The problem is in the use of the Date class.  JD had once replied to the
    mailing list under the heading "Re: Possible dead lock" :)

    Regards
    Ram

    -----Original Message-----
    From: Ramkrishna S Vasudevan
    Sent: Friday, July 15, 2011 9:26 AM
    To: user@hbase.apache.org
    Subject: RE: Deadlocked Regionserver process

    Hi

    I think, as Stack mentioned in HBASE-3830, this could be due to the profiler.

    But the problem is in the use of the Data class.  JD had once replied to the
    mailing list under the heading "Re: Possible dead lock"

    JD's reply
    =============================================================
    I see what you are saying, and I understand the deadlock, but what escapes
    me is why ResourceBundle has to go touch all the classes every time to find
    the locale, as I see 2 threads doing the same. Maybe my understanding of what
    it does is just poor, but I also see that you are using the YourKit profiler,
    so it's one more variable in the equation.

    In any case, using a Date strikes me as odd. Using a long representing
    System.currentTimeMillis is usually what we do.
    =======================================================================
    So here, as per HBASE-4101, even though the profiler was not running, the
    problem is the Date object created in the toString() of the
    PriorityCompactionQueue.
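
    For illustration, a minimal sketch of the pattern JD describes: keep a
    primitive timestamp and render it only when needed instead of building a
    java.util.Date inside a locked path. The CompactionRequest below is a
    simplified stand-in for this discussion, not the actual HBase class.

    // Simplified stand-in for illustration only; not the real HBase class.
    class CompactionRequest {
      private final String regionName;
      // Store primitive epoch millis instead of a java.util.Date.
      private final long enqueuedAtMillis = System.currentTimeMillis();

      CompactionRequest(String regionName) {
        this.regionName = regionName;
      }

      @Override
      public String toString() {
        // A plain long avoids Date.toString(), whose timezone/locale lookup
        // goes through ResourceBundle and showed up in the deadlocked stack.
        return "CompactionRequest[region=" + regionName
            + ", enqueuedAt=" + enqueuedAtMillis + "]";
      }
    }

    If a human-readable time is ever needed, it can be formatted from the long
    outside any locked section.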

    Regards
    Ram


    -----Original Message-----
    From: saint.ack@gmail.com On Behalf Of Stack
    Sent: Friday, July 15, 2011 3:56 AM
    To: user@hbase.apache.org
    Subject: Re: Deadlocked Regionserver process

    Thank you.

    I've added the below to the issue and will take a looksee.  If it turns out
    to be an issue, I'll include a fix in 0.90.4.

    St.Ack
    On Thu, Jul 14, 2011 at 3:07 PM, Matt Davies wrote:
    We aren't profiling right now.  Here's what is in hbase-env.sh:

    export TZ="US/Mountain"
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
    export HBASE_MANAGES_ZK=false
    export HBASE_PID_DIR=/home/hadoop
    export HBASE_HEAPSIZE=10240

    Java is
    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

    We were planning an upgrade to 1.6.0_25 before we ran into this issue.


    On Thu, Jul 14, 2011 at 3:59 PM, Stack wrote:

    What Lohit says, but also: what JVM are you running and what options
    are you feeding it?  The stack trace is a little crazy (especially the
    mix-in of resource bundle loading).  We saw something similar over in
    HBASE-3830 when someone was running a profiler.  Is that what is going
    on here?

    Thanks,
    St.Ack

