BTW
The timeout (when calling flushCommits) happened at midnight, so I didn't
capture a jstack.
In the hadoop1 region server log, I see this around the time of the timeout in the 4th run:
2011-02-13 08:25:01,015 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Finished snapshotting, commencing flushing stores
2011-02-13 08:25:01,016 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call flushRegion(REGION => {NAME =>
'NIGHTLYDEVGRIDSGRIDSQL-THREEGPPSPEECHCALLS-1297583809865,2>&U\xF6\xB582>&U\xF6\xB582>&U\xF6\xB582>&U\xF6\xB582>&T,1297583814638.8cb772d452dee232306dfab0b472ec9a.',
STARTKEY => '2>&U\xF6\xB582>&U\xF6\xB582>&U\xF6\xB582>&U\xF6\xB582>&T',
ENDKEY =>
'2\xC1\xA3\xDFhVz2\xC1\xA3\xDFhVz2\xC1\xA3\xDFhVz2\xC1\xA3\xDFhVz2\xC1\xA3\xDD',
ENCODED => 8cb772d452dee232306dfab0b472ec9a, TABLE => {{NAME =>
'NIGHTLYDEVGRIDSGRIDSQL-THREEGPPSPEECHCALLS-1297583809865', FAMILIES =>
[{NAME => 'd', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS =>
'2', COMPRESSION => 'GZ', TTL => '31536000', BLOCKSIZE => '65536', IN_MEMORY
=> 'false', BLOCKCACHE => 'false'}, {NAME => 'i', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'GZ', TTL =>
'31536000', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'false'}, {NAME => 'v', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
VERSIONS => '2', COMPRESSION => 'GZ', TTL => '31536000', BLOCKSIZE =>
'65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}) from
10.202.50.76:62489: output error
2011-02-13 08:25:01,020 WARN org.apache.hadoop.ipc.HBaseServer: PRI IPC
Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
On Thu, Feb 10, 2011 at 2:41 PM, Ted Yu wrote:
I replaced the hbase jar with hbase-0.90.1.jar.
I also upgraded the client-side jar to hbase-0.90.1.jar.
Our map tasks ran faster than before for about 50 minutes.
However, the map tasks then timed out calling flushCommits(). This happened even
after a fresh restart of HBase.
I don't see any exceptions in the region server logs.
In the master log, I found:
2011-02-10 18:24:15,286 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
-ROOT-,,0.70236052 on sjc1-hadoop6.X.com,60020,1297362251595
2011-02-10 18:24:15,349 INFO
org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of
.META.,,1 at address=null;
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
.META.,,1
2011-02-10 18:24:15,350 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x12e10d0e31e0000 Creating (or updating) unassigned node for
1028785192 with OFFLINE state
I am attaching the jstack of the region server that didn't respond to
stop-hbase.sh.
FYI
On Thu, Feb 10, 2011 at 10:10 AM, Stack wrote:
That's probably enough, Ted. The 0.90.1 hbase-default.xml has an extra
config to enable the experimental HBASE-3455 feature, but you can copy
that over if you want to try playing with it (it defaults to off, so you'd
copy over the config if you wanted to set it to true).
St.Ack