Could not obtain block
Hi,
We've run some tests on a three-node HBase cluster:
1 node running Master and Region Server
2 nodes running Region Servers
HBase 0.2
Hadoop 0.17.1

The test was to put some data in an endless loop;
the row keys were:
aaa + timestamp
aab + timestamp
aac + timestamp
....
aaz + timestamp
aba + timestamp
....
zzz + timestamp
and so on...
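
The insert loop itself was roughly the following (a simplified sketch, not the actual client code; putRow() is just a stand-in for the HBase write, which with the 0.2 API would be a BatchUpdate committed through HTable):

// Simplified sketch of the test client; putRow() stands in for the real
// HBase write and is not the actual client code.
public class EndlessInsert {

  public static void main(String[] args) {
    char[] prefix = {'a', 'a', 'a'};
    while (true) {
      // row key = three-letter prefix + "_" + timestamp, e.g. mvy_1220636795411
      String rowKey = new String(prefix) + "_" + System.currentTimeMillis();
      putRow(rowKey, "payload".getBytes());

      // advance the prefix: aaa, aab, ..., aaz, aba, ..., zzz, then wrap to aaa
      for (int i = prefix.length - 1; i >= 0; i--) {
        if (prefix[i] < 'z') { prefix[i]++; break; }
        prefix[i] = 'a';
      }
    }
  }

  // Stand-in for the HBase write; the real code commits one row per call
  // (HBase 0.2: build a BatchUpdate for rowKey and commit it via HTable).
  static void putRow(String rowKey, byte[] value) {
  }
}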

I could insert up to 7000 rows per second,
and it all went well up to the point when the HBase master started throwing:

08/09/10 07:09:28 INFO ipc.Client: Retrying connect to server:
/192.168.1.202:60020. Already tried 1 time(s).
08/09/10 07:09:29 INFO ipc.Client: Retrying connect to server:
/192.168.1.202:60020. Already tried 2 time(s).
08/09/10 07:09:30 INFO ipc.Client: Retrying connect to server:
/192.168.1.202:60020. Already tried 3 time(s).
08/09/10 07:09:31 INFO ipc.Client: Retrying connect to server:
/192.168.1.202:60020. Already tried 4 time(s).
....


Then the whole process became very slow.
We noticed that between these "Retrying connect" messages HBase
managed to insert some data:

08/09/10 07:13:37 INFO ipc.Client: Retrying connect to server:
/192.168.1.201:60020. Already tried 10 time(s).
08/09/10 07:07:43 data with row id mvy_1220636795411 INSERTED
08/09/10 07:42:59 INFO ipc.Client: Retrying connect to server:
/192.168.1.202:60020. Already tried 1 time(s).


After examining the HBase logs at debug level, we found that at some point
HBase started performing several splits in parallel:

2008-09-10 07:03:42,251 INFO
org.apache.hadoop.hbase.master.ServerManager: Received
MSG_REPORT_SPLIT: test_log,nre_1220650334054,1220684196714:
test_log,nre_1220650334054,1220684196714 split; daughters:
test_log,nre_1220650334054,1221023007756,
test_log,nwm_1220659679240,1221023007756 from 192.168.1.201:60020
2008-09-10 07:03:42,251 INFO
org.apache.hadoop.hbase.master.RegionManager: assigning region
test_log,nwm_1220659679240,1221023007756 to server 192.168.1.201:60020
2008-09-10 07:03:42,388 INFO
org.apache.hadoop.hbase.master.ServerManager: Received
MSG_REPORT_SPLIT: test_log,vlz_1220657647003,1220683727773:
test_log,vlz_1220657647003,1220683727773 split; daughters:
test_log,vlz_1220657647003,1221023049055,
test_log,vri_1220892338621,1221023049055 from 192.168.1.202:60020
2008-09-10 07:03:42,399 INFO
org.apache.hadoop.hbase.master.RegionManager: assigning region
test_log,nre_1220650334054,1221023007756 to server 192.168.1.201:60020


And then, after one of the splits, it threw an IOException:

2008-09-10 07:06:18,591 INFO
org.apache.hadoop.hbase.master.ServerManager: Received
MSG_REPORT_SPLIT: test_log,mvy_1220633069016,1220684186307:
test_log,mvy_1220633069016,1220684186307 split; daughters:
test_log,mvy_1220633069016,1221023269764,
test_log,nbg_1220636795411,1221023269764 from 192.168.1.202:60020
2008-09-10 07:06:18,592 INFO
org.apache.hadoop.hbase.master.RegionManager: assigning region
test_log,nbg_1220636795411,1221023269764 to server 192.168.1.202:60020
2008-09-10 07:06:18,593 INFO
org.apache.hadoop.hbase.master.RegionManager: assigning region
test_log,mvy_1220633069016,1221023269764 to server 192.168.1.202:60020
2008-09-10 07:06:21,600 INFO
org.apache.hadoop.hbase.master.ServerManager: Received
MSG_REPORT_PROCESS_OPEN: test_log,mvy_1220633069016,1221023269764
from 192.168.1.202:60020
2008-09-10 07:06:21,601 INFO
org.apache.hadoop.hbase.master.ServerManager: Received
MSG_REPORT_OPEN: test_log,nbg_1220636795411,1221023269764 from
192.168.1.202:60020
2008-09-10 07:06:21,601 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
test_log,nbg_1220636795411,1221023269764 open on 192.168.1.202:60020
2008-09-10 07:06:21,601 INFO
org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
test_log,nbg_1220636795411,1221023269764 in region .META.,,1 with
startcode 1220950604250 and server 192.168.1.202:60020
2008-09-10 07:06:30,609 INFO
org.apache.hadoop.hbase.master.ServerManager: Received
MSG_REPORT_CLOSE: test_log,mvy_1220633069016,1221023269764:
java.io.IOException: Could not obtain block: blk_4797637782612180770
file=/hbase/test_log/2024359288/message/info/1192041150217141868.1504866925
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1430)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1281)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1385)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
        at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
        at java.io.DataInputStream.readUTF(DataInputStream.java:572)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.hadoop.hbase.regionserver.HStoreFile$Reference.readFields(HStoreFile.java:573)
        at org.apache.hadoop.hbase.regionserver.HStoreFile.readSplitInfo(HStoreFile.java:292)
        at org.apache.hadoop.hbase.regionserver.HStore.loadHStoreFiles(HStore.java:388)
        at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:218)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1653)
        at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:470)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:902)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:877)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:817)
        at java.lang.Thread.run(Thread.java:619)
from 192.168.1.202:60020

And then it threw "Connection refused" while scanning the ROOT region:

2008-09-10 07:06:55,706 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: /192.168.1.202:60020. Already tried 10 time(s).
2008-09-10 07:06:56,707 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: yiyi03/192.168.1.203:60020. Already tried 1 time(s).
2008-09-10 07:06:56,739 WARN
org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
        at org.apache.hadoop.ipc.Client.call(Client.java:546)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:230)
        at $Proxy2.openScanner(Unknown Source)
        at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:159)
        at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:48)
        at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:74)
        at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:63)

What could be the cause? Is it a well-known issue? Some misconfiguration?


Krzysiek


  • Jean-Daniel Cryans at Sep 10, 2008 at 1:03 pm
    Krzysztof,

    HBase is not known to handle a lot of inserts well when starting on a
    fresh table, since there is no load distribution yet. To be able to help you, we
    would need to know:

    - The hardware of each machine (very important)
    - The datanode logs from when it failed
    - Whether your load made the machines swap
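
    To capture the last two points, something like this would do (standard
    Linux/Hadoop commands; the log path assumes the default Hadoop log layout):

    free -m && vmstat 5 5                                        # were the machines swapping?
    grep -i exception $HADOOP_HOME/logs/hadoop-*-datanode-*.log  # datanode errors around the failure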

    You should probably also use the current 0.2 branch; it is more stable (you
    will also have to upgrade to Hadoop 0.17.2.1).

    Thanks,

    J-D


  • Stack at Sep 10, 2008 at 7:18 pm
    What J-D said, plus make sure you've upped the file-descriptor ulimit
    (see the FAQ for how). HBase opens lots of files in HDFS and may be hitting
    the limit (grep the datanode logs for 'too many open files' or a similar
    message in the exceptions).
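
    For example (a rough sketch; exact limits and log paths depend on your setup):

    ulimit -n                       # current per-process file descriptor limit (often 1024)
    ulimit -n 32768                 # raise it in the shell that starts Hadoop/HBase
    grep -i "too many open files" $HADOOP_HOME/logs/hadoop-*-datanode-*.log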

    St.Ack


  • Ding, Hui at Sep 10, 2008 at 8:55 pm
    I thought HBase has load balancing (at least Bigtable does). So how can
    one distribute the data properly when bulk loading?

    Thanks,
    Hui


  • Jean-Daniel Cryans at Sep 10, 2008 at 9:01 pm
    Hui,

    I was referring to the fact that when you create a new table, it only has
    one region (Bigtable's tablet). The more the regions split afterwards, the
    more distribution you get. I'm pretty sure Bigtable has (or had) the same
    kind of limitation.

    J-D

Discussion Overview
group: hbase-user @ hadoop.apache.org
categories: hbase, hadoop
posted: Sep 10, '08 at 10:16a
active: Sep 10, '08 at 9:01p
posts: 5
users: 4
website: hbase.apache.org