The default HBaseStorage() takes a very long time for puts.
On a cluster of 5 machines, inserting 175 million records took 4 hours 45
minutes.
Question: is this good enough?
Each machine has 32 cores and 32 GB of RAM with 7 x 600 GB hard disks. HBase's
heap has been configured to 8 GB.
If the put speed is low, how can I improve it?
I tried tweaking the TableOutputFormat by increasing the WriteBufferSize to
24 MB and adding multi-put support (collecting 10,000 puts in an ArrayList
and submitting them as a single batch). After doing this, it started throwing:
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
which I assume is because the clients took too long to complete their puts.
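Roughly, the batching change looks like the sketch below (a minimal standalone example of the idea, not the actual OptimisedTableOutputFormat code; the table name, column family, and row loop are just illustrative):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedPutSketch {

    private static final int BATCH_SIZE = 10000; // puts collected before each multi-put

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "index");
        table.setAutoFlush(false);                   // buffer puts on the client side
        table.setWriteBufferSize(24L * 1024 * 1024); // 24 MB client write buffer

        List<Put> batch = new ArrayList<Put>(BATCH_SIZE);
        for (long i = 0; i < 1000000L; i++) {
            Put put = new Put(Bytes.toBytes("row-" + i));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
            batch.add(put);
            if (batch.size() >= BATCH_SIZE) {
                table.put(batch);  // one multi-put call per 10,000 puts
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            table.put(batch);      // submit the final partial batch
        }
        table.flushCommits();      // push anything still sitting in the write buffer
        table.close();
    }
}

With autoflush off, the 24 MB write buffer and the explicit 10,000-put batches both cut down the number of RPC round trips, at the cost of much larger individual multi-put calls to the region server.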
The detailed log from one of the tasks is as follows. I've censored some of
the details, which I assume is okay. :P
2012-04-23 20:07:12,815 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-04-23 20:07:13,097 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
exists!
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=*****.*****
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_22
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=****************************
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=**********************
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=***************************
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.38-8-server
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=raj
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=*********
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=**********************:
2012-04-23 20:07:13,790 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=master:2181 sessionTimeout=180000
watcher=hconnection
2012-04-23 20:07:13,822 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server /172.21.208.180:2181
2012-04-23 20:07:13,823 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
this process is [email protected]
2012-04-23 20:07:13,825 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to master/172.21.208.180:2181, initiating session
2012-04-23 20:07:13,840 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server master/172.21.208.180:2181, sessionid =
0x136dfa124e90015, negotiated timeout = 180000
2012-04-23 20:07:14,129 INFO com.raj.OptimisedTableOutputFormat: Created
table instance for index
2012-04-23 20:07:14,184 INFO org.apache.hadoop.util.ProcessTree: setsid
exited with exit code 0
2012-04-23 20:07:14,205 INFO org.apache.hadoop.mapred.Task: Using
ResourceCalculatorPlugin :
[email protected]
2012-04-23 20:08:49,852 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb., hostname=slave1, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:773)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:142)
at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:1)
at com.raj.HBaseStorage.putNext(HBaseStorage.java:583)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
at
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:930)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy7.multi(Unknown Source)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
2012-04-23 20:09:51,018 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb., hostname=slave1, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41150 remote=slave1/172.21.208.176:60020]
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:773)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:142)
at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:1)
at com.raj.HBaseStorage.putNext(HBaseStorage.java:583)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41150 remote=slave1/172.21.208.176:60020]
at
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:930)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy7.multi(Unknown Source)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41150 remote=slave1/172.21.208.176:60020]
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
--
Thanks and Regards,
Raj