Grokbase Groups HBase dev July 2009
Hi,

Hbase/Hadoop Setup:
1. 3 regionservers
2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
3. Using an older hbase version from the trunk [ Version: 0.20.0-dev, r786695, Sat Jun 20 18:01:17 EDT 2009 ]
4. Using hadoop [ 0.20.0 ]

Test Data:
1. The input is a CSV file with 1M rows, about 20 columns, and 4 metrics.
2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

The task is to parse the CSV file and, for each metric m1, create an entry in the hbase table "txn_m1" with the columns as needed. Attached is a pdf [from an excel spreadsheet] which explains how a single row in the CSV is converted into hbase data in the mapper and reducer stages. The code is attached as well.

Processing 1M records takes about 38 minutes. I am using HTable.incrementColumnValue() in the reduce pass to create the records in the hbase tables.
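One way to cut down on per-record incrementColumnValue() calls is to coalesce increments for the same cell in memory before issuing them. A minimal sketch of the idea (this class and its string key format are hypothetical, not from the attached code; a flush step would then issue one HTable.incrementColumnValue() per surviving entry):

```java
import java.util.HashMap;
import java.util.Map;

// Coalesces per-cell increments so that each distinct (row, column) pair
// results in a single incrementColumnValue() RPC at flush time, instead
// of one RPC per input record seen in the reducer.
public class IncrementBuffer {
    private final Map<String, Long> pending = new HashMap<>();

    // Buffer an increment for "row/family:qualifier" (key format is illustrative).
    public void increment(String row, String column, long amount) {
        pending.merge(row + "/" + column, amount, Long::sum);
    }

    // Number of RPCs a flush would issue: one per distinct cell.
    public int pendingCells() {
        return pending.size();
    }

    // Accumulated amount buffered for one cell so far.
    public long pendingValue(String row, String column) {
        return pending.getOrDefault(row + "/" + column, 0L);
    }
}
```

With many input records hitting the same cells, the flush issues far fewer RPCs than the record count.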

Is there anything I should be doing differently, or anything inherently incorrect? I would like to run this task in 1 minute.

Thanks for the help,
Irfan

Here is the output of the process. Let me know if I should attach any other log.

09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
09/07/02 15:23:02 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000002_0, Status : FAILED
attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:124)
at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:107)
at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:124)
at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:107)
at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:124)
at org.apache.hadoop.hbase.client.HTable.&lt;init&gt;(HTable.java:107)
at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488


  • Jonathan Gray at Jul 2, 2009 at 8:52 pm
    Are you having HDFS issues? Did a RegionServer go down at all? Not
    sure what's up with the root location issue.

    With only 3 nodes and 20 reducers you are pushing things pretty hard...

    The best way to investigate how to improve your performance is to look
    at the load characteristics of the nodes. I definitely recommend
    setting up Ganglia as a first step.

    Irfan Mohammed wrote:
  • Stack at Jul 2, 2009 at 10:12 pm
    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking at the excel spreadsheet, each row has the same key.)
    Then you'd be doing one insert against a single table rather than four
    separate ones.
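To make the layout difference concrete, here is a small sketch that models only the cell coordinates touched per parsed CSV row (the "txn" table, "m1".."m4" family, and qualifier names are made up for illustration; each coordinate stands in for one incrementColumnValue() call):

```java
import java.util.ArrayList;
import java.util.List;

// Models the cells one parsed CSV row touches under each schema.
// Four-table layout: "txn_m1".."txn_m4", one family each, four HTable handles.
// One-table layout: a single "txn" table with families m1..m4, one handle.
public class CellAddressing {
    // Coordinates under the current four-table layout, as "table/family:qualifier@row".
    static List<String> fourTables(String row, String qualifier) {
        List<String> cells = new ArrayList<>();
        for (int m = 1; m <= 4; m++) {
            cells.add("txn_m" + m + "/metric:" + qualifier + "@" + row);
        }
        return cells;
    }

    // Coordinates under the suggested single-table, four-family layout.
    static List<String> oneTable(String row, String qualifier) {
        List<String> cells = new ArrayList<>();
        for (int m = 1; m <= 4; m++) {
            cells.add("txn/m" + m + ":" + qualifier + "@" + row);
        }
        return cells;
    }
}
```

The row count per increment is the same; the difference is that all four increments go through one table handle and its region lookups rather than four.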

    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says there are 294786 input records
    and that the mappers output 17M records. Is that expected?

    A few of your reducers failed and were done over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    from trying to find the root region. The root region location is in zk.
    Odd that it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step? (I couldn't tell from the
    excel spreadsheet -- there didn't seem to be any summing going on.) If
    not, dropping it could make for savings too.
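If the reduce step does stay, the counters above show Combine input records=0, so another lever on shuffle cost is a map-side combiner that pre-sums values per key before they are shuffled. The summing such a combiner would perform is just this (a sketch, assuming the map output values are plain long counts):

```java
import java.util.Iterator;

// The per-key summing a combiner would perform on the map side,
// collapsing many small (key, count) records into one partial sum
// and shrinking the bytes shuffled to the reducers.
public class MetricCombiner {
    static long combine(Iterator<Long> values) {
        long sum = 0L;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }
}
```

Because addition is associative, applying this on the map side and again in the reducer gives the same totals with far less shuffle traffic.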

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes. Tune this part of the job first.
    Then add in hbase and see how much it slows things.

    Looking at your code, nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with a 1M rows and about 20 columns and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry into the hbase table "txn_m1" with the columns as needed. Attached is
    an pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stage. Attached is the
    code as well.

    For processing a 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect? I
    would like run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching
    new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to
    initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Irfan Mohammed at Jul 3, 2009 at 1:00 pm
    Thanks for the quick responses.

    I removed the reduce pass and am now doing the inserts in the map pass, with the number of map tasks reduced to 10. It is still taking about 12 minutes to complete the inserts.

    Is there any reason these NoServerForRegionException failures should occur seemingly at random?

    I am working on writing the output to HDFS instead, to compare the performance.
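    (For reference, the 0.20-era client has retry settings in hbase-site.xml that are sometimes raised when many concurrent map tasks all time out locating the root region at once. The property names below are the real client knobs; the values are purely illustrative, not recommendations:)

    ```xml
    <!-- hbase-site.xml on the MR task nodes: give the HBase client a longer
         pause between retries and more retries when many tasks hit -ROOT-
         simultaneously. Values are illustrative only. -->
    <property>
      <name>hbase.client.pause</name>
      <value>3000</value>
    </property>
    <property>
      <name>hbase.client.retries.number</name>
      <value>20</value>
    </property>
    ```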

    09/07/03 08:38:35 INFO mapred.JobClient: Running job: job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why four tables? Why not one table with four column families, one for each
    metric? (Looking at the excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate ones.
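The single-table layout suggested above can be sketched with plain Java collections (no live cluster assumed; the table name "txn" and the family/qualifier names below are hypothetical, not from the attached code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: one row key shared by all four metrics, one column family per
// metric, in a single table -- so each CSV row becomes one logical write
// instead of four writes against four tables.
public class SingleTableLayout {
    static Map<String, Long> rowFor(String rowKey, long m1, long m2, long m3, long m4) {
        Map<String, Long> cells = new LinkedHashMap<>();
        // "family:qualifier" -> value; four families in ONE table "txn"
        cells.put("m1:value", m1);
        cells.put("m2:value", m2);
        cells.put("m3:value", m3);
        cells.put("m4:value", m4);
        return cells;
    }

    public static void main(String[] args) {
        Map<String, Long> row = rowFor("20090702|site1|ad9", 1L, 2L, 3L, 4L);
        System.out.println(row.size());          // four cells, one row, one table
        System.out.println(row.get("m3:value"));
    }
}
```

With this layout, the client makes a single round trip per row rather than one per metric table.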

    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says there are 294786 input records but
    that the mapper outputs 17M records. Is that expected?

    A few of your reducers failed and were run over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    trying to find the root region. The root region location is kept in zk, so
    it's odd that it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step at all? (Couldn't tell from
    the excel spreadsheet; there didn't seem to be any summing going on.) If
    not, dropping it could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes and tune this part of the job
    first. Then add hbase back in and see how much it slows things.

    Looking at your code, nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows and about 20 columns and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and, for each metric m1, create an
    entry in the hbase table "txn_m1" with the columns as needed. Attached is
    a pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stages. Attached is the
    code as well.

    For processing 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or that is inherently
    incorrect? I would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching
    new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to
    initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching
    new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to
    initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching
    new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to
    initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching
    new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to
    initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete:
    job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Stack at Jul 3, 2009 at 9:44 pm
    Those NoServerForRegionException failures are probably putting a stake
    through throughput, especially when they are complaining that root is
    unobtainable. Let's try and figure out what's up here (Jon Gray has a good
    suggestion in this regard).
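    Since the failed tasks are timing out while locating the root region, and
    0.20 keeps the root region location in ZooKeeper, one thing worth checking
    is that the hbase-site.xml visible to the map/reduce task JVMs actually
    names the ZooKeeper quorum. A minimal fragment (hostnames here are
    hypothetical placeholders, not your actual nodes):

```
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```

    If the tasks fall back on a default quorum that doesn't match the cluster,
    every HTable construction would time out exactly as in the traces below.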

    On schema, how many columns do you think you'll have per family? The
    number-of-columns story has improved by a bunch in hbase 0.20.0; you should
    be able to do thousands if not more per column family.

    St.Ack

    On Fri, Jul 3, 2009 at 6:00 AM, Irfan Mohammed wrote:

    Thanks for the quick responses.

    I removed the reduce pass and am doing the inserts in the map pass, and I
    reduced the number of Map instances to 10. It is still taking about 12
    minutes to complete the inserts.
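    One further way to cut the insert time (a sketch, not from the attached
    code) is to pre-aggregate the increments in the map pass: since increments
    are commutative, you can sum the deltas for each (table, row, column)
    locally and issue a single HTable.incrementColumnValue() per distinct cell
    rather than one RPC per input record. The aggregation itself is plain Java;
    the "table|row|family:qualifier" key convention below is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sums increments per cell locally so that the caller can later issue one
// incrementColumnValue() RPC per distinct cell with the combined delta,
// instead of one RPC per input record.
public class IncrementBatcher {
    private final Map<String, Long> pending = new HashMap<>();

    // cellKey format "table|row|family:qualifier" is a made-up convention
    public void add(String cellKey, long delta) {
        pending.merge(cellKey, delta, Long::sum);
    }

    // Returns the accumulated deltas and resets the batcher; the caller
    // would loop over the entries calling incrementColumnValue() once each.
    public Map<String, Long> drain() {
        Map<String, Long> out = new HashMap<>(pending);
        pending.clear();
        return out;
    }
}
```

    With 17M map outputs collapsing into only ~4863 reduce groups (per the
    counters above), this kind of local summing should shrink the number of
    hbase round trips dramatically.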

    Any reason why there should be these intermittent NoServerForRegionException failures?

    I am working on writing to hdfs and checking the performance.

    09/07/03 08:38:35 INFO mapred.JobClient: Running job:
    job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to
    initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to
    initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to
    initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to
    initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to
    initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to
    initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking at the excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate
    ones.
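
    A minimal sketch of that layout using the 0.20-era client API (the table,
    family, and qualifier names here are illustrative, not from the attached
    code, and it assumes a running HBase cluster):

    ```java
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class OneTableFourFamilies {
      public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();

        // One "txn" table with a column family per metric, instead of
        // four tables txn_m1..txn_m4.
        HTableDescriptor desc = new HTableDescriptor("txn");
        for (String family : new String[] {"m1", "m2", "m3", "m4"}) {
          desc.addFamily(new HColumnDescriptor(family));
        }
        new HBaseAdmin(conf).createTable(desc);

        // A single HTable handle then serves all four metrics: each
        // increment targets the same row in a different family rather
        // than a row in a separate table.
        HTable table = new HTable(conf, "txn");
        byte[] row = Bytes.toBytes("some-row-key");
        table.incrementColumnValue(row, Bytes.toBytes("m1"), Bytes.toBytes("count"), 1L);
        table.incrementColumnValue(row, Bytes.toBytes("m2"), Bytes.toBytes("count"), 5L);
      }
    }
    ```

    With this shape the reducer holds one HTable instead of four, and all
    increments for a key land in one region rather than four.
    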

    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says there are 294786 inputs and that
    the mapper outputs 17M records. Is that expected?

    A few of your reducers failed and were done over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    trying to find the root region. The root region location is kept in zk;
    odd it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step? (Couldn't tell from the
    excel spreadsheet -- there didn't seem to be any summing going on.) If
    not, skipping it could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes. Tune this part of the job first.
    Then add in hbase and see how much it slows things.
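
    The HDFS-only baseline might look something like this with the old mapred
    API (a sketch under assumptions: the class and output path are made up,
    and the mapper/reducer setup would be the same as in the real job):

    ```java
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class HdfsOnlyRun {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(HdfsOnlyRun.class);
        // ... same mapper/reducer/input setup as the real job ...

        // Swap the custom HBase OutputFormat for plain text on HDFS so the
        // job's cost can be measured without HBase in the picture.
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/txn-baseline"));

        JobClient.runJob(job);
      }
    }
    ```

    Comparing this run's wall time against the HBase run isolates how much of
    the 38 minutes is the MR job itself versus the HBase writes.
    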

    Looking at your code, nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows and about 20 columns and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry in the hbase table "txn_m1" with the columns as needed. Attached is
    a pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stages. Attached is
    the code as well.

    For processing 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect? I
    would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Irfan Mohammed at Jul 5, 2009 at 3:52 am
    My zookeeper quorum had just one server; after Jon Gray's suggestion I added two more to the quorum, and the task did not have any failures.
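
    For reference, a three-server quorum in hbase-site.xml looks something like this (the hostnames are placeholders, not from this cluster):

    ```xml
    <!-- Comma-separated list of the servers in the ZooKeeper quorum.
         An odd number (3, 5, ...) keeps the ensemble able to form a majority. -->
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
    </property>
    ```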

    But it still took 10 minutes to finish on my 3-node cluster. I am trying to add more nodes to the cluster to see if I get better performance.

    Regarding the question of # of columns per family: we are looking at at most 20 families, and the # of columns per family varies from 100 to 10000. Would that be a problem in hbase?

    Thanks,
    Irfan

    09/07/04 23:09:19 INFO mapred.JobClient: Running job: job_200906192236_24635
    09/07/04 23:09:20 INFO mapred.JobClient: map 0% reduce 0%
    09/07/04 23:09:37 INFO mapred.JobClient: map 1% reduce 0%
    09/07/04 23:09:43 INFO mapred.JobClient: map 2% reduce 0%
    09/07/04 23:09:46 INFO mapred.JobClient: map 3% reduce 0%
    09/07/04 23:09:52 INFO mapred.JobClient: map 4% reduce 0%
    09/07/04 23:09:55 INFO mapred.JobClient: map 5% reduce 0%
    09/07/04 23:10:01 INFO mapred.JobClient: map 6% reduce 0%
    09/07/04 23:10:13 INFO mapred.JobClient: map 7% reduce 0%
    09/07/04 23:10:19 INFO mapred.JobClient: map 8% reduce 0%
    09/07/04 23:10:25 INFO mapred.JobClient: map 9% reduce 0%
    09/07/04 23:10:28 INFO mapred.JobClient: map 10% reduce 0%
    09/07/04 23:10:34 INFO mapred.JobClient: map 11% reduce 0%
    09/07/04 23:10:40 INFO mapred.JobClient: map 12% reduce 0%
    09/07/04 23:10:46 INFO mapred.JobClient: map 13% reduce 0%
    09/07/04 23:10:52 INFO mapred.JobClient: map 14% reduce 0%
    09/07/04 23:10:58 INFO mapred.JobClient: map 15% reduce 0%
    09/07/04 23:11:04 INFO mapred.JobClient: map 16% reduce 0%
    09/07/04 23:11:07 INFO mapred.JobClient: map 17% reduce 0%
    09/07/04 23:11:16 INFO mapred.JobClient: map 18% reduce 0%
    09/07/04 23:11:22 INFO mapred.JobClient: map 19% reduce 0%
    09/07/04 23:11:28 INFO mapred.JobClient: map 20% reduce 0%
    09/07/04 23:11:34 INFO mapred.JobClient: map 21% reduce 0%
    09/07/04 23:11:40 INFO mapred.JobClient: map 22% reduce 0%
    09/07/04 23:11:43 INFO mapred.JobClient: map 23% reduce 0%
    09/07/04 23:11:49 INFO mapred.JobClient: map 24% reduce 0%
    09/07/04 23:11:56 INFO mapred.JobClient: map 25% reduce 0%
    09/07/04 23:12:02 INFO mapred.JobClient: map 26% reduce 0%
    09/07/04 23:12:05 INFO mapred.JobClient: map 27% reduce 0%
    09/07/04 23:12:08 INFO mapred.JobClient: map 28% reduce 0%
    09/07/04 23:12:14 INFO mapred.JobClient: map 29% reduce 0%
    09/07/04 23:12:17 INFO mapred.JobClient: map 30% reduce 0%
    09/07/04 23:12:26 INFO mapred.JobClient: map 31% reduce 0%
    09/07/04 23:12:32 INFO mapred.JobClient: map 32% reduce 0%
    09/07/04 23:12:38 INFO mapred.JobClient: map 33% reduce 0%
    09/07/04 23:12:44 INFO mapred.JobClient: map 34% reduce 0%
    09/07/04 23:12:50 INFO mapred.JobClient: map 35% reduce 0%
    09/07/04 23:12:56 INFO mapred.JobClient: map 36% reduce 0%
    09/07/04 23:13:02 INFO mapred.JobClient: map 37% reduce 0%
    09/07/04 23:13:11 INFO mapred.JobClient: map 38% reduce 0%
    09/07/04 23:13:17 INFO mapred.JobClient: map 39% reduce 0%
    09/07/04 23:13:23 INFO mapred.JobClient: map 40% reduce 0%
    09/07/04 23:13:29 INFO mapred.JobClient: map 41% reduce 0%
    09/07/04 23:13:35 INFO mapred.JobClient: map 42% reduce 0%
    09/07/04 23:13:41 INFO mapred.JobClient: map 43% reduce 0%
    09/07/04 23:13:47 INFO mapred.JobClient: map 44% reduce 0%
    09/07/04 23:13:53 INFO mapred.JobClient: map 45% reduce 0%
    09/07/04 23:13:59 INFO mapred.JobClient: map 46% reduce 0%
    09/07/04 23:14:02 INFO mapred.JobClient: map 47% reduce 0%
    09/07/04 23:14:08 INFO mapred.JobClient: map 48% reduce 0%
    09/07/04 23:14:14 INFO mapred.JobClient: map 49% reduce 0%
    09/07/04 23:14:17 INFO mapred.JobClient: map 50% reduce 0%
    09/07/04 23:14:23 INFO mapred.JobClient: map 51% reduce 0%
    09/07/04 23:14:29 INFO mapred.JobClient: map 52% reduce 0%
    09/07/04 23:14:34 INFO mapred.JobClient: map 53% reduce 0%
    09/07/04 23:14:40 INFO mapred.JobClient: map 54% reduce 0%
    09/07/04 23:14:43 INFO mapred.JobClient: map 55% reduce 0%
    09/07/04 23:14:49 INFO mapred.JobClient: map 56% reduce 0%
    09/07/04 23:14:55 INFO mapred.JobClient: map 57% reduce 0%
    09/07/04 23:15:01 INFO mapred.JobClient: map 58% reduce 0%
    09/07/04 23:15:07 INFO mapred.JobClient: map 59% reduce 0%
    09/07/04 23:15:13 INFO mapred.JobClient: map 60% reduce 0%
    09/07/04 23:15:19 INFO mapred.JobClient: map 61% reduce 0%
    09/07/04 23:15:25 INFO mapred.JobClient: map 62% reduce 0%
    09/07/04 23:15:31 INFO mapred.JobClient: map 63% reduce 0%
    09/07/04 23:15:40 INFO mapred.JobClient: map 64% reduce 0%
    09/07/04 23:15:46 INFO mapred.JobClient: map 65% reduce 0%
    09/07/04 23:15:52 INFO mapred.JobClient: map 66% reduce 0%
    09/07/04 23:15:58 INFO mapred.JobClient: map 67% reduce 0%
    09/07/04 23:16:07 INFO mapred.JobClient: map 68% reduce 0%
    09/07/04 23:16:13 INFO mapred.JobClient: map 69% reduce 0%
    09/07/04 23:16:16 INFO mapred.JobClient: map 70% reduce 0%
    09/07/04 23:16:22 INFO mapred.JobClient: map 71% reduce 0%
    09/07/04 23:16:28 INFO mapred.JobClient: map 72% reduce 0%
    09/07/04 23:16:34 INFO mapred.JobClient: map 73% reduce 0%
    09/07/04 23:16:41 INFO mapred.JobClient: map 74% reduce 0%
    09/07/04 23:16:44 INFO mapred.JobClient: map 75% reduce 0%
    09/07/04 23:16:50 INFO mapred.JobClient: map 76% reduce 0%
    09/07/04 23:16:56 INFO mapred.JobClient: map 77% reduce 0%
    09/07/04 23:16:59 INFO mapred.JobClient: map 78% reduce 0%
    09/07/04 23:17:05 INFO mapred.JobClient: map 79% reduce 0%
    09/07/04 23:17:11 INFO mapred.JobClient: map 80% reduce 0%
    09/07/04 23:17:17 INFO mapred.JobClient: map 81% reduce 0%
    09/07/04 23:17:20 INFO mapred.JobClient: map 82% reduce 0%
    09/07/04 23:17:26 INFO mapred.JobClient: map 83% reduce 0%
    09/07/04 23:17:32 INFO mapred.JobClient: map 84% reduce 0%
    09/07/04 23:17:38 INFO mapred.JobClient: map 85% reduce 0%
    09/07/04 23:17:47 INFO mapred.JobClient: map 86% reduce 0%
    09/07/04 23:17:53 INFO mapred.JobClient: map 87% reduce 0%
    09/07/04 23:17:59 INFO mapred.JobClient: map 88% reduce 0%
    09/07/04 23:18:05 INFO mapred.JobClient: map 89% reduce 0%
    09/07/04 23:18:11 INFO mapred.JobClient: map 90% reduce 0%
    09/07/04 23:18:17 INFO mapred.JobClient: map 91% reduce 0%
    09/07/04 23:18:26 INFO mapred.JobClient: map 92% reduce 0%
    09/07/04 23:18:32 INFO mapred.JobClient: map 93% reduce 0%
    09/07/04 23:18:38 INFO mapred.JobClient: map 94% reduce 0%
    09/07/04 23:18:44 INFO mapred.JobClient: map 95% reduce 0%
    09/07/04 23:18:50 INFO mapred.JobClient: map 96% reduce 0%
    09/07/04 23:18:56 INFO mapred.JobClient: map 97% reduce 0%
    09/07/04 23:19:02 INFO mapred.JobClient: map 98% reduce 0%
    09/07/04 23:19:08 INFO mapred.JobClient: map 99% reduce 0%
    09/07/04 23:19:20 INFO mapred.JobClient: map 100% reduce 0%
    09/07/04 23:19:24 INFO mapred.JobClient: Job complete: job_200906192236_24635
    09/07/04 23:19:24 INFO mapred.JobClient: Counters: 8
    09/07/04 23:19:24 INFO mapred.JobClient: Job Counters
    09/07/04 23:19:24 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/04 23:19:24 INFO mapred.JobClient: Launched map tasks=10
    09/07/04 23:19:24 INFO mapred.JobClient: Data-local map tasks=8
    09/07/04 23:19:24 INFO mapred.JobClient: FileSystemCounters
    09/07/04 23:19:24 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/04 23:19:24 INFO mapred.JobClient: Map-Reduce Framework
    09/07/04 23:19:24 INFO mapred.JobClient: Map input records=294786
    09/07/04 23:19:24 INFO mapred.JobClient: Spilled Records=0
    09/07/04 23:19:24 INFO mapred.JobClient: Map input bytes=57966580
    09/07/04 23:19:24 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Friday, July 3, 2009 5:43:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Those NoServerForRegionExceptions are probably putting a stake through
    throughput, especially when they are complaining that root is unobtainable.
    Let's try and figure out what's up here (Jon Gray has a good suggestion in
    this regard).
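
    One common cause of "Timed out trying to locate root region" in MR tasks is
    that the task JVMs can't reach ZooKeeper, typically because hbase-site.xml
    is not on the job's classpath. A minimal sketch of the client setting
    involved (the quorum host names below are placeholders, not your cluster's):

    ```xml
    <!-- hbase-site.xml: must be visible to every MR task JVM.
         Host names here are placeholders. -->
    <configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
      </property>
    </configuration>
    ```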

    On schema, how many columns do you think you'll have per family? The number
    of columns story has improved by a bunch in hbase 0.20.0. You should be able
    to do thousands, if not more, per column family.

    St.Ack

    On Fri, Jul 3, 2009 at 6:00 AM, Irfan Mohammed wrote:

    Thanks for the quick responses.

    I removed the reduce pass and am doing the inserts in the map pass. I
    reduced the number of map tasks to 10. It is still taking about 12 minutes
    to complete the inserts.

    Any reason why there should be these arbitrary NoServerForRegionExceptions?

    I am working on writing to hdfs and checking the performance.

    09/07/03 08:38:35 INFO mapred.JobClient: Running job:
    job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to
    initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to
    initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to
    initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to
    initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to
    initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to
    initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to
    initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to
    initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to
    initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete:
    job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking in the excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate
    ones.
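
    To make that concrete, here is a self-contained sketch of the single-table
    layout (table, family, row-key, and qualifier names are made-up
    placeholders, not taken from Irfan's code): each parsed CSV row becomes
    four increments sharing one row key, with the column family selecting the
    metric, instead of one insert into each of four tables. The increments are
    modeled as plain objects so the sketch stands alone; in the real job each
    one would become an HTable.incrementColumnValue() call.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class SingleTableSketch {
        /** One increment: row key + column family + qualifier + delta. */
        static final class Inc {
            final String row, family, qualifier;
            final long delta;
            Inc(String row, String family, String qualifier, long delta) {
                this.row = row; this.family = family;
                this.qualifier = qualifier; this.delta = delta;
            }
        }

        /**
         * One parsed CSV row -> four increments against a single table, all
         * sharing the same row key; the column family (m1..m4) selects the
         * metric. All names here are hypothetical placeholders.
         */
        static List<Inc> toIncrements(String rowKey, long[] metrics) {
            List<Inc> out = new ArrayList<Inc>();
            for (int i = 0; i < metrics.length; i++) {
                out.add(new Inc(rowKey, "m" + (i + 1), "total", metrics[i]));
            }
            return out;
        }

        public static void main(String[] args) {
            for (Inc inc : toIncrements("20090702|site_a", new long[] {1, 10, 3, 7})) {
                System.out.println(inc.row + " " + inc.family + ":" + inc.qualifier
                    + " += " + inc.delta);
            }
        }
    }
    ```

    With one HTable for the combined table, each tuple would then be applied
    with incrementColumnValue(row, family, qualifier, delta), so the four
    metric updates share a single table and region lookup.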

    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says that there are 294786 inputs, and
    that the mapper outputs 17M records. Is that expected?

    A few of your reducers failed and were run over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    from trying to find the root region. The root region location is in zk, so
    it is odd that it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step? (I couldn't tell from the
    excel spreadsheet -- there didn't seem to be any summing going on.) If not,
    dropping it could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes. Tune this part of the job first.
    Then add in hbase and see how much it slows things.

    Looking at your code, nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows, about 20 columns, and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and, for each metric m1, create an
    entry in the hbase table "txn_m1" with the columns as needed. Attached is
    a PDF [from an Excel sheet] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stages. Attached is the
    code as well.

    For processing a 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently, or that is inherently
    incorrect? I would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job:
    job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the r_000013_0 failure above]

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the r_000013_0 failure above]

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Stack at Jul 5, 2009 at 9:32 pm

    On Sat, Jul 4, 2009 at 8:51 PM, Irfan Mohammed wrote:

    My ZooKeeper quorum had just one server; following Jon Gray's suggestion I
    added two more to the quorum, and the task did not have any failures.
    That is good to know, though if a single ZK instance is not able to handle
    the load of 3 nodes, I think there's something up with it. We'll
    take a look into it.
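
    For reference, a three-host ZooKeeper quorum of the kind discussed above is
    configured in hbase-site.xml; the hostnames below are placeholders, not from
    this thread:

    ```xml
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
    </property>
    ```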


    But it still took 10 minutes to finish on my 3-node cluster. I am
    trying to add more nodes to the cluster to see if I get better
    performance.
    Yeah, this would be good to know.

    So you are doing it all in the map now, but still updating 4 tables on each
    update (200k rows in become 7M rows out)? What do you see if you study the
    UI? Are the updates split evenly across all 3 servers, or are they marching
    in lockstep across the table's regions? (i.e., are updates spread across all
    servers, or do we bang on one at a time?)


    Regarding the question of # of columns per family: we are looking at
    at most 20 families, and the # of columns per family varies from 100 to 10000.
    Would that be a problem in hbase?


    According to Jon Gray, who tested how hbase does with many columns, the only
    real issue will be memory; returning 10k columns on one row all in one go,
    especially if they are of any significant size, could put pressure on
    server+client memory. Otherwise, it should work fine. (There are
    optimizations we need to do to make it faster than it is, but it's for sure
    way better than it was in 0.19.x.)

    St.Ack

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Friday, July 3, 2009 5:43:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Those NoServerForRegionExceptions are probably putting a stake through
    throughput, especially when they are complaining that root is unobtainable.
    Let's try to figure out what's up here (Jon Gray has a good suggestion in
    this regard).

    On schema, how many columns do you think you'll have per family? The
    number-of-columns story has improved by a bunch in hbase 0.20.0. It should
    be able to do thousands, if not more, per column family.

    St.Ack

    On Fri, Jul 3, 2009 at 6:00 AM, Irfan Mohammed wrote:

    Thanks for the quick responses.

    I removed the reduce pass and am doing the inserts in the map pass. I
    reduced the number of Map instances to 10. It is still taking about 12
    minutes to complete the inserts.

    Any reason why there should be arbitrary NoServerForRegionExceptions?

    I am working on writing to HDFS and checking the performance.
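
    Aside from the region-location failures, one common lever for write-heavy
    loads like this one is client-side write buffering: HTable.incrementColumnValue()
    issues one RPC per call, while buffered Puts batch many cells per RPC. A
    minimal sketch, assuming the counters can be pre-aggregated before writing
    (the table name matches this thread, but the row key and family/qualifier
    names are made up, not taken from the attached code):

    ```java
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedLoad {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "txn_m1");
            // Queue edits client-side instead of one RPC per cell.
            table.setAutoFlush(false);
            table.setWriteBufferSize(4 * 1024 * 1024); // 4 MB buffer
            for (int i = 0; i < 100000; i++) {
                // "row-i", family "m", qualifier "v" are illustrative only.
                Put put = new Put(Bytes.toBytes("row-" + i));
                put.add(Bytes.toBytes("m"), Bytes.toBytes("v"),
                        Bytes.toBytes(1L));
                table.put(put); // buffered; flushed when the buffer fills
            }
            table.flushCommits(); // push any remaining buffered edits
        }
    }
    ```

    The trade-off is that buffered edits sit on the client until flushed, so a
    task failure can lose unflushed writes; pre-aggregating in the job makes the
    replay idempotent.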

    09/07/03 08:38:35 INFO mapred.JobClient: Running job: job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the m_000007_0 failure above]

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the m_000007_0 failure above]

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the m_000007_0 failure above]

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the m_000007_0 failure above]

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the m_000007_0 failure above]

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    [stack trace identical to the m_000007_0 failure above]

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking at the excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate
    ones. Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says that there are 294786 inputs and
    that the mapper outputs 17M records. Is that expected?
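    One way to see the saving: with a single table and four column families
    (m1..m4), all four per-metric updates for a CSV row share one row key, so
    they travel as one write to one table instead of four writes to four
    tables. Below is a minimal, runnable sketch of the idea only; the row-key
    format, the "total" qualifier, and the toIncrements helper are made up for
    illustration and are not from the attached code (the real write would go
    through HTable, e.g. one incrementColumnValue per family):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SingleTableLayout {
    // Maps one parsed CSV record to the cell increments it produces.
    // Keys are "rowKey/family:qualifier"; with one table and families
    // m1..m4, every increment shares the same row key, so a client can
    // send them as one batched write instead of four table inserts.
    static Map<String, Long> toIncrements(String rowKey, long m1, long m2,
                                          long m3, long m4) {
        Map<String, Long> cells = new LinkedHashMap<>();
        cells.put(rowKey + "/m1:total", m1);
        cells.put(rowKey + "/m2:total", m2);
        cells.put(rowKey + "/m3:total", m3);
        cells.put(rowKey + "/m4:total", m4);
        return cells;
    }

    public static void main(String[] args) {
        // All four cells share the single row key "20090702|siteA".
        System.out.println(toIncrements("20090702|siteA", 1L, 5L, 0L, 2L));
    }
}
```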

    A few of your reducers failed and were done over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    timeouts trying to find the root region. The root region location is kept
    in zk, so it's odd that it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step? (I couldn't tell from the
    excel spreadsheet -- there didn't seem to be any summing going on.) If not,
    dropping it could make for savings too.
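    If the reduce step does prove unnecessary, the job can be made map-only,
    which skips the shuffle and sort phases entirely; with the old mapred API
    that is JobConf.setNumReduceTasks(0), or equivalently the standard property
    shown below (a configuration sketch, not taken from the attached code):

```xml
<!-- Map-only job: mappers write directly through the OutputFormat;
     no shuffle, sort, or reduce phase runs. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
</property>
```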

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes and tune this part of the job
    first. Then add in hbase and see how much it slows things down.

    Looking at your code, nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows, about 20 columns, and 4 metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry in the hbase table "txn_m1" with the columns as needed. Attached is a
    pdf [from an excel] which explains how a single row in the CSV is converted
    into hbase data in the mapper and reducer stages. Attached is the code as
    well.

    For processing 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect? I
    would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Irfan Mohammed at Jul 6, 2009 at 3:10 pm
    I added 2 more regionservers and now have 5, but the insert times are pretty constant at around 10-12 minutes. As far as I can see, the tasks are distributed across the 5 regionservers, and all of them [10 map tasks] start at the same time and complete in ~12 minutes.

    How and where can I check whether the update splits are happening and which ones are taking a long time?

    I checked with a single table and with four tables, and the results are pretty consistent at about 12 minutes.

    Thanks.

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Sunday, July 5, 2009 5:31:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Sat, Jul 4, 2009 at 8:51 PM, Irfan Mohammed wrote:

    my zookeeper quorum had just one server and after jon gray's suggestion I
    added two more to the quorum and the task did not have any failures.
    That is good to know, though if a single zk instance is not able to handle
    the load from 3 nodes, I think there's something up with it. We'll take a
    look into it.
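    For reference, the quorum membership that HBase clients use comes from
    hbase.zookeeper.quorum in hbase-site.xml; an odd number of servers (3 or 5)
    is the usual choice so a majority survives a single failure. The host names
    in this sketch are placeholders:

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```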


    but it still took 10 minutes to finish on my 3-node cluster. I am trying to
    add more nodes to the cluster to see if I get better performance.
    Yeah, this would be good to know.

    So you are doing it all in the map now but still updating 4 tables on each
    update (200k rows in become 7M rows out)? What do you see if you study the
    UI? Are the updates split evenly across all 3 servers, or are they marching
    lockstep across the table's regions? (I.e., are updates spread across all
    servers, or do we bang on one at a time?)


    Regarding the question of # of columns per family, we are looking at most
    at 20 families, and the # of columns per family varies from 100-10000.
    Would that be a problem in hbase?


    According to Jon Gray, who tested how hbase does with many columns, the
    only real issue will be memory; returning 10k columns on one row all in one
    go, especially if they are of any significant size, could put pressure on
    server+client memory. Otherwise, it should work fine. (There are
    optimizations we need to do to make it faster than it is, but it's for sure
    way better than it was in 0.19.x.)

    St.Ack

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Friday, July 3, 2009 5:43:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Those NoServerForRegionExceptions are probably putting a stake through
    throughput, especially when they are complaining that root is unobtainable.
    Let's try and figure out what's up here (Jon Gray has a good suggestion in
    this regard).

    On schema, how many columns do you think you'll have per family? The number
    of columns story has improved by a bunch in hbase 0.20.0. You should be
    able to do thousands, if not more, per column family.

    St.Ack

    On Fri, Jul 3, 2009 at 6:00 AM, Irfan Mohammed wrote:

    Thanks for the quick responses.

    I removed the reduce pass and am doing the inserts in the map pass. I
    reduced the number of map instances to 10. It is still taking about 12
    minutes to complete the inserts.
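    One thing worth checking here (a sketch of an idea, not of the attached
    code): each HTable.incrementColumnValue() call is a separate RPC, so
    millions of map outputs mean millions of round trips, while the earlier run
    showed only a few thousand distinct reduce groups. Aggregating deltas in
    task memory and issuing one increment per distinct cell when the task
    closes would cut the RPC count by orders of magnitude. The cell-key string
    and class name below are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Accumulates increment deltas per cell so that only one RPC per
// distinct cell is needed (e.g. flushed from a RecordWriter's close()).
class IncrementBuffer {
    private final Map<String, Long> deltas = new HashMap<>();

    // cellKey would identify (row, family, qualifier) in a real client.
    void add(String cellKey, long delta) {
        deltas.merge(cellKey, delta, Long::sum);
    }

    // In a real task this would call HTable.incrementColumnValue once
    // per entry; here it just hands back the aggregated deltas.
    Map<String, Long> flush() {
        Map<String, Long> out = new HashMap<>(deltas);
        deltas.clear();
        return out;
    }
}
```

    With 17M raw increments collapsing into a few thousand distinct cells, the
    same totals load with a small fraction of the round trips.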

    Any reason why there should be arbitrary NoServerForRegionExceptions?

    I am working on writing to hdfs and checking the performance.

    09/07/03 08:38:35 INFO mapred.JobClient: Running job: job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
        at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table with four column families, one for each
    metric? (Looking at the Excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate
    ones. Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says there are 294786 input records and
    that the mapper outputs 17M records. Is that expected?
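    A minimal sketch of that single-table layout, assuming the HBase 0.20 client API; the table name "txn", the qualifier "value", and the row key here are illustrative, not taken from the attached code:

    ```java
    // Hypothetical schema: one table "txn" with four column families m1..m4
    // (one per metric), so each CSV row becomes a single table write instead
    // of four writes against four tables. Assumes the HBase 0.20 client API.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleTableSketch {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "txn");
        Put put = new Put(Bytes.toBytes("some-row-key"));
        // All four metrics travel in one round trip to the regionserver.
        put.add(Bytes.toBytes("m1"), Bytes.toBytes("value"), Bytes.toBytes(1L));
        put.add(Bytes.toBytes("m2"), Bytes.toBytes("value"), Bytes.toBytes(2L));
        put.add(Bytes.toBytes("m3"), Bytes.toBytes("value"), Bytes.toBytes(3L));
        put.add(Bytes.toBytes("m4"), Bytes.toBytes("value"), Bytes.toBytes(4L));
        table.put(put);
        table.flushCommits();
      }
    }
    ```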

    A few of your reducers failed and were run over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    timeouts trying to find the root region. The root region location is kept
    in zk, so it's odd that it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step? (I couldn't tell from the
    excel spreadsheet; there didn't seem to be any summing going on.) If not,
    dropping it could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes and tune this part of the job
    first. Then add hbase back in and see how much it slows things.
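    One way to get that baseline, sketched against the old mapred API the job already uses; the driver class, job name, and output path here are made up for illustration:

    ```java
    // Hypothetical baseline driver: identical mapper/reducer, but output goes
    // to plain text files in HDFS instead of HTable writes, so the HBase cost
    // drops out of the measurement.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class HdfsBaseline {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(HdfsBaseline.class);
        conf.setJobName("txnload-hdfs-baseline");
        // Swap the custom HBase output format for TextOutputFormat; everything
        // else (input, mapper, reducer, task counts) stays the same.
        conf.setOutputFormat(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(conf, new Path("/tmp/txnload-baseline"));
        JobClient.runJob(conf);
      }
    }
    ```

    Comparing this job's elapsed time against the 38-minute run shows how much of the time is spent in hbase itself.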

    Looking at your code, I see nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev, r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows, about 20 columns, and 4 metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry into the hbase table "txn_m1" with the columns as needed. Attached is
    a pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stages. Attached is the
    code as well.

    For processing 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect? I
    would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
        at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Stack at Jul 6, 2009 at 3:34 pm
    So, there is no difference in overall elapsed time after nearly doubling the
    number of servers writing 7M updates? Updating 4 tables takes the same time
    as updating one table? Have you tried writing to files in HDFS to see if the
    time is any faster, to verify that hbase is what's holding up your job?

    So, you have 10 maps to complete. How many concurrent mappers do you have
    running? 2 per node?

    Regarding whether splits are happening, is the number of regions going up as
    the job runs? (You can see this in the UI.)

    Are you batching your updates?
    http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)

    You could try setting Put#writeToWAL to false to see what difference that
    makes in your upload.
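    Both suggestions together look roughly like the sketch below against the
    0.20-era client API. This is a minimal illustration, not the poster's code:
    the table, family, and qualifier names are made up, the buffer size is an
    arbitrary example, and it needs a running cluster to execute.

    ```java
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedUpload {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "txn_m1");

        // Buffer puts client-side instead of doing one RPC per update.
        table.setAutoFlush(false);
        table.setWriteBufferSize(12 * 1024 * 1024); // example: 12MB buffer

        for (int i = 0; i < 100000; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          // Skipping the write-ahead log trades durability for speed:
          // these edits are lost if a regionserver dies before flushing.
          put.setWriteToWAL(false);
          put.add(Bytes.toBytes("cols"), Bytes.toBytes("m1"),
                  Bytes.toBytes(1L));
          table.put(put); // buffered until the write buffer fills
        }
        table.flushCommits(); // push whatever is still buffered
      }
    }
    ```

    Note that with auto-flush off, nothing reaches the servers until the write
    buffer fills or flushCommits() is called, so the final flush is required.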

    St.Ack

    On Mon, Jul 6, 2009 at 8:09 AM, Irfan Mohammed wrote:

    I added 2 more regionservers and now have 5, but the insert times are
    fairly constant at around 10-12 minutes. As far as I can see, the tasks are
    distributed across the 5 regionservers, and all 10 map tasks start at the
    same time and complete in ~12 minutes.

    How and where can I check whether the update splits are happening and which
    ones are taking a long time?

    I checked with a single table and with four tables, and the results are
    pretty consistent at about 12 minutes.

    Thanks.

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Sunday, July 5, 2009 5:31:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Sat, Jul 4, 2009 at 8:51 PM, Irfan Mohammed wrote:

    my zookeeper quorum had just one server and after jon gray's suggestion
    added two more to the quorum and the task did not have any failures.
    That is good to know, though if a single zk instance is not able to handle
    the load of 3 nodes, I think there's something up with it. We'll take a
    look into it.


    but it still took 10 minutes to finish on my 3-node cluster. I am trying to
    add more nodes to the cluster to see if I get better performance.
    Yeah, this would be good to know.

    So you are doing it all in the map now, but still updating 4 tables on each
    update (200k rows in become 7M rows out)? What do you see if you study the
    UI? Are the updates split evenly across all 3 servers, or are they marching
    lockstep across the table's regions? (i.e., are updates spread across all
    servers, or do we bang on one at a time?)


    regarding the question of # of columns per family, we are looking at at
    most 20 families, and the # of columns per family varies from 100 to
    10,000. would that be a problem in hbase?


    According to Jon Gray, who tested how hbase does with many columns, the
    only real issue will be memory; returning 10k columns on one row all in one
    go, especially if they are of any significant size, could put pressure on
    server+client memory. Otherwise, it should work fine. (There are
    optimizations we need to do to make it faster than it is, but it's for sure
    way better than it was in 0.19.x.)

    St.Ack

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Friday, July 3, 2009 5:43:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Those NoServerForRegionExceptions are probably putting a stake through
    throughput, especially when they are complaining that root is unobtainable.
    Let's try to figure out what's up here (Jon Gray has a good suggestion in
    this regard).

    On schema, how many columns do you think you'll have per family? The
    number-of-columns story has improved by a bunch in hbase 0.20.0. You should
    be able to do thousands if not more (per column family).

    St.Ack

    On Fri, Jul 3, 2009 at 6:00 AM, Irfan Mohammed wrote:

    Thanks for the quick responses.

    I removed the reduce pass and am doing the inserts in the map pass. I
    reduced the number of map instances to 10. It is still taking about 12
    minutes to complete the inserts.

    Any reason why there should be arbitrary NoServerForRegionExceptions?

    I am working on writing to hdfs and checking the performance.

    09/07/03 08:38:35 INFO mapred.JobClient: Running job: job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to
    initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to
    initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to
    initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to
    initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to
    initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to
    initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to
    initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to
    initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to
    initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking at the excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate
    ones. Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says that there are 294786 inputs and
    that the mapper outputs 17M records. Is that expected?
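    The single-table alternative could be sketched as below. Everything here is
    hypothetical for illustration: it assumes a table "txn" created with four
    families m1..m4, the row key and qualifier are invented, and it needs a
    running cluster. The point is that one Put carries all four metrics in a
    single round trip, where the four-table schema needs four writes.

    ```java
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleTableWrite {
      public static void main(String[] args) throws Exception {
        // Assumes a table "txn" with column families m1, m2, m3, m4.
        HTable table = new HTable(new HBaseConfiguration(), "txn");

        byte[] row = Bytes.toBytes("20090702|merchant42"); // example key
        Put put = new Put(row);
        // One qualifier per metric, each under its own family; all four
        // values land together under the same row in one insert.
        put.add(Bytes.toBytes("m1"), Bytes.toBytes("count"), Bytes.toBytes(1L));
        put.add(Bytes.toBytes("m2"), Bytes.toBytes("count"), Bytes.toBytes(2L));
        put.add(Bytes.toBytes("m3"), Bytes.toBytes("count"), Bytes.toBytes(3L));
        put.add(Bytes.toBytes("m4"), Bytes.toBytes("count"), Bytes.toBytes(4L));
        table.put(put);
      }
    }
    ```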

    A few of your reducers failed and were done over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    from trying to find the root region. The root region is in zk; odd that it
    can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step? (Couldn't tell from the
    excel spreadsheet; there didn't seem to be any summing going on.) If not,
    this could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes. Tune this part of the job first.
    Then add in hbase and see how much it slows things.

    Looking at your code, nothing obviously onerous.

    St.Ack





    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed <irfan.ma@gmail.com>
    wrote:
    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev, r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with a 1M rows and about 20 columns and 4 metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry into the hbase table "txn_m1" with the columns as needed. Attached is
    an pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stage. Attached is the
    code as well.

    For processing a 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect? I
    would like run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job:
    job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230]
    fetching
    new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429]
    failed
    to
    initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at
    org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at
    org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at
    org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183]
    fetching
    new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369]
    failed
    to
    initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at
    org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at
    org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at
    org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434]
    fetching
    new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185]
    failed
    to
    initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at
    org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at
    org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at
    org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442]
    fetching
    new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285]
    failed
    to
    initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete:
    job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient:
    FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient:
    FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle
    bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output
    bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output
    records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input
    records=16985488
  • Irfan Mohammed at Jul 3, 2009 at 2:40 pm
    Thanks St. Ack.

    Attached is the excel I used to generate the pdf. I am looking for some help in designing the schema to scale my storage, inserts, and queries. I thought that breaking the csv row with all the metrics up into individual tables would be more scalable for inserts/queries but not for storage.

    If I change it to a single table, as detailed in the attached excel, would it not cause a column explosion? Is that ok in hbase?

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking at the excel spreadsheet, each row has the same key.) Then you'd
    be doing one insert against a single table rather than four separate ones.

    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says that there are 294786 input records, and
    that the mapper outputs 17M records. Is that expected?

    A few of your reducers failed and were done over again. The redos were
    probably a significant part of the overall elapsed time. The failures come
    from trying to find the root region. The root region location is kept in zk;
    odd that it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of the
    overall time. Do you need the reduce step? (Couldn't tell from the excel
    spreadsheet -- there didn't seem to be any summing going on.) If not, this
    could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes. Tune this part of the job first.
    Then add in hbase and see how much it slows things.

    Looking at your code, nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows, about 20 columns, and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and, for each metric m1, create an
    entry in the hbase table "txn_m1" with the columns as needed. Attached is
    a pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stages. Attached is the
    code as well.

    Processing 1M records takes about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect? I
    would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching
    new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to
    initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching
    new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to
    initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching
    new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to
    initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching
    new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to
    initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete:
    job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Irfan Mohammed at Jul 3, 2009 at 4:15 pm
    Here is the syslog.txt. Any clues why I keep getting the NoServerForRegionException?

    Thanks,
    Irfan

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Friday, July 3, 2009 9:00:29 AM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Thanks for the quick responses.

    I removed the reduce pass and am doing the inserts in the map pass. I reduced the number of Map instances to 10. It is still taking about 12 minutes to complete the inserts.

    Any reason why there should be arbitrary NoServerForRegionException?

    I am working on writing to hdfs and checking the performance.
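    One more thing that may be worth trying when inserting from the map pass: if the per-key values can be computed client-side and written as Puts, the HTable client write buffer (setAutoFlush(false) plus setWriteBufferSize(...), assuming the 0.20 client API here) batches many writes into one round trip, whereas incrementColumnValue() makes an RPC per call. Below is a self-contained illustration of the batching effect only; BufferedWriter and its methods are hypothetical names for this sketch, not the HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of client-side write buffering: operations accumulate in a
// buffer and are sent in batches, cutting per-operation round trips. This
// class only simulates the flush accounting; it talks to no server.
public class BufferedWriter {
    private final int bufferSize;
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;

    public BufferedWriter(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public void write(String op) {
        buffer.add(op);
        if (buffer.size() >= bufferSize) {
            flush(); // auto-flush once the buffer fills
        }
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        // In a real client this is where one RPC would carry the whole batch.
        buffer.clear();
        flushes++;
    }

    public int flushCount() {
        return flushes;
    }
}
```

    With a buffer of 100, writing 1000 operations costs 10 batched sends instead of 1000 individual ones; the trade-off is that buffered writes are lost if the task dies before a flush.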

    09/07/03 08:38:35 INFO mapred.JobClient: Running job: job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking at the excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate
    ones.
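
    For illustration, a minimal sketch of that suggestion against the HBase
    0.20 client API -- the table name "txn", the family names m1..m4, and the
    qualifier "v" are hypothetical, not taken from the attached code:

    ```java
    // Hypothetical sketch (assumed names): one table "txn" with four column
    // families m1..m4, so each reduce key becomes four increments against a
    // single HTable rather than inserts into four separate tables.
    // Classes: org.apache.hadoop.hbase.HBaseConfiguration,
    // org.apache.hadoop.hbase.client.HTable, org.apache.hadoop.hbase.util.Bytes.
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable txn = new HTable(conf, "txn");
    byte[] row = Bytes.toBytes(rowKey);  // the key all four tables currently share
    txn.incrementColumnValue(row, Bytes.toBytes("m1"), Bytes.toBytes("v"), m1);
    txn.incrementColumnValue(row, Bytes.toBytes("m2"), Bytes.toBytes("v"), m2);
    txn.incrementColumnValue(row, Bytes.toBytes("m3"), Bytes.toBytes("v"), m3);
    txn.incrementColumnValue(row, Bytes.toBytes("m4"), Bytes.toBytes("v"), m4);
    ```

    One HTable instance (and one region lookup) instead of four is also less
    connection setup per task.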

    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says that there are 294786 inputs and
    that the mapper outputs 17M records. Is that expected?

    A few of your reducers failed and were done over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    timing out trying to locate the root region. The root region location is
    kept in zk. Odd that it can't be found there.
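
    One guess, from the "failed to initialize the hbase configuration" lines:
    the child task JVMs may not have hbase-site.xml on their classpath, in
    which case the client falls back to defaults and never finds ZooKeeper. A
    hedged workaround sketch (hostnames made up, HBase 0.20 client API):

    ```java
    // Assumed workaround, not from the thread: point the client at the
    // ZooKeeper quorum explicitly instead of relying on hbase-site.xml
    // being visible to the map/reduce child tasks.
    HBaseConfiguration conf = new HBaseConfiguration();
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
    HTable table = new HTable(conf, "txn_m1");
    ```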

    The fetching of map data and the sort is taking a considerable amount of
    the overall time. Do you need the reduce step? (I couldn't tell from the
    excel spreadsheet -- there didn't seem to be any summing going on.) If not,
    this could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with no
    hbase involved. See how long that takes. Tune this part of the job first.
    Then add in hbase and see how much it slows things.
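
    That hbase-free baseline could look roughly like this with the old mapred
    API the job already uses (the output path and key/value classes here are
    assumptions, not from the attached code):

    ```java
    // Hedged sketch: same mapper, but plain text output to HDFS via the
    // stock TextOutputFormat, so no HBase client is involved at all.
    // Classes from org.apache.hadoop.mapred, org.apache.hadoop.io,
    // and org.apache.hadoop.fs.
    JobConf job = new JobConf(LoadMultipleCubes.class);
    job.setOutputFormat(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/txn_baseline"));
    JobClient.runJob(job);
    ```

    Comparing this run's elapsed time against the full job isolates how much
    of the 38 minutes is hbase writes versus map/shuffle.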

    Looking at your code, nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with a 1M rows and about 20 columns and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry into the hbase table "txn_m1" with the columns as needed. Attached is
    a pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stage. Attached is the
    code as well.

    For processing a 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect? I
    would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete:
    job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Jonathan Gray at Jul 3, 2009 at 5:02 pm
    That syslog, and the fact that you are having issues locating the root
    region, point to a problem with your ZooKeeper instance.

    What is your ZooKeeper setup? Under heavy load it's important to have
    at least a 3-node quorum.
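
    A 3-node ensemble is wired up on the HBase side via hbase.zookeeper.quorum.
    A minimal sketch, assuming placeholder hostnames zk1/zk2/zk3 (the original
    poster's actual hosts are not shown in the thread):

    ```xml
    <!-- hbase-site.xml fragment: point HBase clients and servers at a
         3-node ZooKeeper ensemble. Hostnames are placeholders. -->
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>zk1,zk2,zk3</value>
    </property>
    ```

    Each listed host must be running a ZooKeeper server; with three nodes the
    ensemble stays available if any single node is lost or overloaded.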

    Irfan Mohammed wrote:
    Here is the syslog.txt. Any clues why I keep getting the NoServerForRegionException?

    Thanks,
    Irfan

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Friday, July 3, 2009 9:00:29 AM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Thanks for the quick responses.

    I removed the reduce pass and am now doing the inserts in the map pass. I also reduced the number of map instances to 10. It is still taking about 12 minutes to complete the inserts.

    Any reason why I should be getting these arbitrary NoServerForRegionExceptions?

    I am working on writing to HDFS instead and checking that performance.

    09/07/03 08:38:35 INFO mapred.JobClient: Running job: job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table with four column families, one for each
    metric? (Looking at the Excel spreadsheet, each row has the same key.) Then
    you'd be doing one insert against a single table rather than four separate ones.
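
    The single-table layout can be sketched as follows. This is an illustration
    of the suggestion, not the poster's actual code: the Increment helper class
    is hypothetical, and the real HTable.incrementColumnValue(row, family,
    qualifier, amount) calls against a live cluster are omitted. The family
    names m1..m4 mirror the four metric tables txn_m1..txn_m4.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Instead of four tables (txn_m1..txn_m4), use one table with four
    // column families m1..m4. A single row key then carries one increment
    // per metric, so the job issues its increments against one HTable
    // instead of four.
    class SingleTableIncrements {

        // One pending increment: which family/qualifier to bump, and by how much.
        static class Increment {
            final String family;    // e.g. "m1" (was: table txn_m1)
            final String qualifier; // column within the family
            final long amount;
            Increment(String family, String qualifier, long amount) {
                this.family = family;
                this.qualifier = qualifier;
                this.amount = amount;
            }
        }

        // Turn the four metric values of one CSV row into increments that all
        // share the same row key: one per column family.
        static List<Increment> toIncrements(String qualifier, long[] metrics) {
            List<Increment> out = new ArrayList<Increment>();
            for (int i = 0; i < metrics.length; i++) {
                out.add(new Increment("m" + (i + 1), qualifier, metrics[i]));
            }
            return out;
        }
    }
    ```

    Each returned Increment would map to one incrementColumnValue() call on the
    same table and row, rather than a call against a different HTable per metric.
    
    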

    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says there are 294,786 input records and
    that the mapper outputs 17M records. Is that expected?

    A few of your reducers failed and were run over again. The redos were
    probably a significant part of the overall elapsed time. The failures are in
    trying to find the root region. The root region location is kept in
    ZooKeeper; it's odd that it can't be found there.

    The fetching of map data and the sort are taking a considerable amount of the
    overall time. Do you need the reduce step? (I couldn't tell from the Excel
    spreadsheet -- there didn't seem to be any summing going on.) If not,
    dropping it could make for savings too.

    You might try outputting to HDFS first to see how fast the job runs with no
    HBase involved. See how long that takes and tune this part of the job first.
    Then add HBase back in and see how much it slows things.

    Looking at your code, I see nothing obviously onerous.

    St.Ack




    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed wrote:

    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev,
    r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows, about 20 columns, and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry in the hbase table "txn_m1" with the columns as needed. Attached is
    a pdf [from an Excel spreadsheet] which explains how a single row in the
    CSV is converted into hbase data in the mapper and reducer stages. The
    code is attached as well.

    For processing 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or that is inherently
    incorrect? I would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching
    new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to
    initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching
    new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to
    initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching
    new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to
    initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id :
    attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at
    org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching
    new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to
    initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete:
    job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Irfan Mohammed at Jul 6, 2009 at 6:06 pm
    I am working on writing to HDFS files. Will update you by end of day today.

    There are always 10 concurrent mappers running. I keep setting setNumMaps(5) and also set the following properties in mapred-site.xml to 3, but I still end up with 10 concurrent maps.

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>3</value>
      <description>The maximum number of map tasks that will be run
      simultaneously by a task tracker.
      </description>
    </property>

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>3</value>
      <description>The maximum number of reduce tasks that will be run
      simultaneously by a task tracker.
      </description>
    </property>
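    A likely explanation for the count not changing (stated here as an assumption, not something confirmed in this thread): the tasktracker maximums are read by each TaskTracker when it starts, so editing mapred-site.xml has no effect until the tasktrackers are restarted, and the per-job map count is only a hint. A sketch with the old 0.20 mapred API (class name hypothetical):

    ```java
    // Sketch, old 0.20 mapred API. mapred.tasktracker.{map,reduce}.tasks.maximum
    // is a TaskTracker startup setting; changing mapred-site.xml only takes
    // effect after the tasktrackers are restarted. The class name is made up.
    import org.apache.hadoop.mapred.JobConf;

    public class TaskCounts {
      public static void set(JobConf conf) {
        conf.setNumMapTasks(5);     // a hint only; actual map count follows input splits
        conf.setNumReduceTasks(5);  // reduce count, by contrast, is honored exactly
      }
    }
    ```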

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1, txn_m3,,1245460727203

    I have set setAutoFlush(false) and also writeToWAL(false), with the same behaviour.

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 11:34:07 AM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    So, no difference in overall elapsed time after nearly doubling the number
    of servers writing 7M updates? Updating 4 tables takes the same time as
    updating one table? Have you tried writing to files in HDFS to see if the
    time is any faster, to verify that hbase is what's holding up your job?

    So, you have 10 maps to complete. How many concurrent mappers do you have
    running? 2 per node?

    Regarding whether splits are happening, is the number of regions going up
    as the job runs? (You can see this in the UI.)

    Are you batching your updates?
    http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)

    You could try setting Put#writeToWAL to false to see what difference that
    makes in your upload.
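    Putting both suggestions together, a sketch of batched, WAL-less writes with the 0.20-era client API. Treat the exact method names as assumptions (they varied slightly across 0.20.x builds), and the table/row/column names below are made up:

    ```java
    // Sketch of batched writes with the WAL disabled (0.20-era client API).
    // Table, row, and column names are placeholders; method names are as I
    // recall them for this era and may differ slightly in your exact build.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedWriter {
      public static void write() throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "txn_m1");
        table.setAutoFlush(false);                 // buffer puts client-side
        table.setWriteBufferSize(8 * 1024 * 1024); // flush in ~8MB batches

        Put put = new Put(Bytes.toBytes("rowkey"));
        put.setWriteToWAL(false); // skip the WAL: faster, but data is lost on a crash
        put.add(Bytes.toBytes("family"), Bytes.toBytes("qualifier"),
                Bytes.toBytes(1L));
        table.put(put);

        table.flushCommits(); // push any remaining buffered puts
      }
    }
    ```

    The trade-off: skipping the WAL removes a disk sync per edit, but any edits not yet flushed from the memstore are lost if a regionserver dies mid-load.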

    St.Ack

    On Mon, Jul 6, 2009 at 8:09 AM, Irfan Mohammed wrote:

    I added 2 more regionservers and now have 5 regionservers, but the insert
    times are pretty constant at around 10-12 minutes. As far as I can see, the
    tasks are distributed across the 5 regionservers, and all of them [10 map
    tasks] start at the same time and complete in ~12 minutes.

    How and where can I check whether the splits are happening and which
    ones are taking a long time?

    I checked with a single table and with four tables, and the results are
    pretty consistent at about 12 minutes.

    Thanks.

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Sunday, July 5, 2009 5:31:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Sat, Jul 4, 2009 at 8:51 PM, Irfan Mohammed wrote:

    my zookeeper quorum had just one server and after jon gray's suggestion
    added two more to the quorum and the task did not have any failures.
    That is good to know, though if a single zk instance is not able
    to handle the load of 3 nodes, I think there's something up with it. We'll
    take a look into it.
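    For reference, a three-server quorum is declared in hbase-site.xml like this (the hostnames here are placeholders):

    ```xml
    <!-- hbase-site.xml; hostnames are placeholders -->
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
      <description>Comma-separated list of servers in the ZooKeeper quorum.
      An odd number of servers is recommended.
      </description>
    </property>
    ```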


    but still took 10 minutes for it to finish in my 3-node cluster. i am
    trying to add more nodes to the cluster to see if i get better
    performance.
    Yeah, this would be good to know.

    So you are doing it all in the map now, but still updating 4 tables on each
    update (200k rows in become 7M rows out)? What do you see if you study the
    UI? Are the updates split evenly across all 3 servers, or are they marching
    lockstep across the table's regions? (i.e., are updates spread across all
    servers or do we bang on one at a time?)


    regarding the question of # of columns per family, we are looking at at
    most 20 families, and the # of columns per family varies from 100-10000.
    would that be a problem in hbase?


    According to Jon Gray, who tested how hbase does with many columns, the
    only real issue will be memory; returning 10k columns on one row all in one
    go, especially if they are of any significant size, could put pressure on
    server+client memory. Otherwise, it should work fine. (There are
    optimizations we need to do to make it faster than it is, but it's for sure
    way better than it was in 0.19.x.)

    St.Ack

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Friday, July 3, 2009 5:43:45 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Those NoServerForRegionExceptions are probably putting a stake through your
    throughput, especially when they are complaining that root is unobtainable.
    Let's try and figure out what's up here (Jon Gray has a good suggestion in
    this regard).

    On schema, how many columns do you think you'll have per family? The
    number-of-columns story has improved by a bunch in hbase 0.20.0. You should
    be able to do thousands if not more (per column family).

    St.Ack

    On Fri, Jul 3, 2009 at 6:00 AM, Irfan Mohammed wrote:

    Thanks for the quick responses.

    I removed the reduce pass and am doing the inserts in the map pass. I
    reduced the number of map instances to 10. It is still taking about 12
    minutes to complete the inserts.

    Any reason why there should be arbitrary NoServerForRegionExceptions?

    I am working on writing to hdfs and checking the performance.

    09/07/03 08:38:35 INFO mapred.JobClient: Running job:
    job_200906192236_24166
    09/07/03 08:38:36 INFO mapred.JobClient: map 0% reduce 0%
    09/07/03 08:38:53 INFO mapred.JobClient: map 1% reduce 0%
    09/07/03 08:38:59 INFO mapred.JobClient: map 2% reduce 0%
    09/07/03 08:39:02 INFO mapred.JobClient: map 3% reduce 0%
    09/07/03 08:39:08 INFO mapred.JobClient: map 4% reduce 0%
    09/07/03 08:39:14 INFO mapred.JobClient: map 5% reduce 0%
    09/07/03 08:39:20 INFO mapred.JobClient: map 6% reduce 0%
    09/07/03 08:39:26 INFO mapred.JobClient: map 7% reduce 0%
    09/07/03 08:39:35 INFO mapred.JobClient: map 8% reduce 0%
    09/07/03 08:39:41 INFO mapred.JobClient: map 9% reduce 0%
    09/07/03 08:39:50 INFO mapred.JobClient: map 10% reduce 0%
    09/07/03 08:39:56 INFO mapred.JobClient: map 11% reduce 0%
    09/07/03 08:40:05 INFO mapred.JobClient: map 12% reduce 0%
    09/07/03 08:40:14 INFO mapred.JobClient: map 13% reduce 0%
    09/07/03 08:40:20 INFO mapred.JobClient: map 14% reduce 0%
    09/07/03 08:40:26 INFO mapred.JobClient: map 15% reduce 0%
    09/07/03 08:40:32 INFO mapred.JobClient: map 16% reduce 0%
    09/07/03 08:40:38 INFO mapred.JobClient: map 17% reduce 0%
    09/07/03 08:40:44 INFO mapred.JobClient: map 18% reduce 0%
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_0: [2009-07-03 08:40:42.553] failed to
    initialize the hbase configuration
    09/07/03 08:40:46 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_0: [2009-07-03 08:40:40.061] failed to
    initialize the hbase configuration
    09/07/03 08:40:47 INFO mapred.JobClient: map 19% reduce 0%
    09/07/03 08:40:49 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000008_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_0: [2009-07-03 08:40:44.631] failed to
    initialize the hbase configuration
    09/07/03 08:40:53 INFO mapred.JobClient: map 20% reduce 0%
    09/07/03 08:40:56 INFO mapred.JobClient: map 21% reduce 0%
    09/07/03 08:41:02 INFO mapred.JobClient: map 22% reduce 0%
    09/07/03 08:41:08 INFO mapred.JobClient: map 23% reduce 0%
    09/07/03 08:41:17 INFO mapred.JobClient: map 24% reduce 0%
    09/07/03 08:41:26 INFO mapred.JobClient: map 25% reduce 0%
    09/07/03 08:41:32 INFO mapred.JobClient: map 26% reduce 0%
    09/07/03 08:41:38 INFO mapred.JobClient: map 27% reduce 0%
    09/07/03 08:41:44 INFO mapred.JobClient: map 28% reduce 0%
    09/07/03 08:41:50 INFO mapred.JobClient: map 29% reduce 0%
    09/07/03 08:41:53 INFO mapred.JobClient: map 30% reduce 0%
    09/07/03 08:42:02 INFO mapred.JobClient: map 31% reduce 0%
    09/07/03 08:42:08 INFO mapred.JobClient: map 32% reduce 0%
    09/07/03 08:42:11 INFO mapred.JobClient: map 33% reduce 0%
    09/07/03 08:42:17 INFO mapred.JobClient: map 34% reduce 0%
    09/07/03 08:42:20 INFO mapred.JobClient: map 35% reduce 0%
    09/07/03 08:42:26 INFO mapred.JobClient: map 36% reduce 0%
    09/07/03 08:42:32 INFO mapred.JobClient: map 37% reduce 0%
    09/07/03 08:42:38 INFO mapred.JobClient: map 38% reduce 0%
    09/07/03 08:42:44 INFO mapred.JobClient: map 39% reduce 0%
    09/07/03 08:42:53 INFO mapred.JobClient: map 40% reduce 0%
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000009_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_1: [2009-07-03 08:42:50.373] failed to
    initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id :
    attempt_200906192236_24166_m_000007_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
    to locate root region
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at
    com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at
    org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_1: [2009-07-03 08:42:49.181] failed to initialize the hbase configuration
    09/07/03 08:42:55 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_1, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_1: [2009-07-03 08:42:49.498] failed to initialize the hbase configuration
    09/07/03 08:42:59 INFO mapred.JobClient: map 41% reduce 0%
    09/07/03 08:43:08 INFO mapred.JobClient: map 42% reduce 0%
    09/07/03 08:43:14 INFO mapred.JobClient: map 43% reduce 0%
    09/07/03 08:43:23 INFO mapred.JobClient: map 44% reduce 0%
    09/07/03 08:43:32 INFO mapred.JobClient: map 45% reduce 0%
    09/07/03 08:43:41 INFO mapred.JobClient: map 46% reduce 0%
    09/07/03 08:43:50 INFO mapred.JobClient: map 47% reduce 0%
    09/07/03 08:43:56 INFO mapred.JobClient: map 48% reduce 0%
    09/07/03 08:44:02 INFO mapred.JobClient: map 49% reduce 0%
    09/07/03 08:44:08 INFO mapred.JobClient: map 50% reduce 0%
    09/07/03 08:44:14 INFO mapred.JobClient: map 51% reduce 0%
    09/07/03 08:44:20 INFO mapred.JobClient: map 52% reduce 0%
    09/07/03 08:44:23 INFO mapred.JobClient: map 53% reduce 0%
    09/07/03 08:44:29 INFO mapred.JobClient: map 54% reduce 0%
    09/07/03 08:44:35 INFO mapred.JobClient: map 55% reduce 0%
    09/07/03 08:44:38 INFO mapred.JobClient: map 56% reduce 0%
    09/07/03 08:44:47 INFO mapred.JobClient: map 57% reduce 0%
    09/07/03 08:44:53 INFO mapred.JobClient: map 58% reduce 0%
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000007_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000007_2: [2009-07-03 08:44:55.897] failed to initialize the hbase configuration
    09/07/03 08:45:01 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000009_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000009_2: [2009-07-03 08:44:56.296] failed to initialize the hbase configuration
    09/07/03 08:45:02 INFO mapred.JobClient: map 59% reduce 0%
    09/07/03 08:45:04 INFO mapred.JobClient: Task Id : attempt_200906192236_24166_m_000008_2, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:449)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:558)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_24166_m_000008_2: [2009-07-03 08:44:59.221] failed to initialize the hbase configuration
    09/07/03 08:45:08 INFO mapred.JobClient: map 60% reduce 0%
    09/07/03 08:45:17 INFO mapred.JobClient: map 61% reduce 0%
    09/07/03 08:45:26 INFO mapred.JobClient: map 62% reduce 0%
    09/07/03 08:45:32 INFO mapred.JobClient: map 63% reduce 0%
    09/07/03 08:45:38 INFO mapred.JobClient: map 64% reduce 0%
    09/07/03 08:45:44 INFO mapred.JobClient: map 65% reduce 0%
    09/07/03 08:45:50 INFO mapred.JobClient: map 66% reduce 0%
    09/07/03 08:45:56 INFO mapred.JobClient: map 67% reduce 0%
    09/07/03 08:46:02 INFO mapred.JobClient: map 68% reduce 0%
    09/07/03 08:46:08 INFO mapred.JobClient: map 69% reduce 0%
    09/07/03 08:46:15 INFO mapred.JobClient: map 70% reduce 0%
    09/07/03 08:46:21 INFO mapred.JobClient: map 71% reduce 0%
    09/07/03 08:46:27 INFO mapred.JobClient: map 72% reduce 0%
    09/07/03 08:46:36 INFO mapred.JobClient: map 73% reduce 0%
    09/07/03 08:46:45 INFO mapred.JobClient: map 74% reduce 0%
    09/07/03 08:46:54 INFO mapred.JobClient: map 75% reduce 0%
    09/07/03 08:47:03 INFO mapred.JobClient: map 76% reduce 0%
    09/07/03 08:47:12 INFO mapred.JobClient: map 77% reduce 0%
    09/07/03 08:47:18 INFO mapred.JobClient: map 78% reduce 0%
    09/07/03 08:47:24 INFO mapred.JobClient: map 79% reduce 0%
    09/07/03 08:47:33 INFO mapred.JobClient: map 80% reduce 0%
    09/07/03 08:47:42 INFO mapred.JobClient: map 81% reduce 0%
    09/07/03 08:47:51 INFO mapred.JobClient: map 82% reduce 0%
    09/07/03 08:48:00 INFO mapred.JobClient: map 83% reduce 0%
    09/07/03 08:48:09 INFO mapred.JobClient: map 84% reduce 0%
    09/07/03 08:48:15 INFO mapred.JobClient: map 85% reduce 0%
    09/07/03 08:48:24 INFO mapred.JobClient: map 86% reduce 0%
    09/07/03 08:48:30 INFO mapred.JobClient: map 87% reduce 0%
    09/07/03 08:48:39 INFO mapred.JobClient: map 88% reduce 0%
    09/07/03 08:48:54 INFO mapred.JobClient: map 89% reduce 0%
    09/07/03 08:49:06 INFO mapred.JobClient: map 90% reduce 0%
    09/07/03 08:49:15 INFO mapred.JobClient: map 91% reduce 0%
    09/07/03 08:49:24 INFO mapred.JobClient: map 92% reduce 0%
    09/07/03 08:49:30 INFO mapred.JobClient: map 93% reduce 0%
    09/07/03 08:49:36 INFO mapred.JobClient: map 94% reduce 0%
    09/07/03 08:49:45 INFO mapred.JobClient: map 95% reduce 0%
    09/07/03 08:49:57 INFO mapred.JobClient: map 96% reduce 0%
    09/07/03 08:50:08 INFO mapred.JobClient: map 97% reduce 0%
    09/07/03 08:50:17 INFO mapred.JobClient: map 98% reduce 0%
    09/07/03 08:50:26 INFO mapred.JobClient: map 99% reduce 0%
    09/07/03 08:50:35 INFO mapred.JobClient: map 100% reduce 0%
    09/07/03 08:50:40 INFO mapred.JobClient: Job complete: job_200906192236_24166
    09/07/03 08:50:40 INFO mapred.JobClient: Counters: 7
    09/07/03 08:50:40 INFO mapred.JobClient: Job Counters
    09/07/03 08:50:40 INFO mapred.JobClient: Launched map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: Data-local map tasks=19
    09/07/03 08:50:40 INFO mapred.JobClient: FileSystemCounters
    09/07/03 08:50:40 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map-Reduce Framework
    09/07/03 08:50:40 INFO mapred.JobClient: Map input records=294786
    09/07/03 08:50:40 INFO mapred.JobClient: Spilled Records=0
    09/07/03 08:50:40 INFO mapred.JobClient: Map input bytes=57966580
    09/07/03 08:50:40 INFO mapred.JobClient: Map output records=0


    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Thursday, July 2, 2009 6:12:29 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Why 4 tables? Why not one table and four column families, one for each
    metric? (Looking in the excel spreadsheet, each row has the same key.)
    Then you'd be doing one insert against a single table rather than four
    separate ones.
    Looking at your MR output below, it looks like it takes 40 seconds to
    complete the map tasks. The report says that there are 294786 inputs. It
    says that the mapper outputs 17M records. Is that expected?
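    [Editor's note: the job counters quoted later in this thread (Map input
    records=294786, Map output records=16985488) put the fan-out at roughly
    57-58 emitted records per input row. This is a quick plain-Java sanity
    check of that arithmetic, not code from the attached job.]

```java
public class FanOutCheck {
    // Average number of emitted map output records per input record.
    static double fanOut(long inputRecords, long outputRecords) {
        return (double) outputRecords / inputRecords;
    }

    public static void main(String[] args) {
        // Figures taken from the job counters quoted in this thread.
        long mapInput = 294786L;
        long mapOutput = 16985488L;
        double perRow = fanOut(mapInput, mapOutput);
        System.out.printf("map output records per input row: %.1f%n", perRow);
        // Spread across the 4 metric tables, that's ~14 cells per metric per row.
        System.out.printf("per metric: %.1f%n", perRow / 4);
    }
}
```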

    A few of your reducers failed and were done over again. The redos were
    probably a significant part of the overall elapsed time. The failures are
    in trying to find the root region. The root region is in zk. Odd that it
    can't be found there.

    The fetching of map data and the sort are taking a considerable amount of
    the overall time. Do you need the reduce step? (Couldn't tell from the
    excel spreadsheet -- there didn't seem to be any summing going on.) If
    not, this could make for savings too.

    You might try outputting to hdfs first to see how fast the job runs with
    no hbase involved. See how long that takes. Tune this part of the job
    first. Then add in hbase and see how much it slows things.
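    [Editor's note: for a sense of the gap being chased in this thread, 1M
    rows in 38 minutes versus the stated 1-minute goal is roughly a 38x
    speedup. A quick plain-Java check of the implied rates (illustrative
    only, not from the attached code):]

```java
public class ThroughputGap {
    // Rows per second for a given row count and elapsed minutes.
    static double rowsPerSecond(long rows, double minutes) {
        return rows / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        double actual = rowsPerSecond(rows, 38.0); // observed in this thread
        double target = rowsPerSecond(rows, 1.0);  // stated goal
        System.out.printf("observed: %.0f rows/sec%n", actual);
        System.out.printf("goal:     %.0f rows/sec%n", target);
        System.out.printf("speedup needed: %.0fx%n", target / actual);
    }
}
```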

    Looking at your code, nothing obviously onerous.

    St.Ack





    On Thu, Jul 2, 2009 at 1:22 PM, Irfan Mohammed <irfan.ma@gmail.com>
    wrote:
    Hi,

    Hbase/Hadoop Setup:
    1. 3 regionservers
    2. Run the task using 20 Map Tasks and 20 Reduce Tasks.
    3. Using an older hbase version from the trunk [ Version: 0.20.0-dev, r786695, Sat Jun 20 18:01:17 EDT 2009 ]
    4. Using hadoop [ 0.20.0 ]

    Test Data:
    1. The input is a CSV file with 1M rows and about 20 columns and 4
    metrics.
    2. Output is 4 hbase tables "txn_m1", "txn_m2", "txn_m3", "txn_m4".

    The task is to parse through the CSV file and for each metric m1 create an
    entry into the hbase table "txn_m1" with the columns as needed. Attached
    is a pdf [from an excel] which explains how a single row in the CSV is
    converted into hbase data in the mapper and reducer stage. Attached is the
    code as well.

    For processing 1M records, it is taking about 38 minutes. I am using
    HTable.incrementColumnValue() in the reduce pass to create the records in
    the hbase tables.

    Is there anything I should be doing differently or inherently incorrect?
    I would like to run this task in 1 minute.

    Thanks for the help,
    Irfan

    Here is the output of the process. Let me know if I should attach any other
    log.

    09/07/02 15:19:11 INFO mapred.JobClient: Running job: job_200906192236_5114
    09/07/02 15:19:12 INFO mapred.JobClient: map 0% reduce 0%
    09/07/02 15:19:29 INFO mapred.JobClient: map 30% reduce 0%
    09/07/02 15:19:32 INFO mapred.JobClient: map 46% reduce 0%
    09/07/02 15:19:35 INFO mapred.JobClient: map 64% reduce 0%
    09/07/02 15:19:38 INFO mapred.JobClient: map 75% reduce 0%
    09/07/02 15:19:44 INFO mapred.JobClient: map 76% reduce 0%
    09/07/02 15:19:47 INFO mapred.JobClient: map 99% reduce 1%
    09/07/02 15:19:50 INFO mapred.JobClient: map 100% reduce 3%
    09/07/02 15:19:53 INFO mapred.JobClient: map 100% reduce 4%
    09/07/02 15:19:56 INFO mapred.JobClient: map 100% reduce 10%
    09/07/02 15:19:59 INFO mapred.JobClient: map 100% reduce 12%
    09/07/02 15:20:02 INFO mapred.JobClient: map 100% reduce 16%
    09/07/02 15:20:05 INFO mapred.JobClient: map 100% reduce 25%
    09/07/02 15:20:08 INFO mapred.JobClient: map 100% reduce 33%
    09/07/02 15:20:11 INFO mapred.JobClient: map 100% reduce 36%
    09/07/02 15:20:14 INFO mapred.JobClient: map 100% reduce 39%
    09/07/02 15:20:17 INFO mapred.JobClient: map 100% reduce 41%
    09/07/02 15:20:29 INFO mapred.JobClient: map 100% reduce 42%
    09/07/02 15:20:32 INFO mapred.JobClient: map 100% reduce 44%
    09/07/02 15:20:38 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:20:49 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:20:55 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:21:01 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:21:34 INFO mapred.JobClient: map 100% reduce 52%
    09/07/02 15:21:39 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:22:06 INFO mapred.JobClient: map 100% reduce 54%
    09/07/02 15:22:28 INFO mapred.JobClient: map 100% reduce 55%
    09/07/02 15:22:44 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:02 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000002_0, Status : FAILED
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:20:27.230] fetching new record writer ...
    attempt_200906192236_5114_r_000002_0: [2009-07-02 15:22:51.429] failed to initialize the hbase configuration
    09/07/02 15:23:08 INFO mapred.JobClient: map 100% reduce 53%
    09/07/02 15:23:08 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000013_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:20:33.183] fetching new record writer ...
    attempt_200906192236_5114_r_000013_0: [2009-07-02 15:23:04.369] failed to initialize the hbase configuration
    09/07/02 15:23:09 INFO mapred.JobClient: map 100% reduce 50%
    09/07/02 15:23:14 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000012_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:20:48.434] fetching new record writer ...
    attempt_200906192236_5114_r_000012_0: [2009-07-02 15:23:10.185] failed to initialize the hbase configuration
    09/07/02 15:23:15 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:17 INFO mapred.JobClient: Task Id : attempt_200906192236_5114_r_000014_0, Status : FAILED
    org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:863)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:514)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:523)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:496)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:628)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:527)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:490)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:124)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:107)
    at com.qwapi.txnload.LoadMultipleCubes$CubeOutputFormat.getRecordWriter(LoadMultipleCubes.java:442)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:435)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:20:47.442] fetching new record writer ...
    attempt_200906192236_5114_r_000014_0: [2009-07-02 15:23:13.285] failed to initialize the hbase configuration
    09/07/02 15:23:18 INFO mapred.JobClient: map 100% reduce 45%
    09/07/02 15:23:21 INFO mapred.JobClient: map 100% reduce 46%
    09/07/02 15:23:29 INFO mapred.JobClient: map 100% reduce 47%
    09/07/02 15:23:32 INFO mapred.JobClient: map 100% reduce 48%
    09/07/02 15:23:36 INFO mapred.JobClient: map 100% reduce 49%
    09/07/02 15:23:39 INFO mapred.JobClient: map 100% reduce 51%
    09/07/02 15:23:42 INFO mapred.JobClient: map 100% reduce 56%
    09/07/02 15:23:45 INFO mapred.JobClient: map 100% reduce 58%
    09/07/02 15:24:20 INFO mapred.JobClient: map 100% reduce 59%
    09/07/02 15:25:11 INFO mapred.JobClient: map 100% reduce 60%
    09/07/02 15:25:17 INFO mapred.JobClient: map 100% reduce 61%
    09/07/02 15:25:26 INFO mapred.JobClient: map 100% reduce 62%
    09/07/02 15:25:32 INFO mapred.JobClient: map 100% reduce 64%
    09/07/02 15:25:38 INFO mapred.JobClient: map 100% reduce 65%
    09/07/02 15:26:20 INFO mapred.JobClient: map 100% reduce 66%
    09/07/02 15:26:40 INFO mapred.JobClient: map 100% reduce 67%
    09/07/02 15:26:48 INFO mapred.JobClient: map 100% reduce 68%
    09/07/02 15:27:16 INFO mapred.JobClient: map 100% reduce 69%
    09/07/02 15:27:21 INFO mapred.JobClient: map 100% reduce 70%
    09/07/02 15:27:46 INFO mapred.JobClient: map 100% reduce 71%
    09/07/02 15:28:25 INFO mapred.JobClient: map 100% reduce 72%
    09/07/02 15:28:46 INFO mapred.JobClient: map 100% reduce 73%
    09/07/02 15:29:08 INFO mapred.JobClient: map 100% reduce 74%
    09/07/02 15:29:45 INFO mapred.JobClient: map 100% reduce 76%
    09/07/02 15:30:42 INFO mapred.JobClient: map 100% reduce 77%
    09/07/02 15:31:06 INFO mapred.JobClient: map 100% reduce 78%
    09/07/02 15:31:12 INFO mapred.JobClient: map 100% reduce 79%
    09/07/02 15:31:36 INFO mapred.JobClient: map 100% reduce 81%
    09/07/02 15:31:37 INFO mapred.JobClient: map 100% reduce 82%
    09/07/02 15:32:00 INFO mapred.JobClient: map 100% reduce 83%
    09/07/02 15:32:09 INFO mapred.JobClient: map 100% reduce 84%
    09/07/02 15:32:30 INFO mapred.JobClient: map 100% reduce 86%
    09/07/02 15:38:42 INFO mapred.JobClient: map 100% reduce 88%
    09/07/02 15:39:49 INFO mapred.JobClient: map 100% reduce 89%
    09/07/02 15:41:13 INFO mapred.JobClient: map 100% reduce 90%
    09/07/02 15:41:16 INFO mapred.JobClient: map 100% reduce 91%
    09/07/02 15:41:28 INFO mapred.JobClient: map 100% reduce 93%
    09/07/02 15:44:34 INFO mapred.JobClient: map 100% reduce 94%
    09/07/02 15:45:41 INFO mapred.JobClient: map 100% reduce 95%
    09/07/02 15:45:50 INFO mapred.JobClient: map 100% reduce 96%
    09/07/02 15:46:17 INFO mapred.JobClient: map 100% reduce 98%
    09/07/02 15:55:29 INFO mapred.JobClient: map 100% reduce 99%
    09/07/02 15:57:08 INFO mapred.JobClient: map 100% reduce 100%
    09/07/02 15:57:14 INFO mapred.JobClient: Job complete: job_200906192236_5114
    09/07/02 15:57:14 INFO mapred.JobClient: Counters: 18
    09/07/02 15:57:14 INFO mapred.JobClient: Job Counters
    09/07/02 15:57:14 INFO mapred.JobClient: Launched reduce tasks=24
    09/07/02 15:57:14 INFO mapred.JobClient: Rack-local map tasks=2
    09/07/02 15:57:14 INFO mapred.JobClient: Launched map tasks=20
    09/07/02 15:57:14 INFO mapred.JobClient: Data-local map tasks=18
    09/07/02 15:57:14 INFO mapred.JobClient: FileSystemCounters
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_READ=1848609562
    09/07/02 15:57:14 INFO mapred.JobClient: HDFS_BYTES_READ=57982980
    09/07/02 15:57:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2768325646
    09/07/02 15:57:14 INFO mapred.JobClient: Map-Reduce Framework
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input groups=4863
    09/07/02 15:57:14 INFO mapred.JobClient: Combine output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map input records=294786
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce shuffle bytes=883803390
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce output records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Spilled Records=50956464
    09/07/02 15:57:14 INFO mapred.JobClient: Map output bytes=888797024
    09/07/02 15:57:14 INFO mapred.JobClient: Map input bytes=57966580
    09/07/02 15:57:14 INFO mapred.JobClient: Combine input records=0
    09/07/02 15:57:14 INFO mapred.JobClient: Map output records=16985488
    09/07/02 15:57:14 INFO mapred.JobClient: Reduce input records=16985488
  • Stack at Jul 6, 2009 at 6:15 pm

    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed wrote:

    I am working on writing to HDFS files. Will update you by end of day today.

    There are always 10 concurrent mappers running. I keep setting the
    setNumMaps(5) and also the following properties in mapred-site.xml to 3 but
    still end up running 10 concurrent maps.

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

    So, that looks like 4 regions from table txn?

    So that's about 1 region per regionserver?

    I have setAutoFlush(false) and also writeToWal(false) with the same
    behaviour.
    If you did the above and it still takes 10 minutes, then that would seem
    to rule out hbase (batching should have a big impact on uploads, and then
    setting writeToWAL to false should double throughput over whatever you
    were seeing previously).

    St.Ack
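    [Editor's note: a toy model of why the batching advice above matters.
    This is NOT the HBase client API; the class and method names below are
    made up purely to show the round-trip arithmetic. With autoFlush on,
    every put costs one RPC; with it off, puts accumulate client-side and
    ship in large batches.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a buffering client writer (not HBase's HTable).
public class BufferedWriterModel {
    private final List<String> buffer = new ArrayList<>();
    private final int bufferSize;
    private final boolean autoFlush;
    private int rpcCount = 0;

    BufferedWriterModel(boolean autoFlush, int bufferSize) {
        this.autoFlush = autoFlush;
        this.bufferSize = bufferSize;
    }

    void put(String row) {
        buffer.add(row);
        // autoFlush=true ships every put immediately, one RPC per row.
        if (autoFlush || buffer.size() >= bufferSize) {
            flush();
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            rpcCount++; // one round trip per shipped batch
            buffer.clear();
        }
    }

    int rpcCount() { return rpcCount; }

    public static void main(String[] args) {
        BufferedWriterModel perRow = new BufferedWriterModel(true, 1);
        BufferedWriterModel batched = new BufferedWriterModel(false, 1000);
        for (int i = 0; i < 10000; i++) {
            perRow.put("row" + i);
            batched.put("row" + i);
        }
        perRow.flush();
        batched.flush();
        System.out.println("autoFlush=true  RPCs: " + perRow.rpcCount());  // 10000
        System.out.println("autoFlush=false RPCs: " + batched.rpcCount()); // 10
    }
}
```

    The same round-trip arithmetic is why combining the four tables into one
    table (one client, one buffer) helps: four separate HTables means four
    separate buffers and four sets of round trips per row key.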
  • Irfan Mohammed at Jul 6, 2009 at 6:25 pm
    Input is 1 file.

    These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me, it looks like it is always doing 1 region per table and these tables are always on different regionservers. I have never seen the same table on different regionservers. Does that sound right?

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed wrote:

    I am working on writing to HDFS files. Will update you by end of day today.

    There are always 10 concurrent mappers running. I keep setting the
    setNumMaps(5) and also the following properties in mapred-site.xml to 3 but
    still end up running 10 concurrent maps.

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

    So, that looks like 4 regions from table txn?

    So that's about 1 region per regionserver?

    I have setAutoFlush(false) and also writeToWal(false) with the same
    behaviour.
    If you did the above and it still takes 10 minutes, then that would seem
    to rule out hbase (batching should have a big impact on uploads, and then
    setting writeToWAL to false should double throughput over whatever you
    were seeing previously).

    St.Ack
  • Stack at Jul 6, 2009 at 6:36 pm
    Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
    region in each table. Your cells are small so that's probably about right.

    So, an hbase client is contacting 4 different servers to do each update.
    And running with one table made no difference to overall time?

    St.Ack
    On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed wrote:

    Input is 1 file.

    These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
    it looks like it is always doing 1 region per table and these tables are
    always on different regionservers. I have never seen the same table on
    different regionservers. Does that sound right?

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed wrote:

    I am working on writing to HDFS files. Will update you by end of day today.
    There are always 10 concurrent mappers running. I keep setting the
    setNumMaps(5) and also the following properties in mapred-site.xml to 3 but
    still end up running 10 concurrent maps.

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

    So, that looks like 4 regions from table txn?

    So that's about 1 region per regionserver?

    I have setAutoFlush(false) and also writeToWal(false) with the same
    behaviour.
    If you did the above and it still takes 10 minutes, then that would seem
    to rule out hbase (batching should have a big impact on uploads, and then
    setting writeToWAL to false should double throughput over whatever you
    were seeing previously).

    St.Ack
  • Irfan Mohammed at Jul 6, 2009 at 6:44 pm
    I ran the following [ note : tables t1, txn_m5, txn_m6, txn are unused for now ]

    hbase(main):002:0> status 'detailed'
    09/07/06 14:29:32 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16:2181,app48:2181,app122:2181
    version 0.20.0-dev
    5 live servers
    app16:60020 1246848846822
    requests=0, regions=2, usedHeap=65, maxHeap=963
    txn_m5,,1245462915569
    stores=18, storefiles=17, memcacheSize=0, storefileIndexSize=0
    txn_m4,,1245462942282
    stores=18, storefiles=34, memcacheSize=6, storefileIndexSize=0
    app03:60020 1246848846821
    requests=0, regions=2, usedHeap=36, maxHeap=963
    txn,,1246633495833
    stores=72, storefiles=60, memcacheSize=0, storefileIndexSize=0
    txn_m3,,1245462890248
    stores=18, storefiles=33, memcacheSize=2, storefileIndexSize=0
    app48:60020 1246848846825
    requests=0, regions=1, usedHeap=25, maxHeap=963
    -ROOT-,,0
    stores=1, storefiles=3, memcacheSize=0, storefileIndexSize=0
    app01:60020 1246848846823
    requests=0, regions=3, usedHeap=173, maxHeap=963
    .META.,,1
    stores=2, storefiles=3, memcacheSize=0, storefileIndexSize=0
    txn_m2,,1245460727203
    stores=18, storefiles=34, memcacheSize=26, storefileIndexSize=0
    txn_m6,,1245462928979
    stores=18, storefiles=17, memcacheSize=0, storefileIndexSize=0
    app122:60020 1246848846826
    requests=0, regions=2, usedHeap=148, maxHeap=963
    t1,,1245458736498
    stores=1, storefiles=1, memcacheSize=0, storefileIndexSize=0
    txn_m1,,1245462904101
    stores=18, storefiles=31, memcacheSize=17, storefileIndexSize=0
    0 dead servers
    hbase(main):003:0>


    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:24:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Input is 1 file.

    These are 4 different tables: "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me, it looks like it is always doing 1 region per table, and these tables are always on different regionservers. I have never seen the same table on different regionservers. Does that sound right?

  • Irfan Mohammed at Jul 6, 2009 at 7:57 pm
    Writing to HDFS directly took just 21 seconds, so I suspect there is something I am doing incorrectly in my hbase setup or my code.

    Thanks for the help.

    [2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat: Total input paths to process : 10
    09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
    09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0%
    09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0%
    09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0%
    09/07/06 15:52:47 INFO mapred.JobClient: Job complete: job_200907052205_0235
    09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
    09/07/06 15:52:47 INFO mapred.JobClient: Job Counters
    09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4
    09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10
    09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6
    09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988
    09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework
    09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786
    09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0
    09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
    region in each table. Your cells are small, so that's probably about right.

    So, an hbase client is contacting 4 different servers to do each update.
    And running with one table made no difference to the overall time?

    St.Ack
  • Irfan Mohammed at Jul 6, 2009 at 10:42 pm
    I converted the code to use the HBase client API directly, without the M/R framework, and the results are interesting ...

    1. Initially I did not use "HTable.incrementColumnValue", just "HTable.put", and the process ran in ~5 minutes.
    2. After switching to "HTable.incrementColumnValue", it is still running and is about ~30 minutes into the run. I issued a couple of "kill -QUIT"s to see if the process is moving ahead, and it looks like it is, since the lock object is changing each time.
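The thread dumps below show the main thread blocked in HBaseClient.call under incrementColumnValue, i.e. one synchronous RPC per input record. One way to cut that cost is to pre-aggregate deltas per (row, column) on the client and issue a single increment per distinct cell. This is a stdlib-only sketch of the aggregation step only (the class, the "row/column" key format, and the sample cells are hypothetical; the actual incrementColumnValue calls are omitted):

```java
import java.util.HashMap;
import java.util.Map;

// Collapses per-record increments into one delta per distinct cell,
// so each cell needs a single RPC instead of one RPC per input record.
public class IncrementAggregator {
    private final Map<String, Long> deltas = new HashMap<>();

    // Key format "row/column" is illustrative only.
    public void increment(String row, String column, long amount) {
        deltas.merge(row + "/" + column, amount, Long::sum);
    }

    // Snapshot of the pending deltas; each entry would become one
    // incrementColumnValue call in a real loader.
    public Map<String, Long> snapshot() {
        return new HashMap<>(deltas);
    }

    public static void main(String[] args) {
        IncrementAggregator agg = new IncrementAggregator();
        // Three input records touching only two distinct cells:
        agg.increment("20090706", "m1:count", 1);
        agg.increment("20090706", "m1:count", 1);
        agg.increment("20090706", "m2:amount", 42);
        System.out.println(agg.snapshot().size()); // prints 2
    }
}
```

For a CSV where many rows hit the same day/dimension cells, this turns ~1M blocking RPCs into one RPC per distinct cell, which is usually orders of magnitude fewer.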
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 17:58:43 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:host.name=app48
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:35809 remote=app122.qwapi.com/10.10.0.122:2181]
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 17:58:43.425] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...
    [2009-07-06 18:03:46.104] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] completed. # of records processed : [294,786]
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>
    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 18:07:12 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:host.name=app48
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:36147 remote=app122.qwapi.com/10.10.0.122:2181]
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 18:07:12.735] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...



    2009-07-06 18:23:24
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedf2a8c8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 3129K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 22% used [0xedc40000,0xedf4a4b0,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1795K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fc0d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:24:59
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c in Object.wait() [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.waitForWork(HBaseClient.java:401)
    - locked <0xb4e00090> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:444)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee5ecb50> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10004K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 71% used [0xedc40000,0xee601028,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1907K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fdcd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:30:39
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee61dfe8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10281K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 73% used [0xedc40000,0xee6464f0,0xee9f0000)
    from space 64K, 25% used [0xee9f0000,0xee9f4000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2315K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb5042d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:31:13
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedd8dec0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 1448K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 10% used [0xedc40000,0xedda2018,0xee9f0000)
    from space 64K, 50% used [0xee9f0000,0xee9f8000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2359K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb504dd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>
    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 3:56:57 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Writing to HDFS directly took just 21 seconds, so I suspect there is something incorrect in my HBase setup or my code.

    Thanks for the help.

    [2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat: Total input paths to process : 10
    09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
    09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0%
    09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0%
    09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0%
    09/07/06 15:52:47 INFO mapred.JobClient: Job complete: job_200907052205_0235
    09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
    09/07/06 15:52:47 INFO mapred.JobClient: Job Counters
    09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4
    09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10
    09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6
    09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988
    09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework
    09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786
    09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0
    09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
region in each table. Your cells are small, so that's probably about right.

    So, an hbase client is contacting 4 different servers to do each update.
    And running with one table made no difference to overall time?

    St.Ack
    On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed wrote:

    Input is 1 file.

These are 4 different tables: "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
it looks like it is always doing 1 region per table, and these tables are
always on different regionservers. I have never seen the same table on
different regionservers. Does that sound right?

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed wrote:

I am working on writing to HDFS files; I will update you by end of day today.
There are always 10 concurrent mappers running. I keep setting
setNumMaps(5) and also the following properties in mapred-site.xml to 3, but
I still end up with 10 concurrent maps.

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

So, that looks like 4 regions from the txn tables?

So that's about 1 region per regionserver?

    I have setAutoFlush(false) and also writeToWal(false) with the same
    behaviour.
If you did the above and it still takes 10 minutes, then that would seem to
rule out hbase (batching should have a big impact on uploads, and setting
writeToWAL to false should double throughput over whatever you were seeing
previously).

    St.Ack
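The batching discussed above (setAutoFlush(false) plus an explicit flush) matters because it trades one round trip per put for one round trip per buffer. The following is a minimal sketch of that idea only, not the HBase client itself; the BufferedTable class and its names are hypothetical, standing in for HTable's write buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HTable's client-side write buffer: with
// auto-flush off, puts accumulate locally and N puts cost roughly
// N/threshold round trips instead of N.
class BufferedTable {
    private final List<String> buffer = new ArrayList<>();
    private final int flushThreshold;
    int rpcCount = 0; // batched round trips actually "sent"

    BufferedTable(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String row) {
        buffer.add(row);
        if (buffer.size() >= flushThreshold) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        rpcCount++;      // one round trip for the whole buffered batch
        buffer.clear();
    }
}

public class AutoFlushDemo {
    public static void main(String[] args) {
        BufferedTable t = new BufferedTable(1000);
        for (int i = 0; i < 10_000; i++) t.put("row-" + i);
        t.flush(); // commit the tail, as an explicit flushCommits() would
        System.out.println(t.rpcCount); // 10 round trips instead of 10,000
    }
}
```

With a 1,000-entry threshold, 10,000 puts collapse into 10 batches, which is the kind of reduction that should dominate upload time if the client is round-trip bound.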
  • Irfan Mohammed at Jul 6, 2009 at 11:51 pm
Added more instrumentation. It is taking about 2 minutes per 10k records, so 300k records will take 60 minutes. :-(

    [qwapi@app48 logs]$ ~/scripts/loadDirect.sh
    [2009-07-06 19:29:20.465] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...
    [2009-07-06 19:29:21.820] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [100] records
    [2009-07-06 19:29:23.372] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [200] records
    [2009-07-06 19:29:24.567] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [300] records
    [2009-07-06 19:29:25.157] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [400] records
    [2009-07-06 19:29:26.178] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [500] records
    [2009-07-06 19:29:27.096] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [600] records
    [2009-07-06 19:29:28.249] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [700] records
    [2009-07-06 19:29:28.258] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [800] records
    [2009-07-06 19:29:28.267] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [900] records
    [2009-07-06 19:29:28.276] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,000] records
    [2009-07-06 19:29:29.406] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,100] records
    [2009-07-06 19:29:30.094] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,200] records
    [2009-07-06 19:29:30.903] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,300] records
    [2009-07-06 19:29:32.158] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,400] records
    [2009-07-06 19:29:33.483] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,500] records
    [2009-07-06 19:29:34.187] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,600] records
    [2009-07-06 19:29:35.515] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,700] records
    [2009-07-06 19:29:36.610] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,800] records
    [2009-07-06 19:29:37.758] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,900] records
    [2009-07-06 19:29:39.173] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,000] records
    [2009-07-06 19:29:40.443] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,100] records
    [2009-07-06 19:29:41.848] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,200] records
    [2009-07-06 19:29:42.256] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,300] records
    [2009-07-06 19:29:43.520] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,400] records
    [2009-07-06 19:29:44.906] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,500] records
    [2009-07-06 19:29:46.191] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,600] records
    [2009-07-06 19:29:47.502] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,700] records
    [2009-07-06 19:29:48.810] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,800] records
    [2009-07-06 19:29:50.275] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,900] records
    [2009-07-06 19:29:51.579] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,000] records
    [2009-07-06 19:29:52.879] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,100] records
    [2009-07-06 19:29:54.207] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,200] records
    [2009-07-06 19:29:55.619] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,300] records
    [2009-07-06 19:29:56.901] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,400] records
    [2009-07-06 19:29:58.183] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,500] records
    [2009-07-06 19:29:59.555] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,600] records
    [2009-07-06 19:30:00.838] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,700] records
    [2009-07-06 19:30:02.232] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,800] records

    [2009-07-06 19:31:18.371] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [9,900] records
    [2009-07-06 19:31:19.672] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [10,000] records
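The rate in the log above works out to roughly 12 ms per record, which is in the range one would expect if each record spends most of its time in synchronous RPC round trips (the thread dumps show main blocked in HBaseClient.call under incrementColumnValue). A quick arithmetic check:

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        // ~10,000 records in ~2 minutes, per the log above
        double secondsPer10k = 120.0;
        double recordsPerSecond = 10_000 / secondsPer10k;      // ~83 rec/s
        double msPerRecord = secondsPer10k * 1000 / 10_000;    // 12 ms each
        System.out.println(recordsPerSecond + " rec/s, " + msPerRecord + " ms/record");
    }
}
```

At a few milliseconds per blocking round trip, a record that fans out into several incrementColumnValue calls would land right around this figure, consistent with the client being RPC-latency bound rather than server bound.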

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 6:42:10 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Converted the code to use the HBase client API directly, without the M/R framework, and the results are interesting:

1. Initially I did not use "HTable.incrementColumnValue", just "HTable.put", and the process ran in ~5 minutes.
2. After switching to "HTable.incrementColumnValue" it is still running, about ~30 minutes into the run. I issued a couple of "kill -QUIT"s to see if the process is moving ahead, and it looks like it is, since the lock object is changing each time.
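Since each incrementColumnValue is a blocking round trip (the dumps below show main parked in HBaseClient.call for each one), one alternative for a bulk load like this is to aggregate the increments client-side and write each final sum once. This is a hypothetical sketch of that aggregation step only; the row/column names are illustrative, not from the attached code, and in the real loader the snapshot would be written out with buffered puts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative to per-cell incrementColumnValue RPCs:
// accumulate all increments for a file in memory, then emit one
// value per (row, column) at the end.
public class IncrementAggregator {
    // key = "row/column", value = running total
    private final Map<String, Long> totals = new HashMap<>();

    void increment(String row, String column, long amount) {
        totals.merge(row + "/" + column, amount, Long::sum);
    }

    // In a real loader this snapshot would become buffered HTable.put()
    // calls: one write per aggregated cell instead of one RPC per record.
    Map<String, Long> snapshot() {
        return new HashMap<>(totals);
    }

    public static void main(String[] args) {
        IncrementAggregator agg = new IncrementAggregator();
        agg.increment("20090706", "m1", 5);
        agg.increment("20090706", "m1", 3);
        agg.increment("20090707", "m1", 1);
        System.out.println(agg.snapshot().get("20090706/m1")); // 8
    }
}
```

The trade-off is that the totals only work this way when a single client owns all increments for the keys in the batch; concurrent writers would still need the server-side atomic increment.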
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 17:58:43 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:host.name=app48
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:35809 remote=app122.qwapi.com/10.10.0.122:2181]
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 17:58:43.425] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...
    [2009-07-06 18:03:46.104] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] completed. # of records processed : [294,786]
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>
    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 18:07:12 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:host.name=app48
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:36147 remote=app122.qwapi.com/10.10.0.122:2181]
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 18:07:12.735] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...



    2009-07-06 18:23:24
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedf2a8c8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 3129K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 22% used [0xedc40000,0xedf4a4b0,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1795K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fc0d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:24:59
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c in Object.wait() [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.waitForWork(HBaseClient.java:401)
    - locked <0xb4e00090> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:444)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee5ecb50> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10004K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 71% used [0xedc40000,0xee601028,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1907K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fdcd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:30:39
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee61dfe8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10281K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 73% used [0xedc40000,0xee6464f0,0xee9f0000)
    from space 64K, 25% used [0xee9f0000,0xee9f4000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2315K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb5042d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:31:13
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedd8dec0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 1448K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 10% used [0xedc40000,0xedda2018,0xee9f0000)
    from space 64K, 50% used [0xee9f0000,0xee9f8000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2359K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb504dd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)
    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 3:56:57 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Writing to hdfs directly took just 21 seconds. So I suspect there is something I am doing incorrectly in my hbase setup or my code.

    Thanks for the help.

    [2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat: Total input paths to process : 10
    09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
    09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0%
    09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0%
    09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0%
    09/07/06 15:52:47 INFO mapred.JobClient: Job complete: job_200907052205_0235
    09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
    09/07/06 15:52:47 INFO mapred.JobClient: Job Counters
    09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4
    09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10
    09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6
    09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988
    09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework
    09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786
    09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0
    09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
region in each table. Your cells are small, so that's probably about right.

    So, an hbase client is contacting 4 different servers to do each update.
    And running with one table made no difference to overall time?

    St.Ack
    On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed wrote:

    Input is 1 file.

These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
it looks like it is always doing 1 region per table, and these tables are
always on different regionservers. I have never seen the same table on different
regionservers. Does that sound right?

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed wrote:

I am working on writing to HDFS files. Will update you by end of day today.
There are always 10 concurrent mappers running. I keep calling setNumMaps(5)
and also setting the following properties in mapred-site.xml to 3, but I
still end up with 10 concurrent maps.
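For reference, the per-node cap on concurrent tasks is a tasktracker setting, not a job setting; setNumMapTasks() is only a hint, since the actual map count is driven by input splits. A sketch of the relevant mapred-site.xml entries (property names assume Hadoop 0.20; adjust values to taste):

```xml
<!-- Per-tasktracker caps on concurrently running tasks.
     Tasktrackers must be restarted for changes to take effect. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>
```

With 10 input files, FileInputFormat will still create at least 10 map tasks; the cap above only limits how many run at once on each node.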

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

So, that looks like 4 regions across the txn_* tables?

So that's about 1 region per regionserver?

I have setAutoFlush(false) and also writeToWAL(false) with the same
behaviour.
If you did the above and it still takes 10 minutes, then that would seem to rule
out hbase (batching should have a big impact on uploads, and then setting
writeToWAL to false should double throughput over whatever you were seeing
previously).

    St.Ack
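On the 0.20 client API, the batching knobs mentioned above look roughly like this (a sketch only; the row key, family, and qualifier names are illustrative, not the actual schema). Note one caveat: incrementColumnValue() issues one RPC per call regardless of autoflush, so client-side buffering only helps if the increments can be precomputed per row and written as Puts.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

HBaseConfiguration conf = new HBaseConfiguration();
HTable table = new HTable(conf, "txn_m1");

// Buffer writes client-side instead of issuing one RPC per put.
table.setAutoFlush(false);
table.setWriteBufferSize(12 * 1024 * 1024); // 12 MB buffer; tune as needed

Put put = new Put(Bytes.toBytes("row-key"));   // illustrative row key
put.add(Bytes.toBytes("metric"),               // illustrative family
        Bytes.toBytes("m1"),                   // illustrative qualifier
        Bytes.toBytes(42L));
put.setWriteToWAL(false); // skip the write-ahead log; data is lost on a crash
table.put(put);           // queued in the client-side buffer

table.flushCommits();     // push all buffered Puts at the end
```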
  • Irfan Mohammed at Jul 7, 2009 at 12:19 am
With a single family, ICV finished in 4:21 minutes. So it looks like a limitation tied to how many families are in the mix. Need to re-think the schema ... :-(
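One hypothetical direction for the schema re-think: fold the four per-metric tables into a single table with one family and a qualifier per metric, so all four increments for a row go to the same region on one connection. A sketch against the 0.20 API (the table name "txn" and family "m" are invented for illustration):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical combined table "txn" with a single family "m".
HBaseConfiguration conf = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("txn");
desc.addFamily(new HColumnDescriptor(Bytes.toBytes("m")));
admin.createTable(desc);

// All four metrics for a given row now hit the same regionserver:
HTable txn = new HTable(conf, "txn");
byte[] row = Bytes.toBytes("row-key"); // illustrative row key
byte[] fam = Bytes.toBytes("m");
txn.incrementColumnValue(row, fam, Bytes.toBytes("m1"), 1L);
txn.incrementColumnValue(row, fam, Bytes.toBytes("m2"), 1L);
```

This trades four small tables for one wider one; whether that fits depends on how the metrics are queried afterwards.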

    [qwapi@app48 logs]$ ~/scripts/loadDirect.sh
    [2009-07-06 20:12:23.542] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...
    [2009-07-06 20:12:32.895] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [10,000] records
    [2009-07-06 20:12:42.198] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [20,000] records
    [2009-07-06 20:12:50.956] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [30,000] records
    [2009-07-06 20:12:59.087] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [40,000] records
    [2009-07-06 20:13:08.258] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [50,000] records
    [2009-07-06 20:13:16.773] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [60,000] records
    [2009-07-06 20:13:25.128] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [70,000] records
    [2009-07-06 20:13:34.309] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [80,000] records
    [2009-07-06 20:13:42.845] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [90,000] records
    [2009-07-06 20:13:51.363] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [100,000] records
    [2009-07-06 20:14:00.627] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [110,000] records
    [2009-07-06 20:14:08.964] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [120,000] records
    [2009-07-06 20:14:17.896] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [130,000] records
    [2009-07-06 20:14:27.680] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [140,000] records
    [2009-07-06 20:14:36.821] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [150,000] records
    [2009-07-06 20:14:45.966] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [160,000] records
    [2009-07-06 20:14:54.911] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [170,000] records
    [2009-07-06 20:15:03.736] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [180,000] records
    [2009-07-06 20:15:12.037] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [190,000] records
    [2009-07-06 20:15:20.494] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [200,000] records
    [2009-07-06 20:15:29.216] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [210,000] records
    [2009-07-06 20:15:37.809] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [220,000] records
    [2009-07-06 20:15:46.811] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [230,000] records
    [2009-07-06 20:15:55.512] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [240,000] records
    [2009-07-06 20:16:03.961] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [250,000] records
    [2009-07-06 20:16:12.933] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [260,000] records
    [2009-07-06 20:16:21.934] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [270,000] records
    [2009-07-06 20:16:30.435] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [280,000] records
    [2009-07-06 20:16:39.882] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [290,000] records
    [2009-07-06 20:16:44.573] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] completed. # of records processed : [294,786]. Start Time : [2009-07-06 20:12:23], End Time : [2009-07-06 20:16:44]
    [qwapi@app48 logs]$


    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 7:51:00 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    added more instrumentation. it is taking about 2 minutes per 10k records, so for 300k records it will take about 60 minutes. :-(
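    The extrapolation is a simple proportion: 2 minutes per 10,000 records is roughly 83 records/second, so 300,000 records comes out to about 60 minutes. A trivial helper making the projection explicit (the class and method names are made up for illustration):

    ```java
    public class ThroughputEstimate {
        // Projects total runtime from an observed per-batch rate:
        // total * (minutes per batch) / (records per batch).
        static double minutesFor(long totalRecords, long batchRecords, double minutesPerBatch) {
            return totalRecords * minutesPerBatch / batchRecords;
        }

        public static void main(String[] args) {
            // 300k records at 2 minutes per 10k -> 60 minutes
            System.out.println(minutesFor(300_000, 10_000, 2.0));
        }
    }
    ```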

    [qwapi@app48 logs]$ ~/scripts/loadDirect.sh
    [2009-07-06 19:29:20.465] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...
    [2009-07-06 19:29:21.820] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [100] records
    [2009-07-06 19:29:23.372] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [200] records
    [2009-07-06 19:29:24.567] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [300] records
    [2009-07-06 19:29:25.157] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [400] records
    [2009-07-06 19:29:26.178] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [500] records
    [2009-07-06 19:29:27.096] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [600] records
    [2009-07-06 19:29:28.249] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [700] records
    [2009-07-06 19:29:28.258] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [800] records
    [2009-07-06 19:29:28.267] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [900] records
    [2009-07-06 19:29:28.276] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,000] records
    [2009-07-06 19:29:29.406] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,100] records
    [2009-07-06 19:29:30.094] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,200] records
    [2009-07-06 19:29:30.903] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,300] records
    [2009-07-06 19:29:32.158] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,400] records
    [2009-07-06 19:29:33.483] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,500] records
    [2009-07-06 19:29:34.187] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,600] records
    [2009-07-06 19:29:35.515] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,700] records
    [2009-07-06 19:29:36.610] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,800] records
    [2009-07-06 19:29:37.758] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [1,900] records
    [2009-07-06 19:29:39.173] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,000] records
    [2009-07-06 19:29:40.443] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,100] records
    [2009-07-06 19:29:41.848] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,200] records
    [2009-07-06 19:29:42.256] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,300] records
    [2009-07-06 19:29:43.520] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,400] records
    [2009-07-06 19:29:44.906] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,500] records
    [2009-07-06 19:29:46.191] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,600] records
    [2009-07-06 19:29:47.502] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,700] records
    [2009-07-06 19:29:48.810] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,800] records
    [2009-07-06 19:29:50.275] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [2,900] records
    [2009-07-06 19:29:51.579] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,000] records
    [2009-07-06 19:29:52.879] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,100] records
    [2009-07-06 19:29:54.207] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,200] records
    [2009-07-06 19:29:55.619] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,300] records
    [2009-07-06 19:29:56.901] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,400] records
    [2009-07-06 19:29:58.183] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,500] records
    [2009-07-06 19:29:59.555] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,600] records
    [2009-07-06 19:30:00.838] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,700] records
    [2009-07-06 19:30:02.232] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [3,800] records

    [2009-07-06 19:31:18.371] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [9,900] records
    [2009-07-06 19:31:19.672] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]. [10,000] records

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 6:42:10 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    converted the code to use the HBase client API directly, without the M/R framework, and the results are interesting ...

    1. initially I did not use "HTable.incrementColumnValue" and just used "HTable.put", and the process ran in ~5 minutes.
    2. after switching to "HTable.incrementColumnValue" it is still running, about ~30 minutes into the run. I issued a couple of "kill -QUIT" signals to see if the process is moving ahead, and it looks like it is, since the lock object is changing each time.
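    Each HTable.incrementColumnValue call is a separate blocking RPC (the thread dumps below show "main" parked inside HBaseClient.call on every increment), so per-record increments pay one round trip per cell touched. One common way to cut the RPC count is to pre-aggregate the deltas client-side and issue a single increment per unique cell. A minimal sketch of that buffering step, independent of the real LoadDirect code (the class name and row/column key format are hypothetical; the actual HBase call is left as a comment):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class IncrementBuffer {
        // Accumulates deltas per (row, column) key so that only one
        // increment RPC per unique cell is needed at flush time.
        private final Map<String, Long> pending = new HashMap<>();

        public void add(String row, String column, long delta) {
            pending.merge(row + "/" + column, delta, Long::sum);
        }

        // In real code this would iterate the entries, call
        // HTable.incrementColumnValue once per cell, then clear the map.
        public Map<String, Long> drain() {
            Map<String, Long> out = new HashMap<>(pending);
            pending.clear();
            return out;
        }
    }
    ```

    With a 1M-row CSV that repeatedly hits the same metric cells, this collapses many per-record round trips into one per distinct cell per flush.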
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 17:58:43 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:host.name=app48
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:35809 remote=app122.qwapi.com/10.10.0.122:2181]
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 17:58:43.425] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...
    [2009-07-06 18:03:46.104] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] completed. # of records processed : [294,786]
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>
    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 18:07:12 INFO zookeeper.ZooKeeperWrapper: Quorum servers: app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:host.name=app48
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_13
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Initiating client connection, host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Attempting connection to server app122.qwapi.com/10.10.0.122:2181
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.0.48:36147 remote=app122.qwapi.com/10.10.0.122:2181]
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 18:07:12.735] processing file : [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV] ...



    2009-07-06 18:23:24
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedf2a8c8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 3129K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 22% used [0xedc40000,0xedf4a4b0,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1795K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fc0d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:24:59
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c in Object.wait() [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.waitForWork(HBaseClient.java:401)
    - locked <0xb4e00090> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:444)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee5ecb50> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10004K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 71% used [0xedc40000,0xee601028,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1907K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fdcd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:30:39
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee61dfe8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10281K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 73% used [0xedc40000,0xee6464f0,0xee9f0000)
    from space 64K, 25% used [0xee9f0000,0xee9f4000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2315K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb5042d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:31:13
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedd8dec0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 1448K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 10% used [0xedc40000,0xedda2018,0xee9f0000)
    from space 64K, 50% used [0xee9f0000,0xee9f8000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2359K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb504dd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>
    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 3:56:57 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Writing to HDFS directly took just 21 seconds, so I suspect there is something I am doing incorrectly in my HBase setup or my code.

    Thanks for the help.

    [2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat: Total input paths to process : 10
    09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
    09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0%
    09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0%
    09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0%
    09/07/06 15:52:47 INFO mapred.JobClient: Job complete: job_200907052205_0235
    09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
    09/07/06 15:52:47 INFO mapred.JobClient: Job Counters
    09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4
    09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10
    09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6
    09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988
    09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework
    09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786
    09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0
    09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
    region in each table. Your cells are small so that's probably about right.

    So, an hbase client is contacting 4 different servers to do each update.
    And running with one table made no difference to overall time?

    St.Ack
    On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed wrote:

    Input is 1 file.

    These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
    it looks like it is always doing 1 region per table and these tables are
    always on different regionservers. I have never seen the same table on different
    regionservers. Does that sound right?

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help
    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed wrote:

    I am working on writing to HDFS files. Will update you by end of day today.
    There are always 10 concurrent mappers running. I keep setting
    setNumMaps(5) and also setting the relevant properties in mapred-site.xml to 3, but I
    still end up running 10 concurrent maps.

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

    So, that looks like 4 regions from table txn?

    So that's about 1 region per regionserver?

    I have setAutoFlush(false) and also writeToWal(false) with the same
    behaviour.
    If you did the above and it still takes 10 minutes, then that would seem to rule
    out HBase (batching should have a big impact on uploads, and setting
    writeToWAL to false should roughly double throughput over whatever you were
    seeing previously).

    St.Ack
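    The batching Stack describes can be pictured with a small, stdlib-only sketch. `BufferedPuts` and `Sink` below are hypothetical stand-ins for illustration, not HBase classes: with auto-flush off, each put lands in a client-side buffer, and only a full buffer triggers a round trip to the server, so the per-record RPC disappears.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of client-side write buffering (what
// setAutoFlush(false) buys you): puts accumulate locally and are
// shipped to the server one batch at a time.
public class BufferedPuts {
    public interface Sink { void sendBatch(List<String> batch); }

    private final List<String> buffer = new ArrayList<String>();
    private final int flushSize;
    private final Sink sink;
    private int rpcs = 0;

    public BufferedPuts(int flushSize, Sink sink) {
        this.flushSize = flushSize;
        this.sink = sink;
    }

    // Buffer a row; only a full buffer costs a round trip.
    public void put(String row) {
        buffer.add(row);
        if (buffer.size() >= flushSize) flush();
    }

    // Ship everything buffered so far in one batch.
    public void flush() {
        if (buffer.isEmpty()) return;
        sink.sendBatch(new ArrayList<String>(buffer));
        buffer.clear();
        rpcs++;
    }

    public int rpcCount() { return rpcs; }
}
```

    With a buffer of 10, loading 25 rows costs 3 round trips instead of 25, which is why batching dominates upload performance.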
  • Stack at Jul 7, 2009 at 12:48 am
    Is this a single-threaded uploader, Irfan? 4:21 minutes is still not fast
    enough, right?
    St.Ack
    On Mon, Jul 6, 2009 at 5:19 PM, Irfan Mohammed wrote:

    With a single family, ICV finished in 4:21 minutes. So it seems to be a
    limitation tied to how many column families are in the mix. Need to re-think
    the schema ... :-(

    [qwapi@app48 logs]$ ~/scripts/loadDirect.sh
    [2009-07-06 20:12:23.542] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...
    [2009-07-06 20:12:32.895] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [10,000] records
    [2009-07-06 20:12:42.198] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [20,000] records
    [2009-07-06 20:12:50.956] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [30,000] records
    [2009-07-06 20:12:59.087] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [40,000] records
    [2009-07-06 20:13:08.258] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [50,000] records
    [2009-07-06 20:13:16.773] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [60,000] records
    [2009-07-06 20:13:25.128] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [70,000] records
    [2009-07-06 20:13:34.309] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [80,000] records
    [2009-07-06 20:13:42.845] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [90,000] records
    [2009-07-06 20:13:51.363] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [100,000] records
    [2009-07-06 20:14:00.627] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [110,000] records
    [2009-07-06 20:14:08.964] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [120,000] records
    [2009-07-06 20:14:17.896] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [130,000] records
    [2009-07-06 20:14:27.680] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [140,000] records
    [2009-07-06 20:14:36.821] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [150,000] records
    [2009-07-06 20:14:45.966] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [160,000] records
    [2009-07-06 20:14:54.911] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [170,000] records
    [2009-07-06 20:15:03.736] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [180,000] records
    [2009-07-06 20:15:12.037] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [190,000] records
    [2009-07-06 20:15:20.494] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [200,000] records
    [2009-07-06 20:15:29.216] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [210,000] records
    [2009-07-06 20:15:37.809] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [220,000] records
    [2009-07-06 20:15:46.811] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [230,000] records
    [2009-07-06 20:15:55.512] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [240,000] records
    [2009-07-06 20:16:03.961] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [250,000] records
    [2009-07-06 20:16:12.933] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [260,000] records
    [2009-07-06 20:16:21.934] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [270,000] records
    [2009-07-06 20:16:30.435] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [280,000] records
    [2009-07-06 20:16:39.882] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [290,000] records
    [2009-07-06 20:16:44.573] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    completed. # of records processed : [294,786]. Start Time : [2009-07-06
    20:12:23], End Time : [2009-07-06 20:16:44]
    [qwapi@app48 logs]$


    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 7:51:00 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Added more instrumentation. It is taking about 2 minutes per 10k records,
    so 300k records will take about 60 minutes. :-(

    [qwapi@app48 logs]$ ~/scripts/loadDirect.sh
    [2009-07-06 19:29:20.465] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...
    [2009-07-06 19:29:21.820] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [100] records
    [2009-07-06 19:29:23.372] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [200] records
    [2009-07-06 19:29:24.567] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [300] records
    [2009-07-06 19:29:25.157] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [400] records
    [2009-07-06 19:29:26.178] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [500] records
    [2009-07-06 19:29:27.096] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [600] records
    [2009-07-06 19:29:28.249] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [700] records
    [2009-07-06 19:29:28.258] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [800] records
    [2009-07-06 19:29:28.267] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [900] records
    [2009-07-06 19:29:28.276] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,000] records
    [2009-07-06 19:29:29.406] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,100] records
    [2009-07-06 19:29:30.094] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,200] records
    [2009-07-06 19:29:30.903] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,300] records
    [2009-07-06 19:29:32.158] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,400] records
    [2009-07-06 19:29:33.483] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,500] records
    [2009-07-06 19:29:34.187] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,600] records
    [2009-07-06 19:29:35.515] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,700] records
    [2009-07-06 19:29:36.610] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,800] records
    [2009-07-06 19:29:37.758] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,900] records
    [2009-07-06 19:29:39.173] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,000] records
    [2009-07-06 19:29:40.443] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,100] records
    [2009-07-06 19:29:41.848] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,200] records
    [2009-07-06 19:29:42.256] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,300] records
    [2009-07-06 19:29:43.520] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,400] records
    [2009-07-06 19:29:44.906] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,500] records
    [2009-07-06 19:29:46.191] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,600] records
    [2009-07-06 19:29:47.502] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,700] records
    [2009-07-06 19:29:48.810] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,800] records
    [2009-07-06 19:29:50.275] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,900] records
    [2009-07-06 19:29:51.579] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,000] records
    [2009-07-06 19:29:52.879] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,100] records
    [2009-07-06 19:29:54.207] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,200] records
    [2009-07-06 19:29:55.619] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,300] records
    [2009-07-06 19:29:56.901] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,400] records
    [2009-07-06 19:29:58.183] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,500] records
    [2009-07-06 19:29:59.555] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,600] records
    [2009-07-06 19:30:00.838] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,700] records
    [2009-07-06 19:30:02.232] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,800] records

    [2009-07-06 19:31:18.371] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [9,900] records
    [2009-07-06 19:31:19.672] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [10,000] records
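    A quick sanity check of the numbers above, assuming the rate stays at the observed ~2 minutes per 10,000 synchronous incrementColumnValue calls (the method and numbers below are just back-of-envelope arithmetic, not measurements):

```java
public class IcvEstimate {
    // Minutes to load `records` rows at `minutesPer10k` minutes per 10,000 rows.
    static double minutesFor(long records, double minutesPer10k) {
        return records / 10_000.0 * minutesPer10k;
    }

    public static void main(String[] args) {
        // The 294,786-record file at ~2 min per 10k records comes out near
        // an hour, consistent with the 60-minute estimate in the email.
        System.out.printf("%.0f minutes%n", minutesFor(294_786, 2.0));
    }
}
```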

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 6:42:10 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Converted the code to use the HBase client API directly, without the M/R
    framework, and the results are interesting ...

    1. Initially I did not use "HTable.incrementColumnValue" and just used
    "HTable.put", and the process ran in ~5 minutes.
    2. After switching to "HTable.incrementColumnValue" it is still running,
    about ~30 minutes into the run. I issued a couple of "kill -QUIT" signals to
    see if the process is moving ahead, and it looks like it is, since the lock
    object is changing each time.
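    One way out of the per-record RPC cost is to pre-aggregate increments on the client, the way a combiner would, so only one incrementColumnValue call per distinct cell is needed rather than one per input record. A minimal, stdlib-only sketch (`IncrementBuffer` and its methods are hypothetical, not HBase API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: collapse per-record increments into one
// delta per (table, row, column) cell, so a later flush issues
// O(distinct cells) RPCs instead of O(input records).
public class IncrementBuffer {
    private final Map<String, Long> pending = new HashMap<String, Long>();

    // Accumulate an increment locally instead of issuing an RPC per record.
    public void increment(String table, String row, String column, long delta) {
        String key = table + "/" + row + "/" + column;
        Long cur = pending.get(key);
        pending.put(key, (cur == null ? 0L : cur) + delta);
    }

    // Number of RPCs a flush would need: one per distinct cell.
    public int pendingCells() {
        return pending.size();
    }

    // Current accumulated delta for one cell.
    public long pendingValue(String table, String row, String column) {
        Long v = pending.get(table + "/" + row + "/" + column);
        return v == null ? 0L : v;
    }
}
```

    Many CSV rows hit the same (date, metric) cell, so the distinct-cell count is far below 294,786, which is where the savings come from.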
    HTable.Put >>>>>>>>>>>>>>>>>>>>>

    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 17:58:43 INFO zookeeper.ZooKeeperWrapper: Quorum servers:
    app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:host.name
    =app48
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.version=1.6.0_13
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.vendor=Sun Microsystems Inc.
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.io.tmpdir=/tmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.compiler=<NA>
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.name
    =Linux
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.name
    =qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:user.home=/home/qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Initiating client connection,
    host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181sessionTimeout=10000
    watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn:
    zookeeper.disableAutoWatchReset is false
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Attempting connection to
    server app122.qwapi.com/10.10.0.122:2181
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Priming connection to
    java.nio.channels.SocketChannel[connected local=/10.10.0.48:35809 remote=
    app122.qwapi.com/10.10.0.122:2181]
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 17:58:43.425] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...
    [2009-07-06 18:03:46.104] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    completed. # of records processed : [294,786]
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>

    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 18:07:12 INFO zookeeper.ZooKeeperWrapper: Quorum servers:
    app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:host.name=app48
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.version=1.6.0_13
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.vendor=Sun Microsystems Inc.
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.io.tmpdir=/tmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.compiler=<NA>
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.name=qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:user.home=/home/qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Initiating client connection,
    host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181 sessionTimeout=10000
    watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn:
    zookeeper.disableAutoWatchReset is false
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Attempting connection to
    server app122.qwapi.com/10.10.0.122:2181
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Priming connection to
    java.nio.channels.SocketChannel[connected local=/10.10.0.48:36147 remote=app122.qwapi.com/10.10.0.122:2181]
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 18:07:12.735] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...



    2009-07-06 18:23:24
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user"
    daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on
    condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable
    [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable
    [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on
    condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on
    condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on
    condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait()
    [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in
    Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait()
    [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedf2a8c8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on
    condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 3129K [0xedc40000, 0xeea10000,
    0xf4e00000)
    eden space 14016K, 22% used [0xedc40000,0xedf4a4b0,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1795K [0xb4e00000, 0xbbcd0000,
    0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fc0d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000,
    0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:24:59
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user"
    daemon prio=10 tid=0xafa1d000 nid=0xd5c in Object.wait()
    [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.waitForWork(HBaseClient.java:401)
    - locked <0xb4e00090> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:444)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on
    condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable
    [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable
    [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on
    condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on
    condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on
    condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait()
    [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in
    Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait()
    [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee5ecb50> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on
    condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10004K [0xedc40000, 0xeea10000,
    0xf4e00000)
    eden space 14016K, 71% used [0xedc40000,0xee601028,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1907K [0xb4e00000, 0xbbcd0000,
    0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fdcd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000,
    0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:30:39
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user"
    daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on
    condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable
    [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable
    [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on
    condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on
    condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on
    condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait()
    [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in
    Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait()
    [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee61dfe8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on
    condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10281K [0xedc40000, 0xeea10000,
    0xf4e00000)
    eden space 14016K, 73% used [0xedc40000,0xee6464f0,0xee9f0000)
    from space 64K, 25% used [0xee9f0000,0xee9f4000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2315K [0xb4e00000, 0xbbcd0000,
    0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb5042d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000,
    0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:31:13
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user"
    daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on
    condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable
    [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable
    [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on
    condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on
    condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on
    condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait()
    [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in
    Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait()
    [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedd8dec0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on
    condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 1448K [0xedc40000, 0xeea10000,
    0xf4e00000)
    eden space 14016K, 10% used [0xedc40000,0xedda2018,0xee9f0000)
    from space 64K, 50% used [0xee9f0000,0xee9f8000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2359K [0xb4e00000, 0xbbcd0000,
    0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb504dd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000,
    0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>
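In every dump above, the "main" thread is blocked inside a synchronous HTable.incrementColumnValue call, i.e. one RPC round trip per record. One way to cut that cost is to aggregate the per-(row, column) deltas client-side and flush them in batches. The sketch below shows only the local aggregation; CounterBuffer, increment, and flush are hypothetical names, not part of the HBase API, and the flush step stands in for the real batched RPCs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: rather than one incrementColumnValue RPC per CSV row,
// accumulate the delta for each (row, column) counter locally, then issue
// one RPC per *distinct* counter when the buffer is flushed.
public class CounterBuffer {
    private final Map<String, Long> pending = new HashMap<String, Long>();

    // Accumulate a delta locally; no network call happens here.
    public void increment(String rowColumnKey, long delta) {
        Long current = pending.get(rowColumnKey);
        pending.put(rowColumnKey, current == null ? delta : current + delta);
    }

    // Stand-in for the real flush: in actual code each entry here would
    // become a single incrementColumnValue (or batched Put) call.
    public Map<String, Long> flush() {
        Map<String, Long> batch = new HashMap<String, Long>(pending);
        pending.clear();
        return batch;
    }
}
```

If many of the 1,160,144 map output records hit the same counters, this trades per-record round trips for per-distinct-counter ones.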

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 3:56:57 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Writing to HDFS directly took just 21 seconds, so I suspect I am doing
    something incorrect in my HBase setup or my code.

    Thanks for the help.

    [2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat:
    Total input paths to process : 10
    09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
    09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0%
    09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0%
    09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0%
    09/07/06 15:52:47 INFO mapred.JobClient: Job complete:
    job_200907052205_0235
    09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
    09/07/06 15:52:47 INFO mapred.JobClient: Job Counters
    09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4
    09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10
    09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6
    09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988
    09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework
    09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786
    09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0
    09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
region in each table. Your cells are small, so that's probably about right.

    So, an hbase client is contacting 4 different servers to do each update.
    And running with one table made no difference to overall time?

    St.Ack
    On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed wrote:

    Input is 1 file.

These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
it looks like it is always doing 1 region per table, and these tables are
always on different regionservers. I have never seen the same table on
different regionservers. Does that sound right?

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed <irfan.ma@gmail.com>
    wrote:
    I am working on writing to HDFS files. Will update you by end of day today.
There are always 10 concurrent mappers running. I keep setting
setNumMaps(5) and also setting the following properties in mapred-site.xml to 3,
but I still end up running 10 concurrent maps.

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

    So, that looks like 4 regions from table txn?

    So thats about 1 region per regionserver?

    I have setAutoFlush(false) and also writeToWal(false) with the same
    behaviour.
If you did the above and it still takes 10 minutes, then that would seem to rule
out hbase (batching should have a big impact on uploads, and then setting
writeToWAL to false should double throughput over whatever you were seeing
previously).

    St.Ack
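St.Ack's point about batching suggests one concrete client-side change: each HTable.incrementColumnValue() call in this client is a separate synchronous RPC, so pre-aggregating counter deltas in memory and issuing only one increment per unique cell cuts the RPC count. A minimal, HBase-free sketch of that aggregation step (the IncrementBuffer class and its "table/row/family:qualifier" string keys are hypothetical helpers, not part of the attached code):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Pre-aggregates counter deltas so that a later flush needs only one
 * incrementColumnValue() RPC per unique cell, instead of one per input row.
 */
class IncrementBuffer {
    private final Map<String, Long> deltas = new HashMap<String, Long>();

    /** Accumulate a delta for a cell, keyed e.g. as "txn_m1/row1/data:count". */
    public void add(String cellKey, long delta) {
        Long current = deltas.get(cellKey);
        deltas.put(cellKey, current == null ? delta : current + delta);
    }

    /** Number of RPCs a flush would issue (one per distinct cell). */
    public int pendingRpcs() {
        return deltas.size();
    }

    /** Copy of the merged deltas, ready to hand to the HBase client. */
    public Map<String, Long> snapshot() {
        return new HashMap<String, Long>(deltas);
    }
}
```

With 1M input rows but far fewer distinct cells, flushing this buffer should issue dramatically fewer round trips than calling the increment once per row.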
  • Irfan Mohammed at Jul 7, 2009 at 12:51 pm
It is single threaded. I can change it to multi-threaded and see how it does.

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 8:47:39 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Is this a single-threaded uploader, Irfan? 4:21 minutes is still not fast
enough, right?
    St.Ack
    On Mon, Jul 6, 2009 at 5:19 PM, Irfan Mohammed wrote:

With a single family, ICV finished in 4:21 minutes. So it is limited by
how many families are in the mix. Need to re-think the schema ...
:-(

    [qwapi@app48 logs]$ ~/scripts/loadDirect.sh
    [2009-07-06 20:12:23.542] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...
    [2009-07-06 20:12:32.895] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [10,000] records
    [2009-07-06 20:12:42.198] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [20,000] records
    [2009-07-06 20:12:50.956] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [30,000] records
    [2009-07-06 20:12:59.087] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [40,000] records
    [2009-07-06 20:13:08.258] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [50,000] records
    [2009-07-06 20:13:16.773] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [60,000] records
    [2009-07-06 20:13:25.128] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [70,000] records
    [2009-07-06 20:13:34.309] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [80,000] records
    [2009-07-06 20:13:42.845] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [90,000] records
    [2009-07-06 20:13:51.363] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [100,000] records
    [2009-07-06 20:14:00.627] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [110,000] records
    [2009-07-06 20:14:08.964] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [120,000] records
    [2009-07-06 20:14:17.896] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [130,000] records
    [2009-07-06 20:14:27.680] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [140,000] records
    [2009-07-06 20:14:36.821] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [150,000] records
    [2009-07-06 20:14:45.966] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [160,000] records
    [2009-07-06 20:14:54.911] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [170,000] records
    [2009-07-06 20:15:03.736] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [180,000] records
    [2009-07-06 20:15:12.037] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [190,000] records
    [2009-07-06 20:15:20.494] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [200,000] records
    [2009-07-06 20:15:29.216] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [210,000] records
    [2009-07-06 20:15:37.809] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [220,000] records
    [2009-07-06 20:15:46.811] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [230,000] records
    [2009-07-06 20:15:55.512] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [240,000] records
    [2009-07-06 20:16:03.961] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [250,000] records
    [2009-07-06 20:16:12.933] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [260,000] records
    [2009-07-06 20:16:21.934] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [270,000] records
    [2009-07-06 20:16:30.435] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [280,000] records
    [2009-07-06 20:16:39.882] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [290,000] records
    [2009-07-06 20:16:44.573] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    completed. # of records processed : [294,786]. Start Time : [2009-07-06
    20:12:23], End Time : [2009-07-06 20:16:44]
    [qwapi@app48 logs]$


    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 7:51:00 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Added more instrumentation. It is taking about 2 minutes per 10k records,
so for 300k records it will take 60 minutes. :-(

    [qwapi@app48 logs]$ ~/scripts/loadDirect.sh
    [2009-07-06 19:29:20.465] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...
    [2009-07-06 19:29:21.820] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [100] records
    [2009-07-06 19:29:23.372] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [200] records
    [2009-07-06 19:29:24.567] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [300] records
    [2009-07-06 19:29:25.157] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [400] records
    [2009-07-06 19:29:26.178] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [500] records
    [2009-07-06 19:29:27.096] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [600] records
    [2009-07-06 19:29:28.249] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [700] records
    [2009-07-06 19:29:28.258] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [800] records
    [2009-07-06 19:29:28.267] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [900] records
    [2009-07-06 19:29:28.276] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,000] records
    [2009-07-06 19:29:29.406] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,100] records
    [2009-07-06 19:29:30.094] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,200] records
    [2009-07-06 19:29:30.903] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,300] records
    [2009-07-06 19:29:32.158] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,400] records
    [2009-07-06 19:29:33.483] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,500] records
    [2009-07-06 19:29:34.187] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,600] records
    [2009-07-06 19:29:35.515] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,700] records
    [2009-07-06 19:29:36.610] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,800] records
    [2009-07-06 19:29:37.758] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [1,900] records
    [2009-07-06 19:29:39.173] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,000] records
    [2009-07-06 19:29:40.443] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,100] records
    [2009-07-06 19:29:41.848] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,200] records
    [2009-07-06 19:29:42.256] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,300] records
    [2009-07-06 19:29:43.520] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,400] records
    [2009-07-06 19:29:44.906] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,500] records
    [2009-07-06 19:29:46.191] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,600] records
    [2009-07-06 19:29:47.502] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,700] records
    [2009-07-06 19:29:48.810] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,800] records
    [2009-07-06 19:29:50.275] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [2,900] records
    [2009-07-06 19:29:51.579] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,000] records
    [2009-07-06 19:29:52.879] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,100] records
    [2009-07-06 19:29:54.207] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,200] records
    [2009-07-06 19:29:55.619] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,300] records
    [2009-07-06 19:29:56.901] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,400] records
    [2009-07-06 19:29:58.183] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,500] records
    [2009-07-06 19:29:59.555] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,600] records
    [2009-07-06 19:30:00.838] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,700] records
    [2009-07-06 19:30:02.232] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [3,800] records

    [2009-07-06 19:31:18.371] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [9,900] records
    [2009-07-06 19:31:19.672] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV].
    [10,000] records

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 6:42:10 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

Converted the code to use the HBase client API directly, without the M/R
framework, and the results are interesting ...

1. Initially I did not use "HTable.incrementColumnValue", just used
"HTable.put", and the process ran in ~5 minutes.
2. After switching to "HTable.incrementColumnValue" it is still running, and
is about ~30 minutes into the run. I issued a couple of "kill -QUIT"s to see if
the process is moving ahead, and it looks like it is, since the lock object is
changing each time.
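The gap between the two runs described above is consistent with per-call RPC overhead: each synchronous incrementColumnValue round trip pays network latency, and nothing batches those calls the way setAutoFlush(false) batches Puts. A rough back-of-the-envelope check (the 1,160,144 increment count is the Map output records counter from the job run earlier in the thread; the 1.5 ms average round trip is an assumed illustrative figure, not a measurement):

```java
// Rough estimate of time spent waiting on synchronous increment RPCs.
// 1,160,144 is the Map output records counter from the earlier job;
// 1.5 ms per round trip is an assumed figure, not a measurement.
public class RpcCostEstimate {
    public static void main(String[] args) {
        long increments = 1160144L;
        double rpcMillis = 1.5;
        double minutes = increments * rpcMillis / 1000.0 / 60.0;
        // With these figures this lands at ~29 minutes, close to the observed run.
        System.out.println("~" + Math.round(minutes) + " minutes of RPC wait");
    }
}
```

If the real per-call latency is anywhere near that figure, the only ways down are fewer calls (pre-aggregating deltas), buffered Puts, or several client threads issuing increments in parallel.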
    HTable.Put >>>>>>>>>>>>>>>>>>>>>

    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 17:58:43 INFO zookeeper.ZooKeeperWrapper: Quorum servers:
    app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:host.name
    =app48
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.version=1.6.0_13
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.vendor=Sun Microsystems Inc.
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.io.tmpdir=/tmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:java.compiler=<NA>
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.name
    =Linux
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client environment:user.name
    =qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:user.home=/home/qwapi
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Client
    environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 17:58:43 INFO zookeeper.ZooKeeper: Initiating client connection,
    host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181sessionTimeout=10000
    watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn:
    zookeeper.disableAutoWatchReset is false
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Attempting connection to
    server app122.qwapi.com/10.10.0.122:2181
09/07/06 17:58:43
    INFO zookeeper.ClientCnxn: Priming connection to
    java.nio.channels.SocketChannel[connected local=/10.10.0.48:35809 remote=
    app122.qwapi.com/10.10.0.122:2181]
    09/07/06 17:58:43 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 17:58:43.425] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...
    [2009-07-06 18:03:46.104] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    completed. # of records processed : [294,786]
    HTable.Put >>>>>>>>>>>>>>>>>>>>>
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>

    [qwapi@app48 transaction_ar20090706_1459.CSV]$ ~/scripts/loadDirect.sh
    09/07/06 18:07:12 INFO zookeeper.ZooKeeperWrapper: Quorum servers:
    app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:host.name
    =app48
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.version=1.6.0_13
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.vendor=Sun Microsystems Inc.
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.home=/usr/java/jdk1.6.0_13/jre
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.class.path=/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/apps/hbase-latest/lib/xmlenc-0.52.jar:/home/qwapi/apps/hbase-latest/lib/servlet-api-2.5-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/lucene-core-2.2.0.jar:/home/qwapi/apps/hbase-latest/lib/log4j-1.2.15.jar:/home/qwapi/apps/hbase-latest/lib/libthrift-r771587.jar:/home/qwapi/apps/hbase-latest/lib/junit-3.8.1.jar:/home/qwapi/apps/hbase-latest/lib/json.jar:/home/qwapi/apps/hbase-latest/lib/jruby-complete-1.2.0.jar:/home/qwapi/apps/hbase-latest/lib/jetty-util-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jetty-6.1.14.jar:/home/qwapi/apps/hbase-latest/lib/jasper-runtime-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/jasper-compiler-5.5.12.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-test.jar:/home/qwapi/apps/hbase-latest/lib/hadoop-0.20.0-plus4681-core.jar:/home/qwapi/apps/hbase-latest/lib/commons-math-1.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-api-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-logging-1.0.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-httpclient-3.0.1.jar:/home/qwapi/apps/hbase-latest/lib/commons-el-from-jetty-5.1.4.jar:/home/qwapi/apps/hbase-latest/lib/commons-cli-2.0-SNAPSHOT.jar:/home/qwapi/apps/hbase-latest/lib/AgileJSON-2009-03-30.jar:/home/qwapi/apps/hbase-latest/conf:/home/qwapi/apps/hadoop-latest/hadoop-0.20.0-core.jar:/home/qwapi/apps/hbase-latest/hbase-0.20.0-dev.jar:/home/qwapi/apps/hbase-latest/lib/zookeeper-r785019-hbase-1329.jar:/home/qwapi/txnload/bin/load_direct.jar
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/i386/server:/usr/java/jdk1.6.0_13/jre/lib/i386:/usr/java/jdk1.6.0_13/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.io.tmpdir=/tmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:java.compiler=<NA>
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.name
    =Linux
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:os.version=2.6.9-67.0.20.ELsmp
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client environment:user.name
    =qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:user.home=/home/qwapi
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Client
    environment:user.dir=/home/qwapi/tmp/transaction_ar20090706_1459.CSV
    09/07/06 18:07:12 INFO zookeeper.ZooKeeper: Initiating client connection,
    host=app16.qwapi.com:2181,app48.qwapi.com:2181,app122.qwapi.com:2181sessionTimeout=10000
    watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@fbb7cb
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn:
    zookeeper.disableAutoWatchReset is false
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Attempting connection to
    server app122.qwapi.com/10.10.0.122:2181
09/07/06 18:07:12
    INFO zookeeper.ClientCnxn: Priming connection to
    java.nio.channels.SocketChannel[connected local=/10.10.0.48:36147 remote=
    app122.qwapi.com/10.10.0.122:2181]
    09/07/06 18:07:12 INFO zookeeper.ClientCnxn: Server connection successful
    [2009-07-06 18:07:12.735] processing file :
    [/home/qwapi/tmp/transaction_ar20090706_1459.CSV/transaction_ar20090706_1459.CSV]
    ...



    2009-07-06 18:23:24
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user"
    daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at
    org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at
    org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at
    org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at
    org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at
    org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at
    org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at
    org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on
    condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at
    java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at
    org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable
    [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable
    [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on
    condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on
    condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on
    condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait()
    [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in
    Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait()
    [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at
    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedf2a8c8> (a
    org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at
    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at
    org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on
    condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 3129K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 22% used [0xedc40000,0xedf4a4b0,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1795K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fc0d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:24:59
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c in Object.wait() [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.waitForWork(HBaseClient.java:401)
    - locked <0xb4e00090> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:444)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee5ecb50> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10004K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 71% used [0xedc40000,0xee601028,0xee9f0000)
    from space 64K, 25% used [0xeea00000,0xeea04000,0xeea10000)
    to space 64K, 0% used [0xee9f0000,0xee9f0000,0xeea00000)
    PSOldGen total 113472K, used 1907K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 1% used [0xb4e00000,0xb4fdcd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:30:39
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e9b810> (a sun.nio.ch.Util$1)
    - locked <0xb4e9b800> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e9b5f8> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:277)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0xb4e350c8> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:501)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xee61dfe8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 10281K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 73% used [0xedc40000,0xee6464f0,0xee9f0000)
    from space 64K, 25% used [0xee9f0000,0xee9f4000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2315K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb5042d00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)

    2009-07-06 18:31:13
    Full thread dump Java HotSpot(TM) Server VM (11.3-b02 mixed mode):

    "IPC Client (47) connection to /10.10.0.163:60020 from an unknown user" daemon prio=10 tid=0xafa1d000 nid=0xd5c runnable [0xaf8ac000..0xaf8ad0b0]
    java.lang.Thread.State: RUNNABLE
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:513)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:445)

    "main-EventThread" daemon prio=10 tid=0x085aec00 nid=0xd59 waiting on condition [0xaf9ad000..0xaf9ade30]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xb4e00230> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:376)

    "main-SendThread" daemon prio=10 tid=0x08533800 nid=0xd58 runnable [0xaf9fe000..0xaf9feeb0]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0xb4e01130> (a sun.nio.ch.Util$1)
    - locked <0xb4e01120> (a java.util.Collections$UnmodifiableSet)
    - locked <0xb4e010e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:873)

    "Low Memory Detector" daemon prio=10 tid=0x08163800 nid=0xd56 runnable [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x08161800 nid=0xd55 waiting on condition [0x00000000..0xafe444e8]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x0815d400 nid=0xd54 waiting on condition [0x00000000..0xafec5568]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x0815b800 nid=0xd53 waiting on condition [0x00000000..0x00000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x08148400 nid=0xd52 in Object.wait() [0xb0167000..0xb0167fb0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0xb4e030f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x08146c00 nid=0xd51 in Object.wait() [0xb01b8000..0xb01b8e30]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xb4e011a8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0xb4e011a8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x08059c00 nid=0xd47 in Object.wait() [0xf7fc0000..0xf7fc1278]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
    - locked <0xedd8dec0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    at $Proxy0.incrementColumnValue(Unknown Source)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:504)
    at org.apache.hadoop.hbase.client.HTable$6.call(HTable.java:500)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:922)
    at org.apache.hadoop.hbase.client.HTable.incrementColumnValue(HTable.java:499)
    at com.qwapi.txnload.LoadDirect.loadRow(LoadDirect.java:157)
    at com.qwapi.txnload.LoadDirect.loadFile(LoadDirect.java:95)
    at com.qwapi.txnload.LoadDirect.main(LoadDirect.java:182)

    "VM Thread" prio=10 tid=0x08143400 nid=0xd50 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x08060c00 nid=0xd48 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x08062000 nid=0xd49 runnable

    "GC task thread#2 (ParallelGC)" prio=10 tid=0x08063800 nid=0xd4a runnable

    "GC task thread#3 (ParallelGC)" prio=10 tid=0x08065000 nid=0xd4b runnable

    "GC task thread#4 (ParallelGC)" prio=10 tid=0x08066400 nid=0xd4c runnable

    "GC task thread#5 (ParallelGC)" prio=10 tid=0x08067c00 nid=0xd4d runnable

    "GC task thread#6 (ParallelGC)" prio=10 tid=0x08069000 nid=0xd4e runnable

    "GC task thread#7 (ParallelGC)" prio=10 tid=0x0806a800 nid=0xd4f runnable

    "VM Periodic Task Thread" prio=10 tid=0x08165400 nid=0xd57 waiting on condition

    JNI global references: 895

    Heap
    PSYoungGen total 14080K, used 1448K [0xedc40000, 0xeea10000, 0xf4e00000)
    eden space 14016K, 10% used [0xedc40000,0xedda2018,0xee9f0000)
    from space 64K, 50% used [0xee9f0000,0xee9f8000,0xeea00000)
    to space 64K, 0% used [0xeea00000,0xeea00000,0xeea10000)
    PSOldGen total 113472K, used 2359K [0xb4e00000, 0xbbcd0000, 0xedc40000)
    object space 113472K, 2% used [0xb4e00000,0xb504dd00,0xbbcd0000)
    PSPermGen total 16384K, used 6188K [0xb0e00000, 0xb1e00000, 0xb4e00000)
    object space 16384K, 37% used [0xb0e00000,0xb140b230,0xb1e00000)
    HTable.incrementColumnValue >>>>>>>>>>>>>>>>>>>>>

    ----- Original Message -----
    From: "Irfan Mohammed" <irfan.ma@gmail.com>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 3:56:57 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Writing to HDFS directly took just 21 seconds, so I suspect there is
    something I am doing incorrectly in my HBase setup or my code.
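    The thread dumps above show the main thread repeatedly blocked inside a
    synchronous HBaseClient.call, one round-trip per incrementColumnValue. One
    way to cut that RPC count is to aggregate the increments client-side and
    issue a single call per distinct counter. A minimal sketch of the idea (the
    string key format and the flush wiring are illustrative assumptions, not
    code from this thread):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Sketch (assumption, not from the attached code): instead of one
    // incrementColumnValue() RPC per input record, accumulate counts locally
    // and issue one increment per distinct (table/row:column) key. With 1M
    // records collapsing onto far fewer counters, this cuts the number of
    // synchronous round-trips dramatically.
    public class IncrementBuffer {
        private final Map<String, Long> pending = new HashMap<>();

        // Accumulate locally; no RPC happens here.
        public void increment(String rowAndColumn, long amount) {
            pending.merge(rowAndColumn, amount, Long::sum);
        }

        public int distinctKeys() {
            return pending.size();
        }

        public long pendingFor(String rowAndColumn) {
            return pending.getOrDefault(rowAndColumn, 0L);
        }

        // In real code this would call HTable.incrementColumnValue() once per
        // remaining entry; here it only reports how many RPCs would be issued.
        public int flush() {
            int rpcs = pending.size();
            pending.clear();
            return rpcs;
        }
    }
    ```

    The flush() here only counts the calls that would go out; the actual loop
    over HTable belongs wherever the reducer currently issues its increments.
    
    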

    Thanks for the help.

    [2009-07-06 15:52:47,917] 09/07/06 15:52:22 INFO mapred.FileInputFormat:
    Total input paths to process : 10
    09/07/06 15:52:22 INFO mapred.JobClient: Running job: job_200907052205_0235
    09/07/06 15:52:23 INFO mapred.JobClient: map 0% reduce 0%
    09/07/06 15:52:37 INFO mapred.JobClient: map 7% reduce 0%
    09/07/06 15:52:43 INFO mapred.JobClient: map 100% reduce 0%
    09/07/06 15:52:47 INFO mapred.JobClient: Job complete:
    job_200907052205_0235
    09/07/06 15:52:47 INFO mapred.JobClient: Counters: 9
    09/07/06 15:52:47 INFO mapred.JobClient: Job Counters
    09/07/06 15:52:47 INFO mapred.JobClient: Rack-local map tasks=4
    09/07/06 15:52:47 INFO mapred.JobClient: Launched map tasks=10
    09/07/06 15:52:47 INFO mapred.JobClient: Data-local map tasks=6
    09/07/06 15:52:47 INFO mapred.JobClient: FileSystemCounters
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_READ=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=587539988
    09/07/06 15:52:47 INFO mapred.JobClient: Map-Reduce Framework
    09/07/06 15:52:47 INFO mapred.JobClient: Map input records=294786
    09/07/06 15:52:47 INFO mapred.JobClient: Spilled Records=0
    09/07/06 15:52:47 INFO mapred.JobClient: Map input bytes=57966580
    09/07/06 15:52:47 INFO mapred.JobClient: Map output records=1160144

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:36:35 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    Sorry, yeah, that'd be 4 tables. So, yeah, it would seem you only have one
    region in each table. Your cells are small, so that's probably about right.

    So, an hbase client is contacting 4 different servers to do each update.
    And running with one table made no difference to overall time?

    St.Ack
    On Mon, Jul 6, 2009 at 11:24 AM, Irfan Mohammed wrote:

    Input is 1 file.

    These are 4 different tables "txn_m1", "txn_m2", "txn_m3", "txn_m4". To me,
    it looks like it is always doing 1 region per table, and these tables are
    always on different regionservers. I have never seen the same table on
    different regionservers. Does that sound right?
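    With a single region per table, every write for that table funnels through
    one regionserver. If the row-key distribution is predictable, the tables can
    be created pre-split so the write load spreads across servers. A hedged
    sketch of computing evenly spaced split boundaries, assuming (hypothetically)
    that row keys start with a zero-padded two-digit decimal prefix; passing the
    result to table creation is the part left to the real client code:

    ```java
    // Sketch under an assumed key scheme ("00".."99" prefixes): compute
    // nSplits boundary keys that divide the key space into nSplits + 1 regions.
    // The computation is plain Java; handing these keys to the HBase admin API
    // at table-creation time is the hypothetical wiring.
    public class SplitKeys {
        public static String[] decimalPrefixSplits(int nSplits) {
            String[] keys = new String[nSplits];
            for (int i = 1; i <= nSplits; i++) {
                // Evenly spaced boundaries across the 00..99 prefix space.
                keys[i - 1] = String.format("%02d", i * 100 / (nSplits + 1));
            }
            return keys;
        }
    }
    ```

    For example, three splits yield the boundaries "25", "50", "75", i.e. four
    regions of roughly equal key range.
    
    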

    ----- Original Message -----
    From: "stack" <stack@duboce.net>
    To: hbase-dev@hadoop.apache.org
    Sent: Monday, July 6, 2009 2:14:43 PM GMT -05:00 US/Canada Eastern
    Subject: Re: performance help

    On Mon, Jul 6, 2009 at 11:06 AM, Irfan Mohammed <irfan.ma@gmail.com>
    wrote:
    I am working on writing to HDFS files. Will update you by end of day today.
    There are always 10 concurrent mappers running. I keep setting setNumMaps(5)
    and also set the corresponding properties in mapred-site.xml to 3, but I
    still end up with 10 concurrent maps.

    Is your input ten files?

    There are 5 regionservers and the online regions are as follows :

    m1 : -ROOT-,,0
    m2 : txn_m1,,1245462904101
    m3 : txn_m4,,1245462942282
    m4 : txn_m2,,1245462890248
    m5 : .META.,,1
    txn_m3,,1245460727203

    So, that looks like 4 regions from table txn?

    So thats about 1 region per regionserver?

    I have setAutoFlush(false) and also writeToWal(false) with the same
    behaviour.
    If you did the above and it still takes 10 minutes, then that would seem to
    rule out hbase (batching should have a big impact on uploads, and then
    setting writeToWAL to false should double throughput over whatever you were
    seeing previously).

    St.Ack
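    The batching described above, where setAutoFlush(false) lets the client
    buffer mutations and ship them in bulk instead of one RPC per mutation, can
    be sketched as a toy buffer that counts round-trips. No real HBase I/O
    happens here; the flush target is hypothetical:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Toy model of client-side write buffering (a sketch, not the HBase
    // client): mutations accumulate locally and are shipped as one bulk
    // "round-trip" when the buffer fills, mirroring setAutoFlush(false).
    public class BatchedWriter {
        private final List<String> buffer = new ArrayList<>();
        private final int flushThreshold;
        private int flushes = 0;

        public BatchedWriter(int flushThreshold) {
            this.flushThreshold = flushThreshold;
        }

        // Analogous to a buffered put(): no I/O until the threshold is hit.
        public void put(String mutation) {
            buffer.add(mutation);
            if (buffer.size() >= flushThreshold) {
                flush();
            }
        }

        public void flush() {
            if (!buffer.isEmpty()) {
                flushes++;   // one bulk round-trip instead of buffer.size() RPCs
                buffer.clear();
            }
        }

        public int flushCount() {
            return flushes;
        }
    }
    ```

    In the real client this behaviour comes from HTable.setAutoFlush(false)
    together with sizing hbase.client.write.buffer; writeToWAL(false) is a
    separate durability trade-off layered on top.
    
    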

Discussion Overview
group: dev
categories: hbase, hadoop
posted: Jul 2, '09 at 8:22p
active: Jul 7, '09 at 12:51p
posts: 23
users: 3
website: hbase.apache.org
