Grokbase Groups: HBase user, July 2010
I am trying to assess the performance of scans on a 100 TB database on 180 nodes running HBase 0.20.5.

I run a sharded scan (each map task runs a scan over a specific key range; speculative execution is turned off so that there is no duplication of tasks) on a fully compacted table.

1 MB block size, block cache enabled, and a maximum of 2 tasks per node. Each row is 30 KB in size: one big column family with just one field.
The region lease timeout is set to an hour, and I don't get any socket timeout exceptions, so I have not changed the write socket timeout.

I ran experiments on the following cases:

1. The client-level cache is set to 1 (the default; I got the number using getCaching): the MR tasks take around 13 hours to finish on average, which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job).
2. Client cache set to 20 rows: this is much worse than the previous case; we get a super low 1 MBps per node.

Question: Should I set the client cache to a value such that the block size is a multiple of the cached data size? Or should I set the cache size to a much lower value?

I find that these numbers are much lower than the ones I get when running with just a few nodes.

Can you guys help me with this problem?

Thank you
Vidhya
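
For reference, the "client level cache" being tuned here is the per-Scan caching value, i.e. how many rows each next() RPC brings back. Below is a minimal sketch of what one map task's range scan could look like against the 0.20-era client API; the table name, key range, and processing loop are placeholders rather than details taken from this thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ShardScan {
    public static void main(String[] args) throws Exception {
        // One shard = one [start, stop) key range handled by a single map task.
        HTable table = new HTable(new HBaseConfiguration(), "bigtable"); // hypothetical table name
        Scan scan = new Scan(Bytes.toBytes(args[0]), Bytes.toBytes(args[1]));
        scan.addFamily(Bytes.toBytes("bigColumn")); // the single wide family described above
        // With ~30 KB rows, caching=c returns roughly 30*c KB per next() RPC,
        // so even a small value (say 4) amortizes most of the per-RPC overhead.
        scan.setCaching(4);
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                // process row...
            }
        } finally {
            scanner.close();
        }
    }
}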


  • Ryan Rawson at Jul 26, 2010 at 10:02 pm
    Hey,

    A few questions:

    - sharded scan, are you not using TableInputFormat?
    - 1 MB block size - what block size? You probably shouldn't set the
    HDFS block size to 1 MB; it just causes more NameNode traffic.
    - Tests a year ago indicated that HFile block size really didn't
    improve speed when you went beyond 64 KB or so.
    - Run more maps/machine... one map task per disk probably?
    - Try setting the client cache to an in-between level, 2-6 perhaps.

    Let us know about those other questions and we can go from there.
    -ryan
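
    (As a side note on the last suggestion: the client-wide default that Scan.setCaching() overrides is the hbase.client.scanner.caching property. A small hedged sketch of setting it for a whole job, with an illustrative value from the 2-6 range above; the class and method names are placeholders:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ScannerCachingConf {
        public static Configuration withScannerCaching(int rows) {
            // Default number of rows fetched per scanner next() RPC for clients
            // built from this configuration; individual Scans can still override it.
            Configuration conf = new HBaseConfiguration();
            conf.setInt("hbase.client.scanner.caching", rows); // e.g. 4
            return conf;
        }
    }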

    On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman
    wrote:
    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node..  Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1.  The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2.  Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Vidhyashankar Venkataraman at Jul 26, 2010 at 10:17 pm
    I did not use TableInputFormat: I ran my own scans on specific ranges (for more control on my side in tuning the ranges, and for the ease with which I can run a Hadoop streaming job).

    1 MB is the HFile block size, not the HDFS block size.
    I increased it since I didn't care too much about random read performance; the HDFS block size is the default value. (A related question, then: does the HFile block size influence only the size of the index and the efficiency of random reads? I don't see an effect on scans.)
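
    (A rough worked example of that trade-off, assuming the ~4 GB region size mentioned later in this thread: with 64 KB HFile blocks, a 4 GB region carries on the order of 4 GB / 64 KB ≈ 65,000 block-index entries, while with 1 MB blocks it carries only about 4,000, and each random read has to pull in a full 1 MB block instead of 64 KB. A sequential scan reads every block either way, which is consistent with seeing no effect on scans.)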

    I had previously run 5 tasks per machine with caching at 20 rows, but that resulted in scanner expiries (UnknownScannerException) and DFS socket timeouts, so that's why I reduced the number of tasks. Let me decrease the number of rows and see.

    Just to make sure: the client uses ZooKeeper only for obtaining ROOT whenever it performs scans, right? So scans shouldn't face any master/ZK bottlenecks when we scale up the number of nodes, am I right?

    Thank you
    Vidhya

    On 7/26/10 3:01 PM, "Ryan Rawson" wrote:

    Hey,

    A few questions:

    - sharded scan, are you not using TableInputFormat?
    - 1 MB block size - what block size? You probably shouldnt set the
    HDFS block size to 1MB, it just causes more nn traffic.
    - Tests a year ago indicated that HFile block size really didnt
    improve speed when you went beyond 64k or so.
    - Run more maps/machine... one map task per disk probably?
    - Try setting the client cache to an in-between level, 2-6 perhaps.

    Let us know about those other questions and we can go from there.
    -ryan

    On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman
    wrote:
    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Xavier Stevens at Jul 26, 2010 at 10:30 pm
    We have something that might interest you.

    http://socorro.googlecode.com/svn/trunk/analysis/src/java/org/apache/hadoop/hbase/mapreduce/

    We haven't fully tested everything yet, so don't blame us if something
    goes wrong. It's basically the same as TableInputFormat except that it
    takes an array of Scans to be used for row-key ranges. It requires the
    caller to set up the Scan array, since the caller should have the best
    knowledge of their row-key structure.

    Preliminary results for us reduced a 15 minute job to under 2 minutes.

    Cheers,


    -Xavier
    On 7/26/10 3:16 PM, Vidhyashankar Venkataraman wrote:
    I did not use a TableInputFormat: I ran my own scans on specific ranges (just for more control from my side to tune the ranges and the ease with which I can run a hadoop streaming job)..

    1 MB for Hfile Block size.. Not the HDFS block size..
    I increased it since I didn't care too much for random read performance.. HDFS block size is the default value... (I have a related question then: does the Hfile block size influence only the size of the index and the efficiency of random reads? I don't see an effect on scans though)...

    I had previously run 5 tasks per machine and at 20 rows, but that resulted in scanner expiries (UnknownScannerexception) and DFS socket timeouts.. So that's why I reduced the number of tasks.. Let me decrease the number of rows and see..

    Just to make sure: the client uses zookeeper only for obtaining ROOT right whenever it performs scans, isnt it? So scans shouldn't face any master/zk bottlenecks when we scale up wrt number of nodes, am I right?

    Thank you
    Vidhya

    On 7/26/10 3:01 PM, "Ryan Rawson" wrote:

    Hey,

    A few questions:

    - sharded scan, are you not using TableInputFormat?
    - 1 MB block size - what block size? You probably shouldnt set the
    HDFS block size to 1MB, it just causes more nn traffic.
    - Tests a year ago indicated that HFile block size really didnt
    improve speed when you went beyond 64k or so.
    - Run more maps/machine... one map task per disk probably?
    - Try setting the client cache to an in-between level, 2-6 perhaps.

    Let us know about those other questions and we can go from there.
    -ryan

    On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman
    wrote:
    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Ryan Rawson at Jul 26, 2010 at 10:38 pm
    Hey,

    That sounds interesting - maybe you could tell us about why your
    system is better performing? The default TableInputFormat is just
    creating N map tasks, one for each region, which are all roughly the
    same data-size.

    What do you do?
    -ryan
    On Mon, Jul 26, 2010 at 3:29 PM, Xavier Stevens wrote:
    We have something that might interest you.

    http://socorro.googlecode.com/svn/trunk/analysis/src/java/org/apache/hadoop/hbase/mapreduce/

    We haven't fully tested everything yet, so don't blame us if something
    goes wrong.  It's basically the exact same as TableInputFormat except it
    takes an array of Scans to be used for row-key ranges.  It requires the
    caller to setup the Scan array since they should have the best knowledge
    about their row-key structure.

    Preliminary results for us reduced a 15 minute job to under 2 minutes.

    Cheers,


    -Xavier
    On 7/26/10 3:16 PM, Vidhyashankar Venkataraman wrote:
    I did not use a TableInputFormat: I ran my own scans on specific ranges (just for more control from my side to tune the ranges and the ease with which I can run a hadoop streaming job)..

    1 MB for Hfile Block size.. Not the HDFS block size..
    I increased it since I didn't care too much for random read performance.. HDFS block size is the default value... (I have a related question then: does the Hfile block size influence only the size of the index and the efficiency of random reads?  I don't see an effect on scans though)...

    I had previously run 5 tasks per machine and at 20 rows, but that resulted in scanner expiries (UnknownScannerexception) and DFS socket timeouts.. So that's why I reduced the number of tasks.. Let me decrease the number of rows and see..

    Just to make sure: the client uses zookeeper only for obtaining ROOT right whenever it performs scans, isnt it? So scans shouldn't face any master/zk bottlenecks when we scale up wrt number of nodes, am I right?

    Thank you
    Vidhya

    On 7/26/10 3:01 PM, "Ryan Rawson" wrote:

    Hey,

    A few questions:

    - sharded scan, are you not using TableInputFormat?
    - 1 MB block size - what block size?  You probably shouldnt set the
    HDFS block size to 1MB, it just causes more nn traffic.
    - Tests a year ago indicated that HFile block size really didnt
    improve speed when you went beyond 64k or so.
    - Run more maps/machine... one map task per disk probably?
    - Try setting the client cache to an in-between level, 2-6 perhaps.

    Let us know about those other questions and we can go from there.
    -ryan

    On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman
    wrote:
    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node..  Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1.  The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2.  Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Xavier Stevens at Jul 26, 2010 at 10:56 pm
    It performs better for us because of the way we structured our
    keys. We use a "salt" character plus a date at the front of the key.

    So "f20100701xxxxxxxxx", for instance. We use [0-f] as a salt so that
    the keys are more evenly distributed across the cluster. This is because
    when we had just the date at the front of the key, everything would go to
    one server. Then if we ran an MR job the RegionServer would fall over
    under the load.

    So the first implementation I did just used TableInputFormat with
    startRow="020100701" and stopRow="f20100701". The problem with this was
    we would have to scan over things like "a20100630" which we didn't
    really want. That's why we went the multiple scan range route.

    The code isn't any better, and we could probably just subclass
    TableInputFormat at this point. It was just easier for me to step
    through the process with all of the code in one spot. The MultiScan
    approach just reduces the amount of data we read and works better for us
    with the way our keys are structured.
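
    (A minimal sketch of how such a Scan array might be built for the salted-key layout described above; the sixteen [0-f] salt prefixes and the date format come from this message, while the column family name and the stop-row trick are assumptions for illustration:)

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedDayScans {
        public static List<Scan> scansForDay(String day) { // e.g. "20100701"
            List<Scan> scans = new ArrayList<Scan>();
            for (char salt : "0123456789abcdef".toCharArray()) {
                // [salt+day, salt+day+"~") covers one salt bucket for that day;
                // '~' sorts after ASCII alphanumerics, so it acts as a rough exclusive upper bound.
                Scan scan = new Scan(Bytes.toBytes(salt + day),
                                     Bytes.toBytes(salt + day + "~"));
                scan.addFamily(Bytes.toBytes("cf")); // hypothetical column family
                scan.setCacheBlocks(false);          // full-scan jobs usually skip the block cache
                scans.add(scan);
            }
            return scans;
        }
    }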


    -Xavier

    On 7/26/10 3:37 PM, Ryan Rawson wrote:
    Hey,

    That sounds interesting - maybe you could tell us about why your
    system is better performing? The default TableInputFormat is just
    creating N map tasks, one for each region, which are all roughly the
    same data-size.

    What do you do?
    -ryan
    On Mon, Jul 26, 2010 at 3:29 PM, Xavier Stevens wrote:
    We have something that might interest you.

    http://socorro.googlecode.com/svn/trunk/analysis/src/java/org/apache/hadoop/hbase/mapreduce/

    We haven't fully tested everything yet, so don't blame us if something
    goes wrong. It's basically the exact same as TableInputFormat except it
    takes an array of Scans to be used for row-key ranges. It requires the
    caller to setup the Scan array since they should have the best knowledge
    about their row-key structure.

    Preliminary results for us reduced a 15 minute job to under 2 minutes.

    Cheers,


    -Xavier
    On 7/26/10 3:16 PM, Vidhyashankar Venkataraman wrote:
    I did not use a TableInputFormat: I ran my own scans on specific ranges (just for more control from my side to tune the ranges and the ease with which I can run a hadoop streaming job)..

    1 MB for Hfile Block size.. Not the HDFS block size..
    I increased it since I didn't care too much for random read performance.. HDFS block size is the default value... (I have a related question then: does the Hfile block size influence only the size of the index and the efficiency of random reads? I don't see an effect on scans though)...

    I had previously run 5 tasks per machine and at 20 rows, but that resulted in scanner expiries (UnknownScannerexception) and DFS socket timeouts.. So that's why I reduced the number of tasks.. Let me decrease the number of rows and see..

    Just to make sure: the client uses zookeeper only for obtaining ROOT right whenever it performs scans, isnt it? So scans shouldn't face any master/zk bottlenecks when we scale up wrt number of nodes, am I right?

    Thank you
    Vidhya

    On 7/26/10 3:01 PM, "Ryan Rawson" wrote:

    Hey,

    A few questions:

    - sharded scan, are you not using TableInputFormat?
    - 1 MB block size - what block size? You probably shouldnt set the
    HDFS block size to 1MB, it just causes more nn traffic.
    - Tests a year ago indicated that HFile block size really didnt
    improve speed when you went beyond 64k or so.
    - Run more maps/machine... one map task per disk probably?
    - Try setting the client cache to an in-between level, 2-6 perhaps.

    Let us know about those other questions and we can go from there.
    -ryan

    On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman
    wrote:
    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Vidhyashankar Venkataraman at Jul 26, 2010 at 10:02 pm
    Oh, and I forgot to add: 4 GB regions and an 8 GB heap size.


    On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" wrote:

    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Stack at Jul 27, 2010 at 6:23 am

    On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman wrote:
    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...
    How big is the range, V? How many rows do you scan in your map task?
    They are contiguous, right?

    1 MB block size, Block cache enabled.. Max of 2 tasks per node..  Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    Did you try with defaults first?

    I ran experiments on the following cases:

    1.  The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2.  Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    What numbers do you see on a smaller cluster?
    Oh and forgot to add, 4 gig regions and 8 gig heap size..
    So 4 GB to HBase and 8 GB on these machines in total? You are running
    TaskTrackers on the same machines? 2 mappers, 1 DN, and 1 RS on all 180
    machines?

    You are using Hadoop streaming? How does that work? Streaming does text
    only? I didn't think you could write to HBase out of Streaming.
    does the Hfile block size influence only the size of the index and the efficiency of random reads?
    Generally, yes. I'd think, though, that with a bigger block size, especially
    if you are also using scanner caching to cut down on the number of RPCs, you
    should be approaching the scan speeds you'd see going directly against HDFS.
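
    (To put rough numbers on that, using the 30 KB row size from the original post: at the default caching of 1, every row is its own next() RPC, about 30 KB per round trip, while caching of 10 returns roughly 300 KB per RPC and caching of 20 roughly 600 KB, at which point per-RPC overhead should no longer be the limiting factor.)
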
    Just to make sure: the client uses zookeeper only for obtaining ROOT right whenever it performs scans, isnt it? So scans shouldn't face any master/zk bottlenecks when we scale up wrt number of nodes, am I right?
    That's right.

    St.Ack
  • Vidhyashankar Venkataraman at Jul 28, 2010 at 5:31 pm
    I reran my scans on a 180-node cluster (with 160 region servers/DataNodes/TaskTrackers) using TableInputFormat with the block cache disabled, and I got the following results/observations:

    1. 90% of the job completed in roughly five and a half hours, which comes to 29 MBps per node (counting 160 region servers). Is this reasonable? Can I get better results?

    2. The last 10% of the job has now been running for more than 3 hours. I am trying to find a common thread of reasoning for why this is happening.

    3. I tried disabling the block cache both in the MR job (I was using TableMapReduceUtil: see the code snippet below) and through the HBase shell, just to change the default options (disable the table and change it with alter).

    conf.set(TableInputFormat.SCAN_CACHEBLOCKS, "false");
    Job job = new Job(conf, "Scan" + "_" + tableName);
    ..........
    Scan scan = new Scan();
    scan.addColumns("bigColumn");
    scan.setCaching(10);
    scan.setCacheBlocks(false); // Setting it again; I guess this will override the previous setting.
    // First argument is the table name.
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
        MapperHB.class, Text.class, NullWritable.class, job);

    But I still get LruBlockCache stats like the following ones. Is this okay?

    2010-07-27 14:13:40,852 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1304510, Hit=1914, Miss=1302596, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1467217574827373%, Miss Ratio=99.8532772064209%, Evicted/Run=238.94393920898438
    2010-07-27 14:14:40,860 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1306103, Hit=1914, Miss=1304189, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14654280385002494%, Miss Ratio=99.85345602035522%, Evicted/Run=238.94393920898438
    2010-07-27 14:15:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1307783, Hit=1914, Miss=1305869, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14635456027463078%, Miss Ratio=99.8536467552185%, Evicted/Run=238.94393920898438
    ................
    .................
    2010-07-27 22:45:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:46:40,854 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:47:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:48:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:49:40,876 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348659, Hit=1914, Miss=1346745, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14191875234246254%, Miss Ratio=99.8580813407898%, Evicted/Run=238.94393920898438
    2010-07-27 22:50:41,140 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349089, Hit=1914, Miss=1347175, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1418735133484006%, Miss Ratio=99.85812902450562%, Evicted/Run=238.94393920898438


    4. A small number of region servers (8 out of 150) crashed (more of a graceful shutdown) while performing the scan, and I saw these messages just before the shutdown started:

    2010-07-27 22:52:57,448 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x2a0190c221002c to sun.nio.ch.SelectionKeyImpl@286e4365
    java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:52:57,454 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,454 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-30527588457879943, 30) from 98.136.191.88:33914: output error
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-7358804533352185527, 30) from 98.136.191.88:33963: output error
    2010-07-27 22:52:57,602 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-3402818986992405350, 30) from 98.136.191.88:33927: output error
    2010-07-27 22:52:57,603 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-6635348671997204137, 30) from 98.136.191.88:33916: output error
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 21 on 60020 caught: java.nio.channels.ClosedChannelException
    ....................
    .....................
    2010-07-27 22:52:58,059 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
    org.apache.hadoop.hbase.Leases$LeaseStillHeldException

    Thanks in advance,
    Vidhya

    On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" wrote:

    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got he number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Ted Yu at Jul 28, 2010 at 5:51 pm
    For the region servers that went down, do you find the following in their
    logs?

    2010-06-17 22:22:47,414 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
    55448ms, ten times longer than scheduled: 3000
    On Wed, Jul 28, 2010 at 10:30 AM, Vidhyashankar Venkataraman wrote:

    I reran my Scans on a 180 node cluster (with 160 RS's/data nodes/task
    trackers) using TableInputFormat with block cache disabled and I got the
    following results/observations:

    1. 90% of the job completed in roughly 5 and a half hours which comes to
    29 MBps per node (counting 160 RS's).. Is this reasonable? Can I get better
    results?

    2. The last 10% of the job has still been running for more than 3
    hours.. I am trying to find some common thread of reasoning why this is
    happening.

    3. I tried disabling the block cache in the MR job (I was using
    TableMapReduceUtil: see code snippet below) and through hbase shell just to
    change default options (disable the data and change using alter options)..

    conf.set(TableInputFormat.SCAN_CACHEBLOCKS, "false");
    Job job = new Job(conf, "Scan" + "_" + tableName);
    ..........
    Scan scan = new Scan();
    scan.addColumns("bigColumn");
    scan.setCaching(10);
    scan.setCacheBlocks(false); // Setting it again.. I
    guess this will override the previous setting..
    // Second argument is the table name.
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
    MapperHB.class, Text.class, NullWritable.class, job);

    But I still get LruBlockcache stats like the following ones: Is this okay?

    2010-07-27 14:13:40,852 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1304510, Hit=1914, Miss=1302596,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1467217574827373%, Miss
    Ratio=99.8532772064209%, Evicted/Run=238.943939208984382010-07-27
    14:14:40,860 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache
    Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1306103, Hit=1914,
    Miss=1304189, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14654280385002494%, Miss Ratio=99.85345602035522%,
    Evicted/Run=238.943939208984382010-07-27 14:15:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1307783, Hit=1914, Miss=1305869,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14635456027463078%, Miss
    Ratio=99.8536467552185%, Evicted/Run=238.94393920898438
    ................
    .................
    2010-07-27 22:45:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:46:40,854 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:47:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:48:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:49:40,876 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348659, Hit=1914, Miss=1346745,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14191875234246254%, Miss
    Ratio=99.8580813407898%, Evicted/Run=238.94393920898438
    2010-07-27 22:50:41,140 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349089, Hit=1914, Miss=1347175,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1418735133484006%, Miss
    Ratio=99.85812902450562%, Evicted/Run=238.94393920898438


    4. A minor number of region servers (8 out of 150) crash (more of a
    graceful shutdown) while performing the scan and I saw these messages just
    before the shutdown started:

    2010-07-27 22:52:57,448 WARN org.apache.zookeeper.ClientCnxn: Exception
    closing session 0x2a0190c221002c to sun.nio.ch.SelectionKeyImpl@286e4365
    java.io.IOException: TIMED OUT
    at
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss
    Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss
    Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-30527588457879943, 30) from 98.136.191.88:33914:
    output error
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-7358804533352185527, 30) from 98.136.191.88:33963:
    output error
    2010-07-27 22:52:57,602 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-3402818986992405350, 30) from 98.136.191.88:33927:
    output error
    2010-07-27 22:52:57,603 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-6635348671997204137, 30) from 98.136.191.88:33916:
    output error
    2010-07-27 22:52:57,603 INFO
    org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
    state: Disconnected, type: None, path: null
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
    handler 21 on 60020 caught: java.nio.channels.ClosedChannelException
    ....................
    .....................
    2010-07-27 22:52:58,059 WARN
    org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
    org.apache.hadoop.hbase.Leases$LeaseStillHeldException

    Thanks in advance,
    Vidhya

    On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" wrote:

    I am trying to assess the performance of Scans on a 100TB db on 180 nodes
    running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range:
    speculative execution is turned false so that there is no duplication in
    tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row
    is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout
    exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got he number using
    getCaching): The MR tasks take around 13 hours to finish in the average..
    Which gives around 13.17 MBps per node. The worst case is 34 hours (to
    finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous
    case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a
    multiple of the above said cache size? Or the cache size to a much lower
    value?

    I find that these numbers are much less than the ones I get when it's
    running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Vidhyashankar Venkataraman at Jul 28, 2010 at 6:04 pm
    Yeah, I got it in some of them:
    2010-07-27 22:53:31,703 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x32a0190c21e0030 to sun.nio.ch.SelectionKeyImpl@7971f189
    java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:53:31,937 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1847.005MB (1936725152), Free=552.8263MB (579680352), Max=2399.8313MB (2516405504), Counts: Blocks=1811, Access=1332343, Hit=1811, Miss=1330532, Evictions=3177, Evicted=759144, Ratios: Hit Ratio=0.13592595933005214%, Miss Ratio=99.86407160758972%, Evicted/Run=238.949951171875
    2010-07-27 22:53:31,936 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-8452685834063022439, 30) from 98.136.191.84:57851: output error
    2010-07-27 22:53:31,827 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-5892917023728878793, 30) from 98.136.191.84:57817: output error
    2010-07-27 22:53:31,998 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-299811806597554323, 30) from 98.136.191.84:57840: output error
    2010-07-27 22:53:31,822 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-2595212217395389635, 30) from 98.136.191.84:57829: output error
    2010-07-27 22:53:31,718 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 141975ms, ten times longer than scheduled: 10000
    2010-07-27 22:53:32,144 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 39 on 60020 caught: java.nio.channels.ClosedChannelException

    Vidhya


    On 7/28/10 10:51 AM, "Ted Yu" wrote:

    For the region servers that went down, do you find the following in their
    logs ?

    2010-06-17 22:22:47,414 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
    55448ms, ten times longer than scheduled: 3000
    On Wed, Jul 28, 2010 at 10:30 AM, Vidhyashankar Venkataraman wrote:

    I reran my Scans on a 180 node cluster (with 160 RS's/data nodes/task
    trackers) using TableInputFormat with block cache disabled and I got the
    following results/observations:

    1. 90% of the job completed in roughly 5 and a half hours which comes to
    29 MBps per node (counting 160 RS's).. Is this reasonable? Can I get better
    results?

    2. The last 10% of the job has still been running for more than 3
    hours.. I am trying to find some common thread of reasoning why this is
    happening.

    3. I tried disabling the block cache in the MR job (I was using
    TableMapReduceUtil: see code snippet below) and through hbase shell just to
    change default options (disable the data and change using alter options)..

    conf.set(TableInputFormat.SCAN_CACHEBLOCKS, "false");
    Job job = new Job(conf, "Scan" + "_" + tableName);
    ..........
    Scan scan = new Scan();
    scan.addColumns("bigColumn");
    scan.setCaching(10);
    scan.setCacheBlocks(false); // Setting it again.. I
    guess this will override the previous setting..
    // Second argument is the table name.
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
    MapperHB.class, Text.class, NullWritable.class, job);

    But I still get LruBlockcache stats like the following ones: Is this okay?

    2010-07-27 14:13:40,852 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1304510, Hit=1914, Miss=1302596,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1467217574827373%, Miss
    Ratio=99.8532772064209%, Evicted/Run=238.943939208984382010-07-27
    14:14:40,860 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache
    Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1306103, Hit=1914,
    Miss=1304189, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14654280385002494%, Miss Ratio=99.85345602035522%,
    Evicted/Run=238.943939208984382010-07-27 14:15:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1307783, Hit=1914, Miss=1305869,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14635456027463078%, Miss
    Ratio=99.8536467552185%, Evicted/Run=238.94393920898438
    ................
    .................
    2010-07-27 22:45:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:46:40,854 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:47:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:48:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:49:40,876 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348659, Hit=1914, Miss=1346745,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14191875234246254%, Miss
    Ratio=99.8580813407898%, Evicted/Run=238.94393920898438
    2010-07-27 22:50:41,140 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349089, Hit=1914, Miss=1347175,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1418735133484006%, Miss
    Ratio=99.85812902450562%, Evicted/Run=238.94393920898438


    4. A minor number of region servers (8 out of 150) crash (more of a
    graceful shutdown) while performing the scan and I saw these messages just
    before the shutdown started:

    2010-07-27 22:52:57,448 WARN org.apache.zookeeper.ClientCnxn: Exception
    closing session 0x2a0190c221002c to sun.nio.ch.SelectionKeyImpl@286e4365
    java.io.IOException: TIMED OUT
    at
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss
    Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss
    Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-30527588457879943, 30) from 98.136.191.88:33914:
    output error
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-7358804533352185527, 30) from 98.136.191.88:33963:
    output error
    2010-07-27 22:52:57,602 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-3402818986992405350, 30) from 98.136.191.88:33927:
    output error
    2010-07-27 22:52:57,603 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-6635348671997204137, 30) from 98.136.191.88:33916:
    output error
    2010-07-27 22:52:57,603 INFO
    org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
    state: Disconnected, type: None, path: null
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
    handler 21 on 60020 caught: java.nio.channels.ClosedChannelException
    ....................
    .....................
    2010-07-27 22:52:58,059 WARN
    org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
    org.apache.hadoop.hbase.Leases$LeaseStillHeldException

    Thanks in advance,
    Vidhya

    On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" wrote:

    I am trying to assess the performance of Scans on a 100TB db on 180 nodes
    running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range:
    speculative execution is turned false so that there is no duplication in
    tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row
    is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout
    exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got he number using
    getCaching): The MR tasks take around 13 hours to finish in the average..
    Which gives around 13.17 MBps per node. The worst case is 34 hours (to
    finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous
    case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a
    multiple of the above said cache size? Or the cache size to a much lower
    value?

    I find that these numbers are much less than the ones I get when it's
    running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Ted Yu at Jul 28, 2010 at 6:44 pm
    If a TaskTracker runs on such nodes, please reduce the maximum number of
    mappers/reducers.
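
    (For reference, the per-node task limits referred to here are TaskTracker-side settings in mapred-site.xml and take effect when the TaskTracker is restarted; a sketch with illustrative values only:)

    <!-- mapred-site.xml: cap concurrent tasks per node so the RegionServer and DataNode keep headroom -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>
    </property>
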
    On Wed, Jul 28, 2010 at 11:03 AM, Vidhyashankar Venkataraman wrote:

    Yeah I got it in some:
    2010-07-27 22:53:31,703 WARN org.apache.zookeeper.ClientCnxn: Exception
    closing session 0x32a0190c21e0030 to sun.nio.ch.SelectionKeyImpl@7971f189
    java.io.IOException: TIMED OUT
    at
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:53:31,937 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1847.005MB (1936725152), Free=552.8263MB (579680352), Max=2399.8313MB
    (2516405504), Counts: Blocks=1811, Access=1332343, Hit=1811, Miss=1330532,
    Evictions=3177, Evicted=759144, Ratios: Hit Ratio=0.13592595933005214%, Miss
    Ratio=99.86407160758972%, Evicted/Run=238.949951171875
    2010-07-27 22:53:31,936 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-8452685834063022439, 30) from 98.136.191.84:57851:
    output error
    2010-07-27 22:53:31,827 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-5892917023728878793, 30) from 98.136.191.84:57817:
    output error
    2010-07-27 22:53:31,998 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-299811806597554323, 30) from 98.136.191.84:57840:
    output error
    2010-07-27 22:53:31,822 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-2595212217395389635, 30) from 98.136.191.84:57829:
    output error
    2010-07-27 22:53:31,718 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
    141975ms, ten times longer than scheduled: 10000
    2010-07-27 22:53:32,144 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
    handler 39 on 60020 caught: java.nio.channels.ClosedChannelException

    Vidhya


    On 7/28/10 10:51 AM, "Ted Yu" wrote:

    For the region servers that went down, do you find the following in their
    logs ?

    2010-06-17 22:22:47,414 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
    55448ms, ten times longer than scheduled: 3000

    On Wed, Jul 28, 2010 at 10:30 AM, Vidhyashankar Venkataraman <
    vidhyash@yahoo-inc.com> wrote:
    I reran my Scans on a 180 node cluster (with 160 RS's/data nodes/task
    trackers) using TableInputFormat with block cache disabled and I got the
    following results/observations:

    1. 90% of the job completed in roughly 5 and a half hours which comes to
    29 MBps per node (counting 160 RS's).. Is this reasonable? Can I get better
    results?

    2. The last 10% of the job has still been running for more than 3
    hours.. I am trying to find some common thread of reasoning why this is
    happening.

    3. I tried disabling the block cache in the MR job (I was using
    TableMapReduceUtil: see code snippet below) and through hbase shell just to
    change default options (disable the data and change using alter
    options)..
    conf.set(TableInputFormat.SCAN_CACHEBLOCKS, "false");
    Job job = new Job(conf, "Scan" + "_" + tableName);
    ..........
    Scan scan = new Scan();
    scan.addColumns("bigColumn");
    scan.setCaching(10);
    scan.setCacheBlocks(false); // Setting it again.. I
    guess this will override the previous setting..
    // Second argument is the table name.
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
    MapperHB.class, Text.class, NullWritable.class, job);

    But I still get LruBlockcache stats like the following ones: Is this okay?
    2010-07-27 14:13:40,852 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1304510, Hit=1914, Miss=1302596, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1467217574827373%, Miss Ratio=99.8532772064209%, Evicted/Run=238.94393920898438
    2010-07-27 14:14:40,860 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1306103, Hit=1914, Miss=1304189, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14654280385002494%, Miss Ratio=99.85345602035522%, Evicted/Run=238.94393920898438
    2010-07-27 14:15:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1307783, Hit=1914, Miss=1305869, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14635456027463078%, Miss Ratio=99.8536467552185%, Evicted/Run=238.94393920898438
    ................
    .................
    2010-07-27 22:45:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914,
    Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:46:40,854 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914,
    Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:47:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914,
    Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:48:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914,
    Miss=1346513,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss
    Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:49:40,876 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1348659, Hit=1914,
    Miss=1346745,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14191875234246254%, Miss
    Ratio=99.8580813407898%, Evicted/Run=238.94393920898438
    2010-07-27 22:50:41,140 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349089, Hit=1914,
    Miss=1347175,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1418735133484006%, Miss
    Ratio=99.85812902450562%, Evicted/Run=238.94393920898438


    4. A minor number of region servers (8 out of 150) crash (more of a
    graceful shutdown) while performing the scan and I saw these messages just
    before the shutdown started:

    2010-07-27 22:52:57,448 WARN org.apache.zookeeper.ClientCnxn: Exception
    closing session 0x2a0190c221002c to sun.nio.ch.SelectionKeyImpl@286e4365
    java.io.IOException: TIMED OUT
    at
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914,
    Miss=1347572,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss
    Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB
    (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914,
    Miss=1347572,
    Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss
    Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-30527588457879943, 30) from 98.136.191.88:33914:
    output error
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-7358804533352185527, 30) from 98.136.191.88:33963:
    output error
    2010-07-27 22:52:57,602 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-3402818986992405350, 30) from 98.136.191.88:33927:
    output error
    2010-07-27 22:52:57,603 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
    Responder, call next(-6635348671997204137, 30) from 98.136.191.88:33916:
    output error
    2010-07-27 22:52:57,603 INFO
    org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
    state: Disconnected, type: None, path: null
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
    handler 21 on 60020 caught: java.nio.channels.ClosedChannelException
    ....................
    .....................
    2010-07-27 22:52:58,059 WARN
    org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
    org.apache.hadoop.hbase.Leases$LeaseStillHeldException

    Thanks in advance,
    Vidhya

    On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" <vidhyash@yahoo-inc.com

    wrote:

    I am trying to assess the performance of Scans on a 100TB db on 180 nodes
    running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range:
    speculative execution is turned false so that there is no duplication in
    tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node.. Each row
    is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout
    exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1. The client level cache is set to 1 (default: got the number using
    getCaching): The MR tasks take around 13 hours to finish in the average..
    Which gives around 13.17 MBps per node. The worst case is 34 hours (to
    finish the entire job)...
    2. Client cache set to 20 rows: this is much worse than the previous
    case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a
    multiple of the above said cache size? Or the cache size to a much lower
    value?

    I find that these numbers are much less than the ones I get when it's
    running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Stack at Jul 28, 2010 at 6:46 pm

    On Wed, Jul 28, 2010 at 10:30 AM, Vidhyashankar Venkataraman wrote:
    I reran my Scans on a 180 node cluster (with 160 RS's/data nodes/task trackers) using TableInputFormat with block cache disabled and I got the following results/observations:

    1.  90% of the job completed in roughly 5 and a half hours which comes to 29 MBps per node (counting 160 RS's).. Is this reasonable? Can I get better results?

    29MB/s doesn't sound too bad.

    How many disks? What can you get from the system measuring disk
    throughput directly?
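    As a rough sanity check on that figure (assuming the 100 TB from the first
    mail and decimal units -- my arithmetic, not a measurement):

    0.9 x 100e12 bytes  ~= 9.0e13 bytes scanned
    5.5 h x 3600 s/h    ~= 19,800 s
    9.0e13 / 19,800 s / 160 region servers ~= 2.8e7 bytes/s, i.e. roughly 28 MB/s per node

    so the quoted 29 MBps is at least internally consistent.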


    2.  The last 10% of the job has still been running for more than 3 hours.. I am trying to find some common thread of reasoning why this is happening.
    So, whats happening? Is this reduce? In the reduce, are you loading
    hbase? Is it loading hbase that is taking the time? Are tasks
    failing? If so, which ones? Can you figure which part of the table
    they are hitting when they fail?

    3. I tried disabling the block cache in the MR job (I was using TableMapReduceUtil: see code snippet below) and through hbase shell just to change default options (disable the data and change using alter options)..

    conf.set(TableInputFormat.SCAN_CACHEBLOCKS, "false");
    Job job = new Job(conf, "Scan" + "_" + tableName);
    ..........
    Scan scan = new Scan();
    scan.addColumns("bigColumn");
    scan.setCaching(10);
    scan.setCacheBlocks(false);                      // Setting it again.. I guess this will override the previous setting..
    // Second argument is the table name.
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
    MapperHB.class, Text.class, NullWritable.class, job);

    But I still get LruBlockcache stats like the following ones: Is this okay?

    2010-07-27 14:13:40,852 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1304510, Hit=1914, Miss=1302596, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1467217574827373%, Miss Ratio=99.8532772064209%, Evicted/Run=238.94393920898438
    2010-07-27 14:14:40,860 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1306103, Hit=1914, Miss=1304189, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14654280385002494%, Miss Ratio=99.85345602035522%, Evicted/Run=238.94393920898438
    2010-07-27 14:15:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1307783, Hit=1914, Miss=1305869, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14635456027463078%, Miss Ratio=99.8536467552185%, Evicted/Run=238.94393920898438
    ................
    .................

    Block cache is global to the regionserver. Is any other table being
    accessed out of this job (e.g. .META. or -ROOT- tables?)?

    The above shows that LRU has been hit about 2k times over the course
    of a minute. If above is from host carrying .META., thats probably
    OK. If its from some other host, and no other table is being
    accessed, your efforts at disabling block caching do not seem to be
    having an effect.

    2010-07-27 22:45:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:46:40,854 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:47:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:48:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427, Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%, Evicted/Run=238.94393920898438
    2010-07-27 22:49:40,876 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348659, Hit=1914, Miss=1346745, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14191875234246254%, Miss Ratio=99.8580813407898%, Evicted/Run=238.94393920898438
    2010-07-27 22:50:41,140 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349089, Hit=1914, Miss=1347175, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1418735133484006%, Miss Ratio=99.85812902450562%, Evicted/Run=238.94393920898438
    Hmm... on the other hand, not much going on in the block cache -- not
    enough to worry about... numbers are staying pretty constant over
    time.

    4.   A minor number of region servers (8 out of 150) crash (more of a graceful shutdown) while performing the scan and I saw these messages just before the shutdown started:

    2010-07-27 22:52:57,448 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x2a0190c221002c to sun.nio.ch.SelectionKeyImpl@286e4365
    java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:52:57,454 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,454 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349486, Hit=1914, Miss=1347572, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14183177845552564%, Miss Ratio=99.85817074775696%, Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-30527588457879943, 30) from 98.136.191.88:33914: output error
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-7358804533352185527, 30) from 98.136.191.88:33963: output error
    2010-07-27 22:52:57,602 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-3402818986992405350, 30) from 98.136.191.88:33927: output error
    2010-07-27 22:52:57,603 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call next(-6635348671997204137, 30) from 98.136.191.88:33916: output error
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 21 on 60020 caught: java.nio.channels.ClosedChannelException
    ....................
    .....................
    2010-07-27 22:52:58,059 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
    org.apache.hadoop.hbase.Leases$LeaseStillHeldException

    You should up your zk session timeout given the type of job you are
    running. Up the session timeout in hbase-site.xml. See:

    <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
    <description>ZooKeeper session timeout.
    HBase passes this to the zk quorum as suggested maximum time for a
    session. See
    http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
    "The client sends a requested timeout, the server responds with the
    timeout that it can give the client. "
    In milliseconds.
    </description>
    </property>

    Read that URL. Says "The current implementation requires that the
    timeout be a minimum of 2 times the tickTime (as set in the server
    configuration) and a maximum of 20 times the tickTime. " so you need
    to up hbase.zookeeper.property.tickTime too... from its current
    setting of '3'. Set it to 6 or 9 even.
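    For illustration, one hbase-site.xml combination that respects the 20 x
    tickTime ceiling (the values are only an example, not something tuned on this
    cluster; hbase.zookeeper.property.* entries are handed straight to the managed
    quorum's zoo.cfg, where tickTime is in milliseconds):

    <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
    <description>Passed to the managed ZooKeeper quorum as zoo.cfg tickTime.
    The session timeout the quorum will grant is capped at 20 times this value.
    </description>
    </property>
    <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
    <description>Requested session timeout in milliseconds: 120s, i.e. exactly
    20 x the tickTime above, the maximum the quorum will grant.
    </description>
    </property>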

    St.Ack

    Thanks in advance,
    Vidhya

    On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" wrote:

    I am trying to assess the performance of Scans on a 100TB db on 180 nodes running Hbase 0.20.5..

    I run a sharded scan (each Map task runs a scan on a specific range: speculative execution is turned false so that there is no duplication in tasks) on a fully compacted table...

    1 MB block size, Block cache enabled.. Max of 2 tasks per node..  Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket timeout exceptions so I have not reassigned the write socket timeout...

    I ran experiments on the following cases:

    1.  The client level cache is set to 1 (default: got the number using getCaching): The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps per node. The worst case is 34 hours (to finish the entire job)...
    2.  Client cache set to 20 rows: this is much worse than the previous case: we get around a super low 1MBps per node...

    Question: Should I set it to a value such that the block size is a multiple of the above said cache size? Or the cache size to a much lower value?

    I find that these numbers are much less than the ones I get when it's running with just a few nodes..

    Can you guys help me with this problem?

    Thank you
    Vidhya
  • Jonathan Gray at Jul 28, 2010 at 6:51 pm
    Re: block cache, I believe that even if you disable the block cache for the scan, you will still check the block cache to see if blocks have already been loaded. The flag is to prevent storing newly fetched blocks from being put into the cache.
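    In code terms, a minimal sketch of that flag (the comments just restate the
    behaviour described above; nothing new is being claimed about the API):

    import org.apache.hadoop.hbase.client.Scan;

    Scan scan = new Scan();
    scan.setCacheBlocks(false);  // blocks fetched for this scan are NOT inserted into the LRU cache,
                                 // but the read path still checks the cache first, so already-resident
                                 // blocks are used and every check still shows up in the Access count

    Which would explain the stats earlier in the thread: Access keeps climbing
    while Blocks, Hit and Evictions stay flat.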
    -----Original Message-----
    From: saint.ack@gmail.com On Behalf Of
    Stack
    Sent: Wednesday, July 28, 2010 11:46 AM
    To: user@hbase.apache.org
    Subject: Re: More on Large scale Scans...

    On Wed, Jul 28, 2010 at 10:30 AM, Vidhyashankar Venkataraman
    wrote:
    I reran my Scans on a 180 node cluster (with 160 RS's/data nodes/task
    trackers) using TableInputFormat with block cache disabled and I got
    the following results/observations:
    1.  90% of the job completed in roughly 5 and a half hours which
    comes to 29 MBps per node (counting 160 RS's).. Is this reasonable? Can
    I get better results?

    29MB/s doesn't sound too bad.

    How many disks? What can you get from the system measuring disk
    throughput directly?


    2.  The last 10% of the job has still been running for more than 3
    hours.. I am trying to find some common thread of reasoning why this is
    happening.
    So, whats happening? Is this reduce? In the reduce, are you loading
    hbase? Is it loading hbase that is taking the time? Are tasks
    failing? If so, which ones? Can you figure which part of the table
    they are hitting when they fail?

    3. I tried disabling the block cache in the MR job (I was using
    TableMapReduceUtil: see code snippet below) and through hbase shell
    just to change default options (disable the data and change using alter
    options)..
    conf.set(TableInputFormat.SCAN_CACHEBLOCKS, "false");
    Job job = new Job(conf, "Scan" + "_" + tableName);
    ..........
    Scan scan = new Scan();
    scan.addColumns("bigColumn");
    scan.setCaching(10);
    scan.setCacheBlocks(false);                      // Setting it
    again.. I guess this will override the previous setting..
    // Second argument is the table name.
    TableMapReduceUtil.initTableMapperJob(tableName, scan,
    MapperHB.class, Text.class, NullWritable.class, job);

    But I still get LruBlockcache stats like the following ones: Is this okay?
    2010-07-27 14:13:40,852 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1304510, Hit=1914, Miss=1302596, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.1467217574827373%, Miss Ratio=99.8532772064209%, Evicted/Run=238.94393920898438
    2010-07-27 14:14:40,860 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1306103, Hit=1914, Miss=1304189, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14654280385002494%, Miss Ratio=99.85345602035522%, Evicted/Run=238.94393920898438
    2010-07-27 14:15:40,853 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1950.0773MB (2044804168), Free=449.7541MB (471601336), Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1307783, Hit=1914, Miss=1305869, Evictions=3300, Evicted=788515, Ratios: Hit Ratio=0.14635456027463078%, Miss Ratio=99.8536467552185%, Evicted/Run=238.94393920898438
    ................
    .................

    Block cache is global to the regionserver. Is any other table being
    accessed out of this job (e.g. .META. or -ROOT- tables?)?

    The above shows that LRU has been hit about 2k times over the course
    of a minute. If above is from host carrying .META., thats probably
    OK. If its from some other host, and no other table is being
    accessed, your efforts at disabling block caching do not seem to be
    having an effect.

    2010-07-27 22:45:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427,
    Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%,
    Evicted/Run=238.94393920898438
    2010-07-27 22:46:40,854 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427,
    Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%,
    Evicted/Run=238.94393920898438
    2010-07-27 22:47:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427,
    Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%,
    Evicted/Run=238.94393920898438
    2010-07-27 22:48:40,853 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348427,
    Hit=1914, Miss=1346513, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14194316463544965%, Miss Ratio=99.85805749893188%,
    Evicted/Run=238.94393920898438
    2010-07-27 22:49:40,876 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1348659,
    Hit=1914, Miss=1346745, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14191875234246254%, Miss Ratio=99.8580813407898%,
    Evicted/Run=238.94393920898438
    2010-07-27 22:50:41,140 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349089,
    Hit=1914, Miss=1347175, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.1418735133484006%, Miss Ratio=99.85812902450562%,
    Evicted/Run=238.94393920898438
    Hmm... on the other hand, not much going on in the block cache -- not
    enough to worry about... numbers are staying pretty constant over
    time.

    4.   A minor number of region servers (8 out of 150) crash (more
    of a graceful shutdown) while performing the scan and I saw these
    messages just before the shutdown started:
    2010-07-27 22:52:57,448 WARN org.apache.zookeeper.ClientCnxn:
    Exception closing session 0x2a0190c221002c to
    sun.nio.ch.SelectionKeyImpl@286e4365
    java.io.IOException: TIMED OUT
    at
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349486,
    Hit=1914, Miss=1347572, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14183177845552564%, Miss Ratio=99.85817074775696%,
    Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,454 DEBUG
    org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
    Total=1950.0773MB (2044804168), Free=449.7541MB (471601336),
    Max=2399.8313MB (2516405504), Counts: Blocks=1914, Access=1349486,
    Hit=1914, Miss=1347572, Evictions=3300, Evicted=788515, Ratios: Hit
    Ratio=0.14183177845552564%, Miss Ratio=99.85817074775696%,
    Evicted/Run=238.94393920898438
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC
    Server Responder, call next(-30527588457879943, 30) from
    98.136.191.88:33914: output error
    2010-07-27 22:52:57,548 WARN org.apache.hadoop.ipc.HBaseServer: IPC
    Server Responder, call next(-7358804533352185527, 30) from
    98.136.191.88:33963: output error
    2010-07-27 22:52:57,602 WARN org.apache.hadoop.ipc.HBaseServer: IPC
    Server Responder, call next(-3402818986992405350, 30) from
    98.136.191.88:33927: output error
    2010-07-27 22:52:57,603 WARN org.apache.hadoop.ipc.HBaseServer: IPC
    Server Responder, call next(-6635348671997204137, 30) from
    98.136.191.88:33916: output error
    2010-07-27 22:52:57,603 INFO
    org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper
    event, state: Disconnected, type: None, path: null
    2010-07-27 22:52:57,603 INFO org.apache.hadoop.ipc.HBaseServer: IPC
    Server handler 21 on 60020 caught:
    java.nio.channels.ClosedChannelException
    ....................
    .....................
    2010-07-27 22:52:58,059 WARN
    org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
    org.apache.hadoop.hbase.Leases$LeaseStillHeldException

    You should up your zk session timeout given the type of job you are
    running. Up the session timeout in hbase-site.xml. See:

    <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
    <description>ZooKeeper session timeout.
    HBase passes this to the zk quorum as suggested maximum time for
    a
    session. See
    http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.ht
    ml#ch_zkSessions
    "The client sends a requested timeout, the server responds with
    the
    timeout that it can give the client. "
    In milliseconds.
    </description>
    </property>

    Read that URL. Says "The current implementation requires that the
    timeout be a minimum of 2 times the tickTime (as set in the server
    configuration) and a maximum of 20 times the tickTime. " so you need
    to up hbase.zookeeper.property.tickTime too... from its current
    setting of '3'. Set it to 6 or 9 even.

    St.Ack

    Thanks in advance,
    Vidhya

    On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" <vidhyash@yahoo-
    inc.com> wrote:
    I am trying to assess the performance of Scans on a 100TB db on 180
    nodes running Hbase 0.20.5..
    I run a sharded scan (each Map task runs a scan on a specific range:
    speculative execution is turned false so that there is no duplication
    in tasks) on a fully compacted table...
    1 MB block size, Block cache enabled.. Max of 2 tasks per node..
    Each row is 30 KB in size: 1 big column family with just one field..
    Region lease timeout is set to an hour.. And I don't get any socket
    timeout exceptions so I have not reassigned the write socket timeout...
    I ran experiments on the following cases:

    1.  The client level cache is set to 1 (default: got the number using
    getCaching): The MR tasks take around 13 hours to finish in the
    average.. Which gives around 13.17 MBps per node. The worst case is 34
    hours (to finish the entire job)...
    2.  Client cache set to 20 rows: this is much worse than the
    previous case: we get around a super low 1MBps per node...
    Question: Should I set it to a value such that the block size
    is a multiple of the above said cache size? Or the cache size to a much
    lower value?
    I find that these numbers are much less than the ones I get when it's
    running with just a few nodes..
    Can you guys help me with this problem?

    Thank you
    Vidhya
