HBase user mailing list, July 2011 (Grokbase archive)
Hi:
We found a strange problem in our read test.
It is a 5-node cluster. Four of our 5 regionservers set
hfile.block.cache.size=0.4; the remaining one (node A) is set to 0.1. When we
randomly read from a 2 TB table, node A's network traffic reached about
100 MB/s while the others stayed below 10 MB/s. We know node A needs to read
data from disk and put it into the block cache. Consider the following code in
LruBlockCache:
--------------------------------------------------------------------------------------------------------------------------
public void cacheBlock(String blockName, ByteBuffer buf, boolean inMemory) {
  CachedBlock cb = map.get(blockName);
  if (cb != null) {
    throw new RuntimeException("Cached an already cached block");
  }
  cb = new CachedBlock(blockName, buf, count.incrementAndGet(), inMemory);
  long newSize = size.addAndGet(cb.heapSize());
  map.put(blockName, cb);
  elements.incrementAndGet();
  if (newSize > acceptableSize() && !evictionInProgress) {
    runEviction();
  }
}
--------------------------------------------------------------------------------------------------------------------------
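
For reference, acceptableSize() in the code above is derived from the
hfile.block.cache.size fraction of the regionserver heap. A minimal sketch of
that relationship (the 0.85 watermark factor and the 8 GB heap below are
illustrative assumptions, not the actual HBase defaults):
--------------------------------------------------------------
// Rough sketch only -- not the real LruBlockCache source. It shows how the
// cache budget and the eviction watermark follow from hfile.block.cache.size.
class BlockCacheBudgetSketch {
  // total cache budget: a fraction of the regionserver's max heap
  static long maxCacheSize(long maxHeapBytes, float blockCacheFraction) {
    return (long) (maxHeapBytes * blockCacheFraction);
  }

  // eviction starts once the cache grows past this watermark (factor assumed)
  static long acceptableSize(long maxCacheSize) {
    return (long) Math.floor(maxCacheSize * 0.85f);
  }

  public static void main(String[] args) {
    long heap = 8L << 30; // assume an 8 GB regionserver heap
    System.out.println(acceptableSize(maxCacheSize(heap, 0.1f))); // roughly 730 MB
    System.out.println(acceptableSize(maxCacheSize(heap, 0.4f))); // roughly 2.9 GB
  }
}
--------------------------------------------------------------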




We debugged this code with BTrace using the following script:
--------------------------------------------------------------------------------------------------------------------------
import static com.sun.btrace.BTraceUtils.*;
import com.sun.btrace.annotations.*;

import java.nio.ByteBuffer;
import org.apache.hadoop.hbase.io.hfile.*;

@BTrace public class TestRegion {
  @OnMethod(
    clazz="org.apache.hadoop.hbase.io.hfile.LruBlockCache",
    method="cacheBlock"
  )
  public static void traceCacheBlock(@Self LruBlockCache instance, String blockName,
      ByteBuffer buf, boolean inMemory) {
    println(strcat("size: ",
        str(get(field("org.apache.hadoop.hbase.io.hfile.LruBlockCache", "size"), instance))));
    println(strcat("elements: ",
        str(get(field("org.apache.hadoop.hbase.io.hfile.LruBlockCache", "elements"), instance))));
  }
}
--------------------------------------------------------------------------------------------------------------------------



We found that "size" increases by about 5 MB each time on node A! Why not
64 KB each time? But "size" increases by 64 KB at a time when we run this
BTrace script on the other nodes at the same time.

The following script also confirms the problem, because "decompressedSize"
is about 5 MB each time on node A:
-------------------------------------------------------------------------------------------------------------------------
import static com.sun.btrace.BTraceUtils.*;
import com.sun.btrace.annotations.*;

import java.nio.ByteBuffer;
import org.apache.hadoop.hbase.io.hfile.*;

@BTrace public class TestRegion1 {
  @OnMethod(
    clazz="org.apache.hadoop.hbase.io.hfile.HFile$Reader",
    method="decompress"
  )
  public static void traceCacheBlock(final long offset, final int compressedSize,
      final int decompressedSize, final boolean pread) {
    println(strcat("decompressedSize: ", str(decompressedSize)));
  }
}
-------------------------------------------------------------------------------------------------------------------------



Why not 64 KB?

BTW: when we set hfile.block.cache.size=0.4 on node A, "decompressedSize"
drops back to about 64 KB and the TPS rises to a high level.


  • Stack at Jul 14, 2011 at 8:12 pm
    This is interesting. Any chance that the cells on the regions hosted
    on server A are 5M in size?

    The hfile block size is configured to be 64k by default, but an hfile
    block will rarely be exactly 64k. We do not cut the hfile block content
    at exactly 64k; the hfile block boundary will fall at a keyvalue
    boundary.

    If a cell were 5MB, it does not get split across multiple hfile
    blocks. It will occupy one hfile block.

    Could it be that the regions hosted on A are not like the others and
    hold lots of these 5MB cells?

    Let us know. If above is not the case, then you have an interesting
    phenomenon going on and we need to dig in more.

    St.Ack
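
    As a rough illustration of the point above (a sketch, not the actual
    HFile writer code): the writer only checks the block size between
    KeyValues, so a single large value ends up in a single oversized block.
    --------------------------------------------------------------
    // Illustrative sketch only -- not the real org.apache.hadoop.hbase.io.hfile
    // writer. It shows why a block can be much larger than the 64 KB target:
    // the "is the block full?" check happens between KeyValues, never inside one.
    class SketchHFileWriter {
      private final int targetBlockSize = 64 * 1024; // configured hfile block size
      private int currentBlockBytes = 0;

      void append(byte[] key, byte[] value) {
        // A KeyValue is always written whole into the current block.
        currentBlockBytes += key.length + value.length;
        // The boundary check runs only after the KeyValue has been appended,
        // so a 5 MB cell produces one ~5 MB block.
        if (currentBlockBytes >= targetBlockSize) {
          finishBlock();
        }
      }

      private void finishBlock() {
        // compress and flush the block, then start a new one
        currentBlockBytes = 0;
      }
    }
    --------------------------------------------------------------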

  • Mingjian Deng at Jul 15, 2011 at 3:41 am
    Hi Stack:
    Servers A and B are the same in the cluster. If I set
    hfile.block.cache.size=0.1 on another server, the problem reappears. But
    when I set hfile.block.cache.size=0.15 or more, it does not reappear. So
    I think you can reproduce it on your own cluster.
    With the following BTrace script:
    --------------------------------------------------------------
    import static com.sun.btrace.BTraceUtils.*;
    import com.sun.btrace.annotations.*;

    import java.nio.ByteBuffer;
    import org.apache.hadoop.hbase.io.hfile.*;

    @BTrace public class TestRegion1 {
      @OnMethod(
        clazz="org.apache.hadoop.hbase.io.hfile.HFile$Reader",
        method="decompress"
      )
      public static void traceCacheBlock(final long offset, final int compressedSize,
          final int decompressedSize, final boolean pread) {
        println(strcat("decompress: ", str(decompressedSize)));
      }
    }
    --------------------------------------------------------------

    If I set hfile.block.cache.size=0.1, the result is:
    -----------
    .......
    decompress: 6020488
    decompress: 6022536
    decompress: 5991304
    decompress: 6283272
    decompress: 5957896
    decompress: 6246280
    decompress: 6041096
    decompress: 6541448
    decompress: 6039560
    .......
    -----------
    If I set hfile.block.cache.size=0.12, the result is:
    -----------
    ......
    decompress: 65775
    decompress: 65556
    decompress: 65552
    decompress: 9914120
    decompress: 6026888
    decompress: 65615
    decompress: 65627
    decompress: 6247944
    decompress: 5880840
    decompress: 65646
    ......
    -----------
    If I set hfile.block.cache.size=0.15 or more, the result is:
    -----------
    ......
    decompress: 65646
    decompress: 65615
    decompress: 65627
    decompress: 65775
    decompress: 65556
    decompress: 65552
    decompress: 65646
    decompress: 65615
    decompress: 65627
    decompress: 65775
    decompress: 65556
    decompress: 65552
    ......
    -----------

    All of the above tests ran for more than 10 minutes at a high read rate,
    so it is a very strange phenomenon.
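
    One way to summarize logs like the ones above is to bucket the
    decompress sizes. This is a hypothetical helper (not part of HBase or
    BTrace) that counts block-sized versus multi-megabyte decompressions
    from the BTrace output:
    --------------------------------------------------------------
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Hypothetical helper: reads a file of BTrace output lines like
    // "decompress: 65646" and reports how many decompressions were
    // block-sized (~64 KB) versus multi-megabyte.
    public class DecompressSizeHistogram {
      public static void main(String[] args) throws IOException {
        long small = 0, large = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
          String line;
          while ((line = in.readLine()) != null) {
            if (!line.startsWith("decompress: ")) continue;
            long bytes = Long.parseLong(line.substring("decompress: ".length()).trim());
            if (bytes <= 128 * 1024) small++; else large++;
          }
        }
        System.out.println("~64KB blocks: " + small + ", multi-MB reads: " + large);
      }
    }
    --------------------------------------------------------------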

  • Stack at Jul 15, 2011 at 5:28 pm
    Yes. Please file an issue. A few fellas are messing with block cache
    at the moment so they might be up for taking a detour to figure the
    why on your interesting observation.

    Thanks,
    St.Ack
  • Mingjian Deng at Jul 17, 2011 at 4:41 am
    Do you mean I need to open a new issue?

  • Ted Yu at Jul 17, 2011 at 4:51 am
    Yes.
  • Ted Yu at Jul 18, 2011 at 2:50 am

    Mingjian:
    Regarding this line in your probe:
        println(strcat("decompressedSize: ", str(decompressedSize)));
    If you can also include the compressed size in the BTrace log, that
    would be more helpful.
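
    A sketch of that change, reusing the earlier probe with the extra fields
    printed (the class name TestRegion2 is just a placeholder; attach it to
    the regionserver JVM the same way as the original script):
    --------------------------------------------------------------
    import static com.sun.btrace.BTraceUtils.*;
    import com.sun.btrace.annotations.*;

    // Variant of TestRegion1 that also logs the block offset and the
    // compressed size, as suggested above.
    @BTrace public class TestRegion2 {
      @OnMethod(
        clazz="org.apache.hadoop.hbase.io.hfile.HFile$Reader",
        method="decompress"
      )
      public static void traceDecompress(final long offset, final int compressedSize,
          final int decompressedSize, final boolean pread) {
        println(strcat("offset: ", str(offset)));
        println(strcat("compressedSize: ", str(compressedSize)));
        println(strcat("decompressedSize: ", str(decompressedSize)));
        println(strcat("pread: ", str(pread)));
      }
    }
    --------------------------------------------------------------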
