Grokbase Groups: HBase dev, June 2011
Jason Rutherglen at Jun 4, 2011 at 9:44 pm
I want to take a whack/hack at creating a pluggable block index; is there
an open issue for this? I looked and couldn't find one.


  • Stack at Jun 4, 2011 at 10:17 pm
    I do not know of one. FYI, HFile is pretty standalone as regards tests, etc. There is even a perf-testing class for HFile.


  • Ryan Rawson at Jun 4, 2011 at 10:28 pm
    What are the specs/goals of a pluggable block index? Right now the
    block index is tied fairly deep into how HFile works. You'd have to
    change how the Scanner code works, etc. You'll find out.


  • Jason Rutherglen at Jun 4, 2011 at 10:31 pm
    > You'd have to change how the Scanner code works, etc. You'll find out.
    Nice! Sounds fun.
  • Ryan Rawson at Jun 4, 2011 at 10:36 pm
    Also, don't break it :-)

    Part of the goal of HFile was to build something quick and reliable.
    It can be hard to know that you have all the corner cases down and
    that you won't find out in 6 months that every single piece of data
    you have put in HBase is corrupt. Keeping it simple is one strategy.

    I have previously thought about prefix compression; it seemed doable.
    You'd need a compressing algorithm, and then in the Scanner you would
    expand KeyValues, so callers would end up with copies of, not views on,
    the original data. The JVM is fairly good about short-lived objects
    (up to a certain allocation rate, that is), and while the original goal
    was to reduce memory usage, it could make sense to accept a higher
    short-term allocation rate if the wins from prefix compression are there.

    Also note that in whole-system profiling, heavily repeated KeyValue
    method calls do pop up. The goal of KeyValue was to have a format that
    didn't require deserialization into larger data structures (hence the
    lack of vints), and that would be simple and fast. Undoing that work
    should be accompanied by profiling evidence that new slowdowns were not
    introduced.

    -ryan

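The expansion Ryan describes above might look roughly like the sketch below. It assumes a hypothetical entry layout of (shared-prefix length, suffix length, suffix bytes); the class and layout are illustrative only, not HBase code, but they show why each decoded key ends up as a fresh copy rather than a view on the block.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical decoder for a prefix-compressed block: every expanded key is a
    // fresh byte[] copy, never a view on the underlying block buffer.
    public final class PrefixExpandingScanner {
      public static List<byte[]> expandKeys(ByteBuffer block) {
        List<byte[]> keys = new ArrayList<byte[]>();
        byte[] previous = new byte[0];
        while (block.remaining() > 0) {
          int sharedLen = block.getInt();   // bytes shared with the previous key
          int suffixLen = block.getInt();   // bytes that differ from the previous key
          byte[] key = new byte[sharedLen + suffixLen];
          System.arraycopy(previous, 0, key, 0, sharedLen); // copy the shared prefix
          block.get(key, sharedLen, suffixLen);             // read the new suffix
          keys.add(key);                                    // the caller gets a copy
          previous = key;
        }
        return keys;
      }
    }

Each decoded key is a short-lived byte[] allocation, which is exactly the higher short-term allocation rate the message above weighs against the memory savings.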
  • Jason Rutherglen at Jun 4, 2011 at 10:41 pm

    > It can be hard to know you have all the corner cases down and you
    > won't find out in 6 months that every single piece of data you have
    > put in HBase is corrupt. Keeping it simple is one strategy.
    Isn't the block index separate from the actual data? So corruption in
    that case is unlikely.
    > I have previously thought about prefix compression, it seemed doable,
    > you'd need a compressing algorithm, then in the Scanner you would
    > expand KeyValues
    I think we can try that later. I'm not sure one can make a hard and
    fast rule to always load the keys into RAM as an FST. The block index
    would seem to be fairly separate.
  • Ryan Rawson at Jun 4, 2011 at 10:50 pm
    Oh BTW, you can't mmap anything in HBase unless you copy it to local
    disk first. HDFS => no mmap.

    Just thought you'd like to know.

  • Jason Rutherglen at Jun 4, 2011 at 10:57 pm

    > Oh BTW, you can't mmap anything in HBase unless you copy it to local
    > disk first. HDFS => no mmap.
    Right, I know that! Once the block index is pluggable, the FST would
    be an in-heap byte[].
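For concreteness, a pluggable block index along the lines Jason describes might be expressed as an interface like the one sketched below. The interface name and methods are hypothetical, not an existing HBase API; an FST-backed implementation would simply hold the serialized FST as an in-heap byte[] and answer seeks from it.

    // Hypothetical plug point, not HBase code: the HFile writer/reader would only
    // need "give me your serialized form" and "which block should a seek start at?".
    public interface BlockIndex {
      /** Serialize the index so the HFile writer can append it after the data blocks. */
      byte[] serialize();

      /** Rebuild the index from bytes read back out of the HFile (kept on heap; no mmap). */
      void deserialize(byte[] indexBytes);

      /**
       * File offset of the block that could contain the given key, i.e. the last
       * block whose first key sorts at or before it, or -1 if the key sorts
       * before every block.
       */
      long seekToBlock(byte[] key);
    }

An FST-backed index and the current first-key-per-block structure could both sit behind a shape like this, which is what would make the index "pluggable".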
  • Jason Rutherglen at Jun 6, 2011 at 6:34 am
    Ok, the block index is only storing the first key of each block?
    Hmm... I think we can store a pointer to an exact position in the
    block, or at least allow that (for the FST implementation).

    How efficient is the current seeking?
    > I have previously thought about prefix compression, it seemed doable,
    It does look like prefix compression should be doable. E.g., we'd seek
    to a position based on the block index (from which we'd have the
    entire key). From the seeked-to position, we could scan and load each
    subsequent prefix-compressed key into a KeyValue, though in that case
    the KV wouldn't be 'pointing' back to the internals of the block; it'd
    be a whole new byte[] for each KV (which could have its own
    garbage-related ramifications).
    > you'd need a compressing algorithm
    Lucene's terms dictionary is very simple: each key stores the position
    at which it differs from the previous key.
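The "position at which the previous key differs" scheme boils down to writing, per key, a shared-prefix length plus the remaining suffix. The sketch below is illustrative only; the record layout and class name are assumptions, not Lucene's or HBase's actual on-disk format.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Writer side of a shared-prefix scheme: each key records how many leading
    // bytes it shares with the previous key, then only the differing suffix.
    public final class PrefixCompressedWriter {
      private final DataOutputStream out;
      private byte[] previous = new byte[0];

      public PrefixCompressedWriter(ByteArrayOutputStream buffer) {
        this.out = new DataOutputStream(buffer);
      }

      public void append(byte[] key) throws IOException {
        int shared = 0;
        int max = Math.min(previous.length, key.length);
        while (shared < max && previous[shared] == key[shared]) {
          shared++;                           // position at which this key starts to differ
        }
        out.writeInt(shared);                 // shared-prefix length
        out.writeInt(key.length - shared);    // suffix length
        out.write(key, shared, key.length - shared);
        previous = key;
      }
    }

Because HFile keys are written in sorted order, consecutive keys tend to share long prefixes, which is where the savings come from; the decoder is the mirror image sketched after Ryan's earlier message.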
  • Ryan Rawson at Jun 6, 2011 at 6:37 am

    On Sun, Jun 5, 2011 at 11:33 PM, Jason Rutherglen wrote:
    > Ok, the block index is only storing the first key of each block?
    > Hmm... I think we can store a pointer to an exact position in the
    > block, or at least allow that (for the FST implementation).
    Are you sure that is a good idea? Surely the disk seeks would destroy
    you on index load?


  • M. C. Srivas at Jun 6, 2011 at 7:06 am

    On Sun, Jun 5, 2011 at 11:37 PM, Ryan Rawson wrote:

    > On Sun, Jun 5, 2011 at 11:33 PM, Jason Rutherglen wrote:
    >> Ok, the block index is only storing the first key of each block?
    >> Hmm... I think we can store a pointer to an exact position in the
    >> block, or at least allow that (for the FST implementation).
    > Are you sure that is a good idea? Surely the disk seeks would destroy
    > you on index load?
    I agree, it would be pretty bad.

    But, assuming that the block size is set appropriately, copying one key per
    100 or so values into the block index does not really bloat the HFile and is
    a good trade-off to avoid the seeking. Plus, it does not prevent
    prefix compression inside the block itself. Are we considering
    prefix compression of keys across blocks?


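A quick back-of-envelope check of that trade-off, with every number below assumed purely for illustration (block size, indexed-key size, and file size are not measurements from this thread):

    // Rough sizing: with 64 KB blocks, a 1 GB HFile has ~16K blocks; one ~50-byte
    // indexed key per block stays well under 0.1% of the file, so indexing one key
    // per block (roughly one per ~100 values) costs little compared to extra seeks.
    public final class IndexSizeEstimate {
      public static void main(String[] args) {
        long fileBytes   = 1L << 30;     // assumed 1 GB HFile
        long blockBytes  = 64L * 1024;   // assumed 64 KB block size
        long indexKeyLen = 50;           // assumed average indexed key, in bytes
        long blocks      = fileBytes / blockBytes;
        long indexBytes  = blocks * indexKeyLen;
        System.out.printf("blocks=%d, index=%.0f KB (%.3f%% of file)%n",
            blocks, indexBytes / 1024.0, 100.0 * indexBytes / fileBytes);
      }
    }

Under those assumptions the index comes out around 800 KB for a 1 GB file, which is the sense in which one key per hundred or so values "does not really bloat the HFile".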
  • Ryan Rawson at Jun 6, 2011 at 7:12 am
    When I thought about it, I didn't think cross-block compression would
    be a good idea, because you want to be able to decompress each block
    independently of the others. Perhaps a master HFile dictionary or
    something.

    -ryan
  • Jason Rutherglen at Jun 6, 2011 at 4:10 pm

    > But, assuming that the block size is set appropriately, copying one key per
    > 100 or so values into the block index does not really bloat the hfile
    Right, this is what Lucene does, and it should work fine for HBase. E.g.,
    the FST would enable a denser index of keys at the same storage cost.
  • Jason Rutherglen at Jun 6, 2011 at 4:08 pm

    > Are you sure that is a good idea? Surely the disk seeks would destroy
    > you on index load?
    I'm not sure what you mean there. We can create a total HFile block
    key index that has pointers to within blocks; e.g., each entry would
    store a block index int and a position-within-the-block int. Where
    would the index load occur, given that this method should allow faster
    key lookup?
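The entry Jason sketches there might look like the following; the class and field names are hypothetical, added only to make the "block index int plus position-within-the-block int" idea concrete.

    // Hypothetical entry for an index that points into a block rather than just at it.
    // Resolving a key still costs one block read; the in-block offset merely lets the
    // reader skip the scan from the start of the block to the key itself.
    public final class ExactPositionIndexEntry {
      public final int block;            // ordinal of the block within the HFile
      public final int positionInBlock;  // byte offset of the key inside that block

      public ExactPositionIndexEntry(int block, int positionInBlock) {
        this.block = block;
        this.positionInBlock = positionInBlock;
      }
    }

Nothing here dictates how the entries are stored; an FST over the indexed keys could map each key to such a (block, position) pair, which is the denser index the thread is debating.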

Discussion Overview
group: dev
categories: hbase, hadoop
posted: Jun 4, '11 at 9:44p
active: Jun 6, '11 at 4:10p
posts: 14
users: 4
website: hbase.apache.org
