Also, don't break it :-)
Part of the goal of HFile was to build something quick and reliable.
It can be hard to know you have all the corner cases down and you
won't find out in 6 months that every single piece of data you have
put in HBase is corrupt. Keeping it simple is one strategy.
I have previously thought about prefix compression, and it seemed doable.
You'd need a compression algorithm, and then in the Scanner you would
expand KeyValues, so callers would end up with copies of, not views on,
the original data. The JVM is fairly good about short-lived objects
(up to a certain allocation rate that is), and while the original goal
was to reduce memory usage, it could make sense to take a higher short
term allocation rate if the wins from prefix compression are there.
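To make the copy-versus-view tradeoff concrete, here is a minimal sketch of prefix compression over sorted keys. It is not HFile or KeyValue code; the class name PrefixCompressionSketch, the Entry type, and the encode/decode methods are all hypothetical, made up for illustration. Each key is stored as the number of leading bytes it shares with the previous key plus the remaining suffix, and decoding has to allocate a fresh array per key, which is the higher short term allocation rate mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these names exist in HBase.
public class PrefixCompressionSketch {

    /** One encoded key: bytes shared with the previous key, plus the rest. */
    static final class Entry {
        final int sharedLen;
        final byte[] suffix;

        Entry(int sharedLen, byte[] suffix) {
            this.sharedLen = sharedLen;
            this.suffix = suffix;
        }
    }

    /** Encodes keys (assumed already sorted) against their predecessor. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds full keys. Every key is a new array: a copy, not a view. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedLen + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedLen);
            System.arraycopy(e.suffix, 0, key, e.sharedLen, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

Because each key depends on the one before it, a reader has to decode sequentially from the start of a block, which is part of why the Scanner would need to change.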
Also note that in whole-system profiling, frequently called methods in
KeyValue do pop up. The goal of KeyValue was to have a format that
didn't require deserialization into larger data structures (hence the
lack of vint), and would be simple and fast. Undoing that work should
be accompanied by profiling evidence that new slowdowns were not introduced.
On Sat, Jun 4, 2011 at 3:30 PM, Jason Rutherglen wrote:
You'd have to change how the Scanner code works, etc. You'll find out.
Nice! Sounds fun.
On Sat, Jun 4, 2011 at 3:27 PM, Ryan Rawson wrote:
What are the specs/goals of a pluggable block index? Right now the
block index is fairly tied deep in how HFile works. You'd have to
change how the Scanner code works, etc. You'll find out.
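For discussion, one possible shape for the seam a pluggable block index would need is sketched below. This is an assumption about what "pluggable" could mean, not how HFile is structured: the BlockIndex interface, the SortedArrayBlockIndex class, and the blockContaining method are hypothetical names. The sketch assumes the index maps a key to the last block whose first key sorts at or before it, using unsigned byte-wise comparison.

```java
// Hypothetical sketch only: none of these names exist in HBase.
public class BlockIndexSketch {

    /** The seam a Scanner could code against instead of a concrete index. */
    interface BlockIndex {
        /** Returns the block that may hold key, or -1 if key sorts before all blocks. */
        int blockContaining(byte[] key);
    }

    /** Simplest implementation: binary search over each block's first key. */
    static final class SortedArrayBlockIndex implements BlockIndex {
        private final byte[][] firstKeys; // assumed sorted ascending

        SortedArrayBlockIndex(byte[][] firstKeys) {
            this.firstKeys = firstKeys;
        }

        @Override
        public int blockContaining(byte[] key) {
            int lo = 0;
            int hi = firstKeys.length - 1;
            int result = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (compare(firstKeys[mid], key) <= 0) {
                    result = mid;   // candidate; look for a later block
                    lo = mid + 1;
                } else {
                    hi = mid - 1;
                }
            }
            return result;
        }
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

An alternative index would implement the same interface, but anything in the Scanner that reaches into the index's internals would still have to be reworked first.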
On Sat, Jun 4, 2011 at 3:17 PM, Stack wrote:
I do not know of one. FYI, HFile is pretty standalone with regard to tests, etc. There is even a perf testing class for HFile.
On Jun 4, 2011, at 14:44, Jason Rutherglen wrote:
I want to take a wh/hack at creating a pluggable block index, is there
an open issue for this? I looked and couldn't find one.