[HBase-user] feature request (count)

Jack Levin
Jun 3, 2011 at 10:40 pm
"Each HFile knows how many KV entries there are in it, but this does
not map in a general way to the number of rows, or the number of rows
with a specific column."

It would be nice to have an index like that; it would solve a lot of
issues for people migrating from MySQL. I assume that without a
'count' feature, people resort to storing dataset elements in other
engines, which is not great, since you then need a non-HBase index
that stays consistent and authoritative for every dataset that
requires counts.

-Jack

On Fri, Jun 3, 2011 at 3:24 PM, Ryan Rawson wrote:
This is a commonly requested feature, and it remains unimplemented
because it is actually quite hard.  Each HFile knows how many KV
entries there are in it, but this does not map in a general way to the
number of rows, or the number of rows with a specific column. Keeping
track of the row count as new rows are created is also not as easy as
it seems - this is because a Put does not know if a row already exists
or not.  Making it aware of that fact would require doing a get before
a put - not cheap.

-ryan
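
To make the two points above concrete, here is a minimal toy model in
Python. It is not HBase code: ToyStore, KV, blind_row_count and the
row/column names are all hypothetical, made up for illustration. It
only shows that the number of KV entries differs from the number of
rows, and that a put which never reads cannot keep an exact row
counter.

```python
from collections import namedtuple

# Toy stand-in for a key/value entry (hypothetical, not the HBase type).
KV = namedtuple("KV", ["row", "column", "ts", "value"])


class ToyStore:
    """Append-only store: a put only appends and never reads existing data."""

    def __init__(self):
        self.entries = []         # every KV ever written
        self.blind_row_count = 0  # counter maintained without any read

    def put(self, row, column, ts, value):
        self.entries.append(KV(row, column, ts, value))
        # A blind put cannot tell whether `row` already exists, so the best
        # it can do is count writes, which overcounts rows.
        self.blind_row_count += 1

    def kv_count(self):
        # Cheap: the store always knows how many entries it holds.
        return len(self.entries)

    def row_count(self, column=None):
        # Exact, but it has to read the keys back to deduplicate rows.
        return len({kv.row for kv in self.entries
                    if column is None or kv.column == column})


store = ToyStore()
store.put("row1", "cf:a", 1, "x")
store.put("row1", "cf:b", 1, "y")
store.put("row1", "cf:a", 2, "x2")  # new version of an existing cell
store.put("row2", "cf:a", 1, "z")

print(store.kv_count())         # 4 entries
print(store.blind_row_count)    # 4, wrong as a row count
print(store.row_count())        # 2 rows
print(store.row_count("cf:b"))  # 1 row has column cf:b
```

Making blind_row_count exact would mean checking for the row inside
put(), which is the get-before-put cost described above.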
On Fri, Jun 3, 2011 at 3:20 PM, Jack Levin wrote:
I have a feature request: there should be a native function called
'count' that produces a count of rows based on a specific family
filter, is internal to HBase, and does not need to read cells off the
disk/cache. It should just count up the rows in the most efficient way
possible. I realize that family definitions are part of the cells, so
it would be nice to have an index that keeps the IO/CPU hit on HBase
low when doing a count (for example, enabling such an index in the
table schema would be how you turn it on for a specific family).

Best,

-Jack
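
For comparison, a count that looks only at keys and never at values
could be sketched roughly like this. It is a toy model in Python, not
HBase code: count_rows and the sample keys are hypothetical, and it
assumes keys arrive sorted by row, the way a scan would return them.
It still has to visit every key in the family, which is the cost the
requested index is meant to avoid.

```python
def count_rows(sorted_keys, family):
    """Count distinct rows that have at least one cell in `family`.

    `sorted_keys` is an iterable of (row, family, qualifier) tuples sorted
    by row. Only keys are inspected; cell values are never read.
    """
    count = 0
    last_row = None
    for row, fam, _qualifier in sorted_keys:
        if fam != family:
            continue  # cell belongs to another family, skip it
        if row != last_row:
            # First matching cell seen for this row.
            count += 1
            last_row = row
    return count


# Hypothetical sample keys: three rows, two families.
keys = sorted([
    ("row1", "info", "name"),
    ("row1", "info", "email"),
    ("row1", "stats", "hits"),
    ("row2", "stats", "hits"),
    ("row3", "info", "name"),
])

print(count_rows(keys, "info"))   # 2 (row1 and row3)
print(count_rows(keys, "stats"))  # 2 (row1 and row2)
```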