I have a feature request: there should be a native function called
'count' that returns the number of rows matching a specific family
filter. It should run inside HBase and should not need to read cells
off the disk or cache; it should just count the rows in the most
efficient way possible. I realize that family definitions are part of
the cells, so it would be nice to have an index that lets a count run
with a low IO/CPU hit on HBase (for example, enabling such an index in
the table schema would be how you turn it on for a specific
family).
Best,
-Jack
[HBase-user] feature request (count)
Discussion Posts
Follow ups
- Ryan Rawson: This is a commonly requested feature, and it remains unimplemented because it is actually quite hard. Each HFile knows how many KV entries there are in it, but this does not map in a general way to the number of rows, or the number of rows with a specific column. Keeping track of the row count as new rows are created is also not as easy as it seems - this is because a Put does not know if a row already exists or not. Making it aware of that fact would require doing a get before a put - not
- Jack Levin: "Each HFile knows how many KV entries there are in it, but this does not map in a general way to the number of rows, or the number of rows with a specific column." It would be nice to have an index like that; it would solve a lot of issues for people migrating from MySQL. I assume that without the 'count' feature, people are resorting to storing dataset elements in other engines, which is not great, since you then end up requiring a non-HBase index to be consistent and authoritative for all of
- Bill Graham: One alternative option is to calculate some stats during compactions and store them somewhere for retrieval. The metrics wouldn't be up to date, of course, since they'd be stats from the last compaction time. I think that would still be useful info to have, but it's different from what's being requested.
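To make the point in the follow-ups concrete, here is a minimal sketch in plain Java of why a per-file KV entry count does not give a row count. It is a toy model, not HBase code: the `KvCountSketch` class, its `KeyValue` type, and the `put`, `entryCount`, and `rowCount` helpers are all hypothetical names made up for illustration. The `put` here appends blindly, mirroring the remark that a Put does not know whether the row already exists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these types are hypothetical and are NOT HBase APIs.
class KvCountSketch {

    // One stored entry: a row key plus the column (family and qualifier) it belongs to.
    static final class KeyValue {
        final String row;
        final String family;
        final String qualifier;

        KeyValue(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // A blind write: append the entry without checking whether the row is already present.
    static void put(List<KeyValue> file, String row, String family, String qualifier) {
        file.add(new KeyValue(row, family, qualifier));
    }

    // The cheap number: how many entries the file holds.
    static long entryCount(List<KeyValue> file) {
        return file.size();
    }

    // The requested number: distinct rows that have at least one entry in the family.
    // Getting it means visiting every key and de-duplicating row keys.
    static long rowCount(List<KeyValue> file, String family) {
        Set<String> rows = new HashSet<>();
        for (KeyValue kv : file) {
            if (kv.family.equals(family)) {
                rows.add(kv.row);
            }
        }
        return rows.size();
    }

    // Sample data: two rows, several columns, and one cell written twice.
    static List<KeyValue> sample() {
        List<KeyValue> file = new ArrayList<>();
        put(file, "row1", "a", "x");
        put(file, "row1", "a", "y");
        put(file, "row1", "b", "x");
        put(file, "row2", "a", "x");
        put(file, "row1", "a", "x"); // rewrite of an existing cell adds another entry
        return file;
    }

    public static void main(String[] args) {
        List<KeyValue> file = sample();
        System.out.println("entries          = " + entryCount(file));    // 5
        System.out.println("rows in family a = " + rowCount(file, "a")); // 2
        System.out.println("rows in family b = " + rowCount(file, "b")); // 1
    }
}
```

In this sketch the entry count (5) says nothing about the per-family row counts (2 and 1). Keeping a running row counter at write time would need `put` to check for the row first, which is the get-before-put cost mentioned above.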
Discussion Overview
| group | user |
| categories | hbase, hadoop |
| posted | Jun 3, '11 at 10:20p |
| active | Jun 6, '11 at 8:15p |
| posts | 7 |
| users | 5 |
| website | hbase.apache.org |
