We're currently testing Cassandra with a large number of row keys per
node: nodetool cfstats estimates roughly 700M keys per node. This seems
to have caused very high heap consumption.

After reading
http://wiki.apache.org/cassandra/LargeDataSetConsiderations, I think I've
tracked this down to the bloom filters and the sampled index entries.
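The sketch below gives a rough sense of how the sampled index entries scale. It assumes that one index entry is kept in memory for every `index_interval` keys. I believe 128 is the default value of that setting in cassandra.yaml for the 1.x series. The 16-byte average key and the 64-byte per-entry overhead are hypothetical placeholders, not values measured from Cassandra.

```python
# Back-of-the-envelope estimate of heap used by sampled index entries.
# Assumptions (hypothetical, not taken from Cassandra's code): one entry is
# sampled every `index_interval` keys, and each sample costs the key bytes
# plus a fixed overhead (object headers, references, file position).

def sampled_index_entries(keys_per_node: int, index_interval: int = 128) -> int:
    """Number of in-memory samples if every index_interval-th key is kept."""
    return -(-keys_per_node // index_interval)  # ceiling division


def sampled_index_heap_bytes(keys_per_node: int, avg_key_bytes: int,
                             per_entry_overhead: int = 64,
                             index_interval: int = 128) -> int:
    """Approximate heap bytes: samples * (key size + assumed overhead)."""
    entries = sampled_index_entries(keys_per_node, index_interval)
    return entries * (avg_key_bytes + per_entry_overhead)


if __name__ == "__main__":
    keys = 700_000_000  # cfstats estimate per node
    for interval in (128, 512):
        n = sampled_index_entries(keys, interval)
        mib = sampled_index_heap_bytes(keys, avg_key_bytes=16,
                                       index_interval=interval) / 2**20
        print(f"index_interval={interval}: {n:,} samples, ~{mib:,.0f} MiB")
```

Memory falls linearly as the interval grows. The cost is that each lookup has to scan further through the on-disk index.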

Regarding bloom filters, have I understood correctly that they are
stored on-heap, and that the "Bloom Filter Space Used" reported by
'nodetool cfstats' approximates the heap space used by bloom filters?
It reports the on-disk size, but if I understand CASSANDRA-3497
correctly, the on-disk size is smaller than the on-heap size?
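To get a feel for the numbers independent of cfstats, here is the textbook optimal bloom filter sizing: m/n = -ln(p) / (ln 2)^2 bits per key, with k = (m/n) ln 2 hash functions. This is the generic formula, not Cassandra's actual implementation. As far as I know, Cassandra picks a whole number of buckets per key, so the real size can differ. Also, every SSTable has its own filter, so keys that appear in several SSTables are counted more than once.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook optimum: m/n = -ln(p) / (ln 2)^2 bits per key."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in bytes for num_keys keys."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8


def optimal_hash_count(fp_chance: float) -> int:
    """Optimal number of hash functions: k = (m/n) * ln 2, at least 1."""
    return max(1, round(bloom_bits_per_key(fp_chance) * math.log(2)))


if __name__ == "__main__":
    keys = 700_000_000  # cfstats estimate per node
    for p in (0.001, 0.01, 0.1):
        mib = bloom_filter_bytes(keys, p) / 2**20
        print(f"fp_chance={p}: ~{bloom_bits_per_key(p):.1f} bits/key, "
              f"k={optimal_hash_count(p)}, ~{mib:,.0f} MiB")
```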

I understand that increasing bloom_filter_fp_chance will decrease the
bloom filter size, but at the cost of worse performance for queries on
keys that don't exist. I do have a fair number of queries for keys that
don't exist.
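The cost of a higher fp_chance on misses can be estimated with a simple model. For a key that exists in none of the SSTables, each SSTable's filter wrongly answers "maybe" with probability p, and each such answer costs a pointless index/disk lookup. The SSTable count is a hypothetical parameter. The model assumes the filters err independently and ignores any caching.

```python
def expected_wasted_sstable_reads(num_sstables: int, fp_chance: float) -> float:
    """Expected number of SSTables probed needlessly for a nonexistent key."""
    # Linearity of expectation: each filter contributes fp_chance.
    return num_sstables * fp_chance


def prob_any_wasted_read(num_sstables: int, fp_chance: float) -> float:
    """Probability that at least one filter gives a false positive."""
    return 1 - (1 - fp_chance) ** num_sstables


if __name__ == "__main__":
    for p in (0.01, 0.1):
        print(f"fp_chance={p}: "
              f"{expected_wasted_sstable_reads(10, p):.2f} wasted reads per miss, "
              f"P(any)={prob_any_wasted_read(10, p):.3f} (10 SSTables)")
```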

How much would increasing the key cache help, i.e. decreasing the bloom
filter size but increasing the key cache size? Will the key cache cache
negative results, i.e. the fact that a key didn't exist?


Posted Mar 21, '12 at 3:28p; last activity Mar 21, '12 at 5:45p.
2 users in discussion: Erik Forsberg (1 post), Aaron Morton (1 post).


