I'm trying to estimate our disk space requirements and I'm wondering about
disk space required for compaction.
My application mostly inserts new data and performs updates to existing data
very infrequently, so compaction will remove very few bytes. It seems that
a major compaction will therefore require about as much free disk space as
the table currently consumes.
So here's my question. If Cassandra only compacts one table at a time, then
I should be safe if I keep as much free space as there is data in the
largest table. If Cassandra can compact multiple tables simultaneously, then
it seems that I need as much free space as all the tables put together,
which means no more than 50% utilization. So, how much free space do I need?
Any rules of thumb anyone can offer?
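
To make the arithmetic concrete, here is a rough sketch of the two cases as I
understand them. The table names and sizes are made up, and it assumes a
compaction removes nothing, so the rewritten copy is as large as the original.
It is not a statement of how Cassandra actually schedules compactions, which
is exactly what I'm asking about.

```python
def free_space_needed(table_sizes_gb, concurrent=False):
    """Worst-case free space (GB) under the assumptions in this post.

    concurrent=False: only one table is compacted at a time, so the
                      largest table sets the requirement.
    concurrent=True:  every table may be compacted at once, so the
                      requirement is the sum of all tables.
    """
    sizes = list(table_sizes_gb.values())
    if not sizes:
        return 0
    return sum(sizes) if concurrent else max(sizes)


def max_utilization(table_sizes_gb, concurrent=False):
    """Fraction of the disk that data may occupy while leaving that headroom."""
    data = sum(table_sizes_gb.values())
    if data == 0:
        return 0.0
    free = free_space_needed(table_sizes_gb, concurrent)
    return data / (data + free)


# Hypothetical table sizes in GB, for illustration only.
tables = {"events": 400, "users": 100, "audit": 100}

print(free_space_needed(tables), max_utilization(tables))
print(free_space_needed(tables, concurrent=True),
      max_utilization(tables, concurrent=True))
```

With these made-up numbers the one-at-a-time case needs 400 GB free (60%
utilization), while the all-at-once case needs 600 GB free, which is the 50%
ceiling I mentioned above.
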
Also, what happens if a node gets low on disk space and there isn't enough
available for compaction? If I add new nodes to reduce the amount of data on
each node, I assume the space won't be reclaimed until a compaction event
occurs. Is there a way to salvage a node that gets into a state where it
cannot compact its tables?