I'm bulk loading data from Hadoop into my Cassandra cluster via streaming.
Because the Hadoop cluster runs a large number of mappers, each generating
its own sstables, this results in a large set of relatively small (~1 MiB)
sstables.

With SizeTieredCompactionStrategy, the Cassandra cluster would quickly
compact all these small sstables into decently sized ones.

With LeveledCompactionStrategy, however, this takes much longer. I have
multithreaded_compaction: true, but it only takes on 32 sstables at a
time in a single compaction task, so starting from ~1500 sstables it
takes quite some time. I/O is not the bottleneck.
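A rough back-of-envelope model of the situation above, assuming that each leveled compaction task consumes at most 32 of the pending small sstables (the limit observed here) and that these tasks run strictly one after another. The helper name `sequential_rounds` is hypothetical. The model ignores the L1 sstables that each task must also rewrite, so real elapsed time is likely higher than the round count alone suggests.

```python
import math


def sequential_rounds(pending_sstables: int, max_per_task: int = 32) -> int:
    """Hypothetical helper: count one-at-a-time compaction tasks needed to
    drain the backlog, assuming each task takes up to max_per_task sstables.
    Ignores the overlapping L1 sstables each task also rewrites."""
    if pending_sstables <= 0:
        return 0
    return math.ceil(pending_sstables / max_per_task)


if __name__ == "__main__":
    # ~1500 streamed sstables, 32 per task -> 47 sequential tasks
    print(f"{sequential_rounds(1500)} rounds")
```

With ~1500 sstables this comes to 47 tasks that cannot overlap, which is consistent with the backlog draining slowly even though I/O is not saturated.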

Is there a configuration knob I can tune to make this happen faster?
I'm a bit confused by the descriptions of min_sstable_size, bucket_high,
bucket_low, etc., and I'm not sure whether they apply in this case.
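For context, min_sstable_size, bucket_low and bucket_high are, as far as I know, subproperties of SizeTieredCompactionStrategy (defaults 50 MB, 0.5 and 1.5), so LeveledCompactionStrategy should not consult them. Below is a simplified Python model of how STCS uses them to group sstables into size tiers. It is a sketch of the documented behavior, not Cassandra's actual code, and the function name `get_buckets` is hypothetical.

```python
MiB = 1024 * 1024


def get_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 * MiB):
    """Hypothetical simplified model of STCS bucketing (sizes in bytes).

    An sstable joins the first bucket whose average size `avg` satisfies
    avg * bucket_low < size < avg * bucket_high, or, if both the sstable and
    the bucket average are below min_sstable_size, joins it regardless.
    """
    buckets = []  # each entry: [total_bytes, member_sizes]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0] / len(bucket[1])
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket[0] += size
                bucket[1].append(size)
                break
        else:
            # No compatible bucket: start a new size tier.
            buckets.append([size, [size]])
    return [members for _, members in buckets]


if __name__ == "__main__":
    # 1500 streamed ~1 MiB sstables plus four larger, already-compacted ones
    tiers = get_buckets([MiB] * 1500 + [200 * MiB] * 4)
    print([len(t) for t in tiers])  # [1500, 4]
```

In this model every sstable below min_sstable_size falls into one bucket, which helps explain why STCS handles many tiny sstables well. None of these knobs would change how LCS processes its backlog.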

I'm considering options for reducing the number of sstables streamed
from the Hadoop side, but whether that is possible remains to be seen.


Discussion overview
Group: user @
Posted: Aug 18, '14 at 1:22p
Active: Aug 20, '14 at 7:23a
2 users in discussion: Erik Forsberg (2 posts), Robert Coli (1 post)