Hi!
We're using the bulkloader to load data into Cassandra. During and after
bulkloading, minor compactions produce sstables that are larger than the
input sstables they replace. An example:
INFO [CompactionExecutor:105] 2012-03-21 15:18:46,608 CompactionTask.java (line 115) Compacting [SSTableReader(path='/cassandra/OSP5/Data/OSP5-Data-hc-1755-Data.db'), (REMOVED A BUNCH OF OTHER SSTABLE PATHS), SSTableReader(path='/cassandra/OSP5/Data/OSP5-Data-hc-1749-Data.db'), SSTableReader(path='/cassandra/OSP5/Data/OSP5-Data-hc-1753-Data.db')]
INFO [CompactionExecutor:105] 2012-03-21 15:30:04,188 CompactionTask.java (line 226) Compacted to [/cassandra/OSP5/Data/OSP5-Data-hc-3270-Data.db,]. 84,214,484 to 105,498,673 (~125% of original) bytes for 2,132,056 keys at 0.148486MB/s. Time: 677,580ms.
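To make the numbers in that "Compacted to" line concrete, here is a small Python sketch that parses it and recomputes the size ratio and throughput. It assumes the reported MB/s is output bytes divided by elapsed time with MB = 2**20 bytes; that assumption happens to reproduce the logged 0.148486MB/s. The function and variable names are just for illustration.

```python
import re

# The "Compacted to" log entry from above, joined onto one line.
LINE = ("Compacted to [/cassandra/OSP5/Data/OSP5-Data-hc-3270-Data.db,]. "
        "84,214,484 to 105,498,673 (~125% of original) bytes for 2,132,056 keys "
        "at 0.148486MB/s. Time: 677,580ms.")

# Captures: input bytes, output bytes, key count, elapsed milliseconds.
PATTERN = re.compile(
    r"([\d,]+) to ([\d,]+) \(~\d+% of original\) bytes for ([\d,]+) keys"
    r" at [\d.]+MB/s\. Time: ([\d,]+)ms"
)


def to_int(s):
    """Turn a comma-grouped number like '84,214,484' into an int."""
    return int(s.replace(",", ""))


def compaction_stats(line):
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a 'Compacted to' line")
    before, after, keys, ms = (to_int(g) for g in m.groups())
    seconds = ms / 1000.0
    return {
        "ratio": after / before,  # > 1.0 means the output grew
        # Assumption: throughput = output bytes / elapsed seconds, MB = 2**20 bytes.
        "mb_per_s": after / 2**20 / seconds,
        "bytes_per_key_in": before / keys,
        "bytes_per_key_out": after / keys,
    }


stats = compaction_stats(LINE)
print(f"output/input ratio: {stats['ratio']:.1%}")
print(f"throughput: {stats['mb_per_s']:.6f} MB/s")
print(f"bytes per key: {stats['bytes_per_key_in']:.1f} -> {stats['bytes_per_key_out']:.1f}")
```

The per-key figures show that the growth is spread over the same 2,132,056 keys, i.e. each row takes roughly a quarter more space on disk after compaction.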
The sstables are compressed on the Hadoop cluster (DeflateCompressor,
128 KB chunk length) before being transferred to Cassandra, and the CF has
the same compression settings:
[default@Keyspace1] describe Data;
    ColumnFamily: Data (Super)
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.LongType
      Columns sorted by: org.apache.cassandra.db.marshal.LongType/org.apache.cassandra.db.marshal.UTF8Type
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      DC Local Read repair chance: 0.0
      Replicate on write: true
      Caching: KEYS_ONLY
      Bloom Filter FP chance: 0.01
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        chunk_length_kb: 128
        sstable_compression: org.apache.cassandra.io.compress.DeflateCompressor
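One quick way to double-check that the loaded sstables really were written compressed is to look for the CompressionInfo.db component next to each Data.db file. This Python sketch assumes the Cassandra 1.0 naming scheme seen in the log above (<keyspace>-<cf>-<version>-<generation>-<Component>.db) and that a compressed sstable always has a matching -CompressionInfo.db file; the function name is hypothetical.

```python
import sys
from pathlib import Path


def sstables_without_compression_info(data_dir):
    """Return names of *-Data.db files that have no sibling *-CompressionInfo.db.

    Assumes Cassandra 1.0 file naming, e.g. OSP5-Data-hc-1755-Data.db with
    OSP5-Data-hc-1755-CompressionInfo.db for a compressed sstable.
    """
    missing = []
    for data_file in sorted(Path(data_dir).glob("*-Data.db")):
        # "OSP5-Data-hc-1755-Data.db" -> prefix "OSP5-Data-hc-1755-"
        prefix = data_file.name[: -len("Data.db")]
        if not (data_file.parent / (prefix + "CompressionInfo.db")).exists():
            missing.append(data_file.name)
    return missing


if __name__ == "__main__":
    # Usage: python check_compression.py /cassandra/OSP5/Data
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in sstables_without_compression_info(directory):
        print("no CompressionInfo.db:", name)
```

An empty result for the data directory would at least rule out uncompressed sstables slipping in through the bulkloader.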
Any clues on this?
Regards,
\EF