Hi guys

I want to set the chunk_length_kb parameter in order to improve the read
latency of my cassandra_stress test.

This is the table

CREATE TABLE "Keyspace1".standard1 (
     key blob PRIMARY KEY,
     "C0" blob,
     "C1" blob,
     "C2" blob,
     "C3" blob,
     "C4" blob
) WITH bloom_filter_fp_chance = 0.1
     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
     AND comment = ''
     AND compaction = {'sstable_size_in_mb': '160', 'class':
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
     AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.SnappyCompressor'}
     AND dclocal_read_repair_chance = 0.1
     AND default_time_to_live = 0
     AND gc_grace_seconds = 864000
     AND max_index_interval = 2048
     AND memtable_flush_period_in_ms = 0
     AND min_index_interval = 128
     AND read_repair_chance = 0.0
     AND speculative_retry = '99.0PERCENTILE';
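
If I understand the syntax correctly, chunk_length_kb goes inside the
compression map, so I suppose the ALTER would look something like this
(the 16 KB value is just a guess for illustration):

admin@cqlsh:Keyspace1> ALTER TABLE standard1 WITH compression =
                   ... {'sstable_compression': 'SnappyCompressor',
                   ...  'chunk_length_kb': '16'};

(followed by nodetool upgradesstables -a to rewrite the existing
SSTables with the new chunk size, if I am not mistaken)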

I have 6 columns of type blob. This table is filled by cassandra_stress.

admin@cqlsh:Keyspace1> select * from standard1 limit 2;

 key | 0x4b343050393536353531
 C0  | 0xe0e3d68ed1536e4d994aa74860270ac91cf7941acb5eefd925815481298f0d558d4f
 C1  | 0xa43f78202576f1ccbdf50657792fac06f0ca7c9416ee68a08125c8dce4dfd085131d
 C2  | 0xab12b06bf64c73e708d1b96fea9badc678303906e3d5f5f96fae7d8092ee0df0c54c
 C3  | 0x428a157cb598487a1b938bdb6c45b09fad3b6408fddc290a6b332b91426b00ddaeb2
 C4  | 0x0583038d881ab25be72155bc3aa5cb9ec3aab8e795601abe63a2b35f48ce1e359f5e

I am seeing a read latency of ~500 microseconds, which I think is too
long compared to the write latency of ~30 microseconds.

My first idea is to set chunk_length_kb to a value close to the size
of my rows in KB.

Am I heading in the right direction? If so, how can I compute the size of a
row?

Another question: might the "Compacted partition" values from the nodetool
cfstats command give me a figure close to the right chunk_length_kb?
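
I mean reading it off like this, for example, and using the
"Compacted partition mean bytes" line (if that is the right one):

$ nodetool cfstats Keyspace1.standard1 | grep "Compacted partition"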

Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


  • Ben Bromhead at Feb 17, 2016 at 5:53 pm
    You will need to experiment with chunk_length based on your dataset. At the
    end of the day it's about finding the sweet spot: chunk_length needs to be
    big enough that you get a decent compression rate (large chunks increase
    the likelihood of a better compression ratio, which means you will read
    less from disk), but you also want it small enough that you are not
    reading unrelated data from disk.
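
    To put rough numbers on that trade-off, using the sample row from your
    mail (a 10-byte key plus five 34-byte blobs, so ~180 bytes per partition)
    and assuming the default chunk_length_kb of 64:

    64 KB chunk: 65536 / 180 => a point read decompresses ~360 rows' worth
     4 KB chunk:  4096 / 180 => a point read decompresses  ~22 rows' worth

    So for rows this small, a smaller chunk cuts read amplification a lot,
    at the cost of some compression ratio.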

    But... before you go down the chunk_length testing rabbit hole, make sure
    you are using a sane read_ahead value on the block device your data
    directory sits on. For example, if you are on AWS and using a RAID device
    built with mdadm, the read_ahead value for the block device can be as high
    as 128 KB by default. If you are on SSDs you can safely drop it to 8 or 16
    (or even 0) and see a big uptick in read performance.
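
    For example, with blockdev (the value is in 512-byte sectors, and
    /dev/md0 here is just a placeholder for your actual device):

    # check the current read_ahead, reported in 512-byte sectors
    sudo blockdev --getra /dev/md0
    # drop it to 16 sectors (8 KB)
    sudo blockdev --setra 16 /dev/md0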

    For lots of juicy low-level disk tuning and further details, see Al Tobey's
    guide: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
    --
    Ben Bromhead
    CTO | Instaclustr <https://www.instaclustr.com/>
    +1 650 284 9692
    Managed Cassandra / Spark on AWS, Azure and Softlayer
