We are struggling with a problem: when adding nodes, around 5% of read
operations freeze (i.e. time out after 1 second) for 10-20 seconds. It
might not seem like much, but at around 200k requests per second that is
quite a big disruption. It is well documented and known that adding nodes
*has* an impact on the latency or the completion of requests, but is
there a way to lessen it?
It is completely okay for write operations to fail or get blocked while
adding nodes, but having the read path also impacted this much (going
from a 99th percentile latency of 30 milliseconds to above 1 second) is what
We have a 36-node cluster, with every node owning ~120 GB of data. We are
using Cassandra version 2.0.14 with vnodes, and we are in the process of
increasing the capacity of the cluster by roughly doubling the number of
nodes. The nodes have SSDs and a peak IO usage of ~30%.
Apart from the latency metrics, the only anomaly we see is that the
FlushWriter pool is blocked 18% of the time (based on the tpstats
counters), but that should only block writes and not reads, right?
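
For anyone who wants to compare numbers, here is a minimal sketch of how a "blocked" percentage like the 18% above can be derived from tpstats output. It assumes the ratio is the "All time blocked" counter divided by "Completed" for each pool, and it assumes the six-column layout printed by `nodetool tpstats` on 2.0.x. The sample text and all numbers in it are hypothetical, chosen only so that the flush pool comes out at 18%; they are not taken from our cluster.

```python
# Hypothetical excerpt of `nodetool tpstats` output. The numbers are made up
# for illustration only.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0        5000000         0                 0
MutationStage                     0         0        8000000         0                 0
FlushWriter                       0         0           1000         0               180
"""


def blocked_ratios(tpstats_text):
    """Return {pool_name: all_time_blocked / completed} for each pool row.

    Assumption: "blocked X% of the time" means all-time-blocked tasks
    divided by completed tasks. Other definitions are possible.
    """
    ratios = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # A pool row is a name followed by exactly five integer columns;
        # the header and any other lines are skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        name = fields[0]
        completed = int(fields[3])
        all_time_blocked = int(fields[5])
        ratios[name] = all_time_blocked / completed if completed else 0.0
    return ratios


if __name__ == "__main__":
    for pool, ratio in blocked_ratios(SAMPLE_TPSTATS).items():
        print(f"{pool}: {ratio:.1%} blocked")
```

With the hypothetical sample, only FlushWriter shows a non-zero ratio, which matches what we see: the read stage itself never reports blocked tasks.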