Can you show the output of "nodetool tpstats" on one of the affected nodes?
That will give some indication of where the trouble might be.
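
As a rough illustration of what to look for in that output, here is a minimal Python sketch that parses the thread-pool table printed by "nodetool tpstats" and lists pools with pending or blocked tasks. The column layout is assumed to match the usual 2.0-era output (Pool Name, Active, Pending, Completed, Blocked, All time blocked). The helper names and the sample numbers are made up for illustration and are not taken from the affected cluster.

```python
def parse_tpstats(text):
    """Parse the thread-pool table of `nodetool tpstats` into a dict.

    Assumes six whitespace-separated columns per pool row:
    name, active, pending, completed, blocked, all-time blocked.
    """
    pools = {}
    in_table = False
    for line in text.splitlines():
        if line.startswith("Pool Name"):
            in_table = True          # header row found, data rows follow
            continue
        if not in_table:
            continue
        fields = line.split()
        if not fields:               # blank line ends the pool table
            break
        if len(fields) != 6:         # skip rows that do not fit the assumed layout
            continue
        name, active, pending, completed, blocked, all_time_blocked = fields
        pools[name] = {
            "active": int(active),
            "pending": int(pending),
            "completed": int(completed),
            "blocked": int(blocked),
            "all_time_blocked": int(all_time_blocked),
        }
    return pools


def suspicious_pools(pools, pending_threshold=0):
    """Return the sorted names of pools that have a backlog or blocked tasks."""
    return sorted(
        name
        for name, stats in pools.items()
        if stats["pending"] > pending_threshold
        or stats["blocked"] > 0
        or stats["all_time_blocked"] > 0
    )


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0           1200         0                 0
MutationStage                    32      4100         985000         0                 0
FlushWriter                       1         5            310         1                42

Message type           Dropped
MUTATION                  1500
"""

if __name__ == "__main__":
    print(suspicious_pools(parse_tpstats(SAMPLE)))
```

A large Pending count on MutationStage or a non-zero "All time blocked" on FlushWriter would suggest the node cannot keep up with writes or flushes. Comparing the result with a healthy node is usually more telling than the absolute numbers.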

On Tue, Apr 19, 2016 at 6:54 AM, sai krishnam raju potturi wrote:

Do we see any hung processes, such as repairs, on those 3 nodes? What does
"nodetool netstats" show?

On Tue, Apr 19, 2016 at 8:24 AM, Erik Forsberg wrote:


I have a problem where 3 of my 84 nodes misbehave with excessively long GC
times, leading to them being marked as DN.

This happens when I load data to them using CQL from a Hadoop job, so
quite a lot of inserts at a time. The CQL loading job is using
TokenAwarePolicy with fallback to DCAwareRoundRobinPolicy. Cassandra Java
driver version is in use.

My other observation is that around the time the GC starts to work like
crazy, there is a lot of outbound network traffic from the troublesome
nodes. Where a healthy node has around 25 Mbit/s in and 25 Mbit/s out, an
unhealthy one sees 25 Mbit/s in and 200 Mbit/s out.

So, something is iffy with these 3 nodes, but I have some trouble finding
out exactly what makes them differ.

This is Cassandra 2.0.13 (yes, old) using vnodes. The keyspace is using
NetworkTopologyStrategy with replication factor 2, in one datacenter.

One thing I know I'm doing wrong is that I have a slightly differing number
of hosts in each of my 6 chassis (one of them has 15 nodes, one has 13, and
the remaining have 14). Could what I'm seeing here be the effect of that?
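
To get a feel for how much the uneven chassis sizes could matter, here is a back-of-the-envelope sketch in Python. It rests on a simplifying assumption that is not established anywhere in this thread: that each of the 6 chassis is configured as a rack and ends up holding an equal share of the data. NetworkTopologyStrategy only guarantees that when the replication factor equals the number of racks, so with replication 2 this is a rough estimate rather than an exact model. The function name is made up for illustration.

```python
# Nodes per chassis as described in the post: one with 15, one with 13, four with 14.
RACK_SIZES = [15, 13, 14, 14, 14, 14]


def relative_load(nodes_in_rack, rack_count, total_nodes):
    """Per-node load relative to a perfectly even spread over all nodes.

    Assumption: every rack holds 1/rack_count of the data, split evenly
    among the nodes in that rack.
    """
    return total_nodes / (rack_count * nodes_in_rack)


if __name__ == "__main__":
    total = sum(RACK_SIZES)  # 84 nodes
    for size in sorted(set(RACK_SIZES)):
        ratio = relative_load(size, len(RACK_SIZES), total)
        print(f"rack with {size} nodes: {ratio:.3f}x the even share")
```

Under that assumption, a node in the 13-node chassis would carry roughly 8% more than the even share and a node in the 15-node chassis roughly 7% less.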

Other ideas on what could be wrong? Some kind of vnode imbalance? How can
I diagnose that? What metrics should I be looking at?
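
One way to check for a vnode or ownership imbalance is to compare the effective ownership that "nodetool status" reports per node for the keyspace. The sketch below assumes those percentages have already been collected into a dict. The function name, the 25% tolerance, and the sample addresses and values are hypothetical.

```python
from statistics import mean


def ownership_outliers(owns_by_node, tolerance=0.25):
    """Return {node: ratio_to_average} for nodes far from the average ownership.

    owns_by_node maps a node address to its effective ownership in percent.
    A node is flagged when it deviates from the average by more than
    `tolerance` (as a fraction of the average).
    """
    average = mean(owns_by_node.values())
    return {
        node: owns / average
        for node, owns in owns_by_node.items()
        if abs(owns - average) > tolerance * average
    }


# Hypothetical values, for illustration only.
SAMPLE_OWNS = {
    "10.0.0.1": 2.4,
    "10.0.0.2": 2.3,
    "10.0.0.3": 2.5,
    "10.0.0.4": 4.0,
}

if __name__ == "__main__":
    for node, ratio in ownership_outliers(SAMPLE_OWNS).items():
        print(f"{node} owns {ratio:.2f}x the average")
```

With 84 nodes and replication 2, an even spread would be about 200 / 84, or roughly 2.4% effective ownership per node, so nodes well above that are candidates for a closer look.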


Posted Apr 19, '16 at 12:24p
Active Apr 21, '16 at 12:21p



site design / logo © 2022 Grokbase