Recently I saw some strange behavior on one of the nodes of a 3-node
cluster. A while ago I created a table and put some data (about 150M) in it
for testing. A few days ago I started to import the full data set into that
table using normal CQL INSERT statements. As soon as the inserts started, one
node went into non-stop full GC. The other two nodes were totally fine. I
stopped the insert process and restarted C* on all the nodes. All nodes were fine.
But once I started inserting again, full GC kicked in on that node within a
minute. The insertion speed was moderate. Again, the other two nodes were
fine. I tried this process a couple of times. Every time the same node
jumped into full GC. I even rebooted all the boxes. I checked system.log
but found no errors or warnings before full GC started.
Finally I deleted and recreated the table. All of a sudden the problem went
away. The only thing I can think of is that the table was originally created
using STCS. After I inserted 150M data into it, I switched it to LCS. Then I ran
incremental repair a couple of times. I saw validation and normal
compaction on that table as expected. When I recreated the table, I created
it with LCS.
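
For anyone who wants to try to reproduce this, the table history described above would look roughly like the following in CQL. This is only a sketch: the keyspace, table, and column names (test_ks, events, id, payload) are made up for illustration, and only the compaction classes and the order of the steps come from the description. The incremental repairs were run with nodetool outside of cqlsh and are only noted as a comment.

```cql
-- Hypothetical schema; the real table definition is not shown in this post.
-- Step 1: table originally created with size-tiered compaction (STCS).
CREATE TABLE test_ks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

-- Step 2: after loading the test data, switch the table to leveled compaction (LCS).
ALTER TABLE test_ks.events
    WITH compaction = {'class': 'LeveledCompactionStrategy'};

-- Step 3: incremental repair was run a couple of times at this point (via nodetool).

-- Step 4: the workaround was to drop the table and recreate it with LCS from the start.
DROP TABLE test_ks.events;

CREATE TABLE test_ks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH compaction = {'class': 'LeveledCompactionStrategy'};
```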
I don't have the problem any more but just want to share the experience.
Maybe someone has a theory on this? BTW I am running C* 2.2.4 on CentOS
7 and Java 8. All boxes have the identical configurations.