I am experimenting with Cassandra in our test environment. I have a 3-node C* (version 3.11) cluster with three column families, each with replication factor 2 and SizeTieredCompactionStrategy. Each node runs on an AWS r4.large instance with two EBS gp2 volumes: 100 GB for the commit log and 300 GB for data. Two of the column families are write-only: our applications write to them heavily (about 800-1000 inserts per second in total) and never read from them. The third column family is a counter table used in an "increment and get" pattern: about 200 increments and reads per second. Each individual counter is updated infrequently, roughly 30 times over the whole test.
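For reference, the counter table looks roughly like this. This is a sketch: only `usr`, `lat`, and `lon` appear in the actual queries, so the column types and the counter column name `cnt` are assumptions.

```cql
-- Hypothetical schema for the counter table geo.gpm.
-- The counter column name 'cnt' and the exact types are assumptions;
-- in a counter table all non-counter columns must be part of the primary key.
CREATE TABLE geo.gpm (
    usr text,
    lat double,
    lon double,
    cnt counter,
    PRIMARY KEY (usr, lat, lon)
);

-- The "increment and get" pattern used by the application:
UPDATE geo.gpm SET cnt = cnt + 1
WHERE usr = '57469' AND lat = 55.617 AND lon = 37.509;

SELECT * FROM geo.gpm
WHERE usr = '57469' AND lat = 55.617 AND lon = 37.509;
```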
Everything worked fine for 5 days, but then read ops/sec on one of the nodes started to grow past 1000 IOPS, all the way up to the disk's IOPS limit.
Once it hit the limit, the response time of SELECTs from the counter table jumped from tens of milliseconds to seconds, and read IO started growing on a second node as well.
Here is the output of nodetool cfstats:
Keyspace : geo
Read Count: 51744711
Read Latency: 12.840126352082631 ms.
Write Count: 201887146
Write Latency: 0.02199966441647553 ms.
Pending Flushes: 0
Table: gpm
SSTable count: 5
Space used (live): 5127907030
Space used (total): 5127907030
Space used by snapshots (total): 0
Off heap memory used (total): 102252166
SSTable Compression Ratio: 0.41318162863841235
Number of keys (estimate): 32583794
Memtable cell count: 51001
Memtable data size: 3072328
Memtable off heap memory used: 0
Memtable switch count: 176
Local read count: 51530318
Local read latency: NaN ms
Local write count: 97595727
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 80436992
Bloom filter off heap memory used: 80436952
Index summary off heap memory used: 21046326
Compression metadata off heap memory used: 768888
Compacted partition minimum bytes: 87
Compacted partition maximum bytes: 149
Compacted partition mean bytes: 112
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
I can't find any explanation for why this is happening. I ran the same test twice (starting from an empty database), and both times it failed identically 5-6 days in. Here is a trace of a typical query selecting a counter (the only type of SELECT performed during the test):
Execute CQL3 query | 2017-08-28 16:06:44.110000 | 172.31.16.220 | 0 | 172.31.16.220
Parsing select * from geo.gpm where usr = '57469' and lat = 55.617 and lon = 37.509; [Native-Transport-Requests-1] | 2017-08-28 16:06:44.110000 | 172.31.16.220 | 187 | 172.31.16.220
Preparing statement [Native-Transport-Requests-1] | 2017-08-28 16:06:44.110000 | 172.31.16.220 | 411 | 172.31.16.220
reading data from /172.31.32.220 [Native-Transport-Requests-1] | 2017-08-28 16:06:44.110000 | 172.31.16.220 | 736 | 172.31.16.220
Sending READ message to cassandra-test-skywalker.aws.local/172.31.32.220 [MessagingService-Outgoing-cassandra-test-skywalker.aws.local/172.31.32.220-Small] | 2017-08-28 16:06:44.111000 | 172.31.16.220 | 1184 | 172.31.16.220
READ message received from /172.31.16.220 [MessagingService-Incoming-/172.31.16.220] | 2017-08-28 16:06:44.111000 | 172.31.32.220 | 35 | 172.31.16.220
Executing single-partition query on gpm [ReadStage-3] | 2017-08-28 16:06:44.117000 | 172.31.32.220 | 5675 | 172.31.16.220
Acquiring sstable references [ReadStage-3] | 2017-08-28 16:06:44.117000 | 172.31.32.220 | 5745 | 172.31.16.220
Skipped 0/5 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-3] | 2017-08-28 16:06:44.117000 | 172.31.32.220 | 5812 | 172.31.16.220
Key cache hit for sstable 85 [ReadStage-3] | 2017-08-28 16:06:44.117000 | 172.31.32.220 | 5906 | 172.31.16.220
Key cache hit for sstable 170 [ReadStage-3] | 2017-08-28 16:06:44.117000 | 172.31.32.220 | 5963 | 172.31.16.220
Key cache hit for sstable 191 [ReadStage-3] | 2017-08-28 16:06:44.117000 | 172.31.32.220 | 5996 | 172.31.16.220
Key cache hit for sstable 212 [ReadStage-3] | 2017-08-28 16:06:44.117001 | 172.31.32.220 | 6028 | 172.31.16.220
Bloom filter allows skipping sstable 217 [ReadStage-3] | 2017-08-28 16:06:44.118000 | 172.31.32.220 | 6069 | 172.31.16.220
Merged data from memtables and 4 sstables [ReadStage-3] | 2017-08-28 16:06:44.118000 | 172.31.32.220 | 6151 | 172.31.16.220
Read 1 live and 0 tombstone cells [ReadStage-3] | 2017-08-28 16:06:44.118000 | 172.31.32.220 | 6203 | 172.31.16.220
Enqueuing response to /172.31.16.220 [ReadStage-3] | 2017-08-28 16:06:44.118001 | 172.31.32.220 | 6228 | 172.31.16.220
REQUEST_RESPONSE message received from /172.31.32.220 [MessagingService-Incoming-/172.31.32.220] | 2017-08-28 16:06:44.119000 | 172.31.16.220 | 9891 | 172.31.16.220
Sending REQUEST_RESPONSE message to cassandra-test-kenobi.aws.local/172.31.16.220 [MessagingService-Outgoing-cassandra-test-kenobi.aws.local/172.31.16.220-Small] | 2017-08-28 16:06:44.119000 | 172.31.32.220 | 7466 | 172.31.16.220
Processing response from /172.31.32.220 [RequestResponseStage-3] | 2017-08-28 16:06:44.120000 | 172.31.16.220 | 10013 | 172.31.16.220
Request complete | 2017-08-28 16:06:44.124753 | 172.31.16.220 | 14753 | 172.31.16.220
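(For completeness: the trace above was captured in cqlsh with query tracing enabled, i.e. something like the following.)

```cql
TRACING ON;
SELECT * FROM geo.gpm WHERE usr = '57469' AND lat = 55.617 AND lon = 37.509;
```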
Any help would be appreciated. Are there any profiling tools that could help me figure out what is wrong?