In our production cluster we are seeing higher p99 latencies. Repairs have been running for 3 weeks and the cluster is still only 85% repaired. This has multiple causes; one of them is that Cassandra's LIMIT 1 is not optimized.
But today I want to discuss access patterns. In the last 12 hours:
| HTTP response status | No. of requests |
|---|---|
| 200 | 61041189 |
| 404 | 7971055 |
About ~12% of reads are for partitions that don't exist yet, due to weird legacy logic that is very difficult to change immediately.
Current cluster settings:

- Compaction strategy: size-tiered (STCS)
- bloom_filter_fp_chance = 0.01

Output of nodetool cfstats:
Bloom filter false positives: 204164614
Bloom filter false ratio: 0.00844
Bloom filter space used: 471339624
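For context, these numbers come from the per-table cfstats output and the table schema; a sketch of the commands (with placeholder keyspace/table names):

```sh
# Pull just the bloom filter counters for the affected table
# (keyspace/table names are placeholders).
nodetool cfstats my_keyspace.my_table | grep -i "bloom filter"

# Confirm the configured fp chance from the schema.
cqlsh -e "SELECT bloom_filter_fp_chance FROM system_schema.tables WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';"
```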
Question
Does it make sense to change bloom_filter_fp_chance to 0.001?
Your false positive ratio looks fine, or at least it's on par with the configuration: the table is configured to allow false positives around 1.00% of the time, whereas the actual ratio is 0.84%.
The total of 204 164 614 false positives that you see in cfstats may look like a large number, but it represents the false positives out of roughly 24 billion bloom filter checks implied by the reported ratio (204 164 614 / 0.00844 ≈ 2.4 × 10^10), and should only be analyzed in relation to that total, not by itself.

You can still decrease the false positive chance, but it may not be worth it. If the sstables for the table are small enough (roughly 10 GB at most), then even the reads that wrongly pass the bloom filter check add negligible overhead. If you have sstables in the order of hundreds of GB or TB, the overhead may justify retuning the bloom filter false positive chance.
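If you are not sure which regime you are in, you can check the per-sstable data file sizes directly; the sketch below assumes the default data directory and placeholder keyspace/table names.

```sh
# Per-SSTable data file sizes for one table (default data directory;
# adjust the path, keyspace and table name for your install).
ls -lh /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db

# nodetool tablestats also reports SSTable count and total space used.
nodetool tablestats my_keyspace.my_table
```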
If you do decrease the false positive chance, it comes at a twofold cost:

- larger bloom filters consume more off-heap memory on every node;
- the filters are also persisted alongside each sstable, so they take more disk space.

The short answer is that you can decrease the false positive chance if you can afford the storage and memory costs.
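If you do decide to lower it, a minimal sketch of the change is below (keyspace and table names are placeholders). The new setting only applies to sstables written after the change, so existing sstables keep their old filters until they are rewritten by compaction or by nodetool upgradesstables.

```sh
# Lower the bloom filter false positive chance for one table
# (keyspace/table names are placeholders; adjust for your schema).
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH bloom_filter_fp_chance = 0.001;"

# Existing sstables keep their old bloom filters until rewritten;
# -a forces a rewrite even for sstables already on the current format.
nodetool upgradesstables -a my_keyspace my_table
```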
Nonetheless, if the goal is to improve read performance, the bloom filter is typically not the main culprit. I would also look into other factors such as:
- Tombstones: check the average and maximum tombstones per slice reported in nodetool cfstats, as large volumes of tombstones on a single scan often cause timeouts or high latencies.
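A quick way to spot a tombstone problem is to look at the per-table tombstone counters (keyspace and table names below are placeholders); for a specific query, running it with TRACING ON in cqlsh also reports how many live rows and tombstone cells were read.

```sh
# Tombstone counters per table (tablestats is the newer name for cfstats);
# keyspace and table names are placeholders.
nodetool tablestats my_keyspace.my_table | grep -i tombstone
```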