Querying Cassandra from application code consumes a lot of Cassandra's CPU.
My query looks like this:
select fields from table where partition_key = 'PARTITION_KEY' and clustering_key_1 = 'KEY1' and clustering_key_2 in (a1, a2, a3..a100);
I am using the IN keyword on clustering columns only, but it still affects the CPU badly. Sometimes the CPU reaches 100%.
Is this normal?
No, 100% CPU usage is not normal for querying. But quite frankly, neither is querying for 100 entries with an IN clause. Even using IN on a clustering key forces Cassandra to perform random reads, and Cassandra was built for reading sequentially. I would not recommend even double-digit numbers of IN clause entries.
Recommendations:
- If the values you need are contiguous in clustering order, query them as a range instead:
  select fields from table where partition_key = 'PARTITION_KEY' and clustering_key_1 = 'KEY1' and clustering_key_2 >= 'a1' and clustering_key_2 <= 'a100';
  A range returns every row between the two bounds, so anything you did not ask for has to be filtered out in the application.
- Usually, 100% CPU during querying means that the cluster needs more nodes. However, since this query is restricted to a single partition, more nodes will not help: every read of that partition is served by the same replicas. In that case, the partitions might be too large, and re-modeling the table to have smaller partitions will spread the load across the cluster more evenly.
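A minimal Python sketch of the range-instead-of-IN idea, assuming the clustering values are plain ASCII text and the table uses the default ascending clustering order, so Python's string ordering matches Cassandra's. The table and column names are the placeholders from the question (note that table is a reserved word in CQL, so substitute the real name), and range_bounds and filter_rows are hypothetical helpers. One thing the sketch makes visible: text sorts lexically, so 'a100' comes before 'a2'. Taking the bounds from the sorted requested values is safer than reading them off the labels.

```python
from typing import Dict, Iterable, List, Tuple

# Placeholder names from the question; substitute the real table and columns.
RANGE_CQL = (
    "SELECT fields FROM table "
    "WHERE partition_key = ? AND clustering_key_1 = ? "
    "AND clustering_key_2 >= ? AND clustering_key_2 <= ?"
)


def range_bounds(wanted: Iterable[str]) -> Tuple[str, str]:
    """Return (low, high): the first and last requested key in clustering order.

    Assumes Python's str ordering matches the clustering order, which holds for
    ASCII text in the default ascending order. Note that 'a100' < 'a2'.
    """
    keys = sorted(set(wanted))
    if not keys:
        raise ValueError("no clustering keys requested")
    return keys[0], keys[-1]


def filter_rows(rows: Iterable[Dict[str, str]], wanted: Iterable[str],
                column: str = "clustering_key_2") -> List[Dict[str, str]]:
    """Keep only rows whose clustering key was in the original IN list."""
    wanted_set = set(wanted)
    return [row for row in rows if row[column] in wanted_set]


if __name__ == "__main__":
    wanted = ["a1", "a2", "a100"]
    low, high = range_bounds(wanted)
    print(low, high)  # a1 a2  (lexical order: a1 < a100 < a2)
    # Rows a range read over [low, high] could return, in clustering order.
    returned = [{"clustering_key_2": k} for k in ["a1", "a10", "a100", "a11", "a2"]]
    print([r["clustering_key_2"] for r in filter_rows(returned, wanted)])
    # ['a1', 'a100', 'a2']
```

With the DataStax Python driver, the statement could then be run as session.execute(session.prepare(RANGE_CQL), ("PARTITION_KEY", "KEY1", low, high)) and the result passed through filter_rows. The trade-off is one contiguous slice read in exchange for transferring and discarding extra rows, so it only pays off when the requested keys are dense within the range.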
Edit 20200616
There are other factors that can make a query consume high amounts of CPU.
Are you querying columns that receive in-place updates (overwrites) or lots of deletes? Both make Cassandra work harder at read time, because it has to read past obsoleted and tombstoned data before it can return the live rows.
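A toy Python model of why overwrites and deletes cost read-time work. It is not Cassandra's actual read path (which also involves memtables, bloom filters, compaction and gc_grace_seconds); it only borrows the idea that cell versions are reconciled by write timestamp, last write wins, and that a tombstone hides older data. The cell layout and the read_partition helper are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy model, not Cassandra's real storage engine: each "SSTable" is a list of
# (clustering_key, write_timestamp, value) cells, and value None is a tombstone.
Cell = Tuple[str, int, Optional[str]]


def read_partition(sstables: List[List[Cell]]) -> Tuple[Dict[str, str], int]:
    """Merge cells from every SSTable; return (live rows, cells scanned)."""
    newest: Dict[str, Tuple[int, Optional[str]]] = {}
    scanned = 0
    for sstable in sstables:
        for key, ts, value in sstable:
            scanned += 1  # every version costs work, even if it is later discarded
            # Last write wins: keep only the cell with the highest timestamp.
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # A winning tombstone hides the row; it is never returned to the client.
    live = {key: value for key, (_, value) in newest.items() if value is not None}
    return live, scanned


if __name__ == "__main__":
    sstables = [
        [("a1", 1, "v1"), ("a2", 1, "x")],
        [("a1", 2, "v2")],                   # in-place overwrite of a1
        [("a1", 3, "v3"), ("a2", 4, None)],  # another overwrite, and a delete of a2
    ]
    live, scanned = read_partition(sstables)
    print(live, scanned)  # {'a1': 'v3'} 5
```

Here five cells are read to return a single live row; the more versions and tombstones a partition accumulates, the wider that gap gets.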
Try running iostat and look at the %steal and %iowait columns of its CPU report. If you're in a virtualized/cloud environment, you could be seeing "noisy neighbor" issues like CPU steal and high (disk) I/O wait times.