Querying on Cassandra consumes CPU


Connecting to Cassandra from application code and querying consumes Cassandra's CPU.

My query looks like this: select fields from table where partition_key = 'PARTITION_KEY' and clustering_key_1 = 'KEY1' and clustering_key_2 in (a1, a2, a3..a100);

I am using the IN keyword on clustering columns only, but it still affects the CPU badly. Sometimes the CPU reaches 100%.

Is this normal?

Aaron On

No, 100% CPU usage is not normal for querying. But quite frankly, neither is querying for 100 entries with an IN clause.

Even using IN on a clustering key forces Cassandra to perform random reads. Cassandra was built for reading sequentially. I would not recommend even a double-digit number of IN clause entries.
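If the application really does need many specific clustering keys, one way to keep each IN list in the single digits is to split the key list into batches and issue one query per batch. The following is only a rough sketch: the helper names (chunked, build_queries) are made up for illustration, the table and column names are the placeholders from the question, and the %s placeholders assume a driver that accepts positional parameters in that style (as the DataStax Python driver does for simple statements). It only builds the statements and does not execute them.

```python
from typing import Iterator, List, Sequence, Tuple

# Placeholder table/column names taken from the question, not a real schema.
QUERY_TEMPLATE = (
    "SELECT fields FROM table "
    "WHERE partition_key = %s AND clustering_key_1 = %s "
    "AND clustering_key_2 IN ({placeholders})"
)


def chunked(values: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `values`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_queries(
    partition_key: str,
    key1: str,
    key2_values: Sequence[str],
    chunk_size: int = 5,
) -> List[Tuple[str, tuple]]:
    """Return (cql, params) pairs, one per batch of clustering_key_2 values."""
    queries = []
    for chunk in chunked(key2_values, chunk_size):
        # One positional placeholder per value in this batch.
        placeholders = ", ".join(["%s"] * len(chunk))
        cql = QUERY_TEMPLATE.format(placeholders=placeholders)
        queries.append((cql, (partition_key, key1, *chunk)))
    return queries


# Hypothetical key values: a1 through a100, split into batches of five.
keys = [f"a{i}" for i in range(1, 101)]
queries = build_queries("PARTITION_KEY", "KEY1", keys, chunk_size=5)
```

Each (cql, params) pair could then be handed to whatever execute call the driver provides. Whether many small queries actually lower CPU usage compared with one large IN depends on the cluster, so it is worth measuring both.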

Recommendations:

  • Try to keep the number of rows returned to a minimum. You may need to break this query up into ten or twenty smaller queries.
  • If you really just need 'a1' through 'a100', why not try it as a range query?

select fields from table where partition_key = 'PARTITION_KEY' and clustering_key_1 = 'KEY1' and clustering_key_2 >= 'a1' and clustering_key_2 <= 'a100';
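One caveat with the range approach, assuming clustering_key_2 is a text column (the question does not say): text values are ordered byte by byte, not numerically, so the bounds 'a1' and 'a100' may cover far fewer keys than expected. The sketch below mimics that ordering with plain Python string comparison, which agrees with byte order for ASCII values like these. The key values and the in_text_range helper are made up for illustration.

```python
def in_text_range(value: str, low: str, high: str) -> bool:
    """Mimic `col >= low AND col <= high` on a text column (ASCII values only)."""
    return low <= value <= high


# Hypothetical keys as written in the question: a1, a2, ..., a100.
keys = [f"a{i}" for i in range(1, 101)]
matched = [k for k in keys if in_text_range(k, "a1", "a100")]

# The same keys zero-padded to a fixed width, so text order matches numeric order.
padded_keys = [f"a{i:03d}" for i in range(1, 101)]
matched_padded = [k for k in padded_keys if in_text_range(k, "a001", "a100")]

print(matched)              # only the keys that sort between 'a1' and 'a100'
print(len(matched_padded))  # number of padded keys inside the padded range
```

With the unpadded keys, only 'a1', 'a10', and 'a100' fall inside the range, because 'a2' sorts after 'a100'. With fixed-width keys (or a numeric clustering column), the range covers all 100 values.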

Usually, 100% CPU during querying means that the cluster needs more nodes. However, since this query is restricted to a single partition, adding nodes will not help it. In that case, the partitions might be too large, and re-modeling the table to have smaller partitions will spread the load more evenly across the cluster.

Edit 20200616

There are other factors that can make a query consume high amounts of CPU.

Are you querying columns that see frequent in-place writes (overwrites) or lots of deletes? Both conditions make Cassandra work harder, because it has to read past obsoleted and tombstoned data to find the live values.

Try running iostat. If you're in a virtualized/cloud environment, you could be seeing "noisy neighbor" issues like CPU steal and high (disk) I/O wait times.