Setting heap size and other related parameters in Cassandra


I have a 5-node Cassandra cluster with 256 GB of memory. I am facing some performance issues with read operations, so I decided to increase my heap size, as it was using the default. I updated the cassandra-env.sh file with MAX_HEAP_SIZE="128G" and HEAP_NEWSIZE="32G".

I found somewhat better performance for read queries, but I saw messages like "Some operations were slow" and a garbage collection event in the logs. It seems that increasing the heap size might have led to increased garbage collection activity.

Could you please assist me in adjusting the other parameters as well with respect to MAX_HEAP_SIZE="128G"?

2 Answers

First, I would not change the parameters in cassandra-env.sh. Instead, use the jvm.options file.

Second, I would not move to a 128G heap size; that's probably too large.

Third, the initial and max heap sizes (-Xms and -Xmx) should be the same; otherwise, the JVM will resize the heap at runtime, and that expansion could cause perf issues.

Fourth, you'll have to understand what's happening before you increase the heap size. Why increase the heap size? Are you seeing allocation errors because the heap is exhausted? Are you seeing long old gen GC pauses?
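
For example, two quick ways to answer those questions (a sketch; the log path assumes a default package install):

nodetool gcstats        # per-node GC pause statistics accumulated since the last time this command ran
grep GCInspector /var/log/cassandra/system.log    # GC pauses long enough for Cassandra itself to flag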

In jvm.options, set -Xmx and -Xms instead of messing with cassandra-env.sh.
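
For example, a minimal sketch of the relevant jvm.options lines (the exact file name varies by version, e.g. jvm-server.options plus jvm8-/jvm11-server.options in Cassandra 4.x; 31G is illustrative, not a sizing recommendation for your cluster):

# Keep the initial and max heap equal so the JVM never resizes the heap at runtime
-Xms31G
-Xmx31G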


I really don't think that the heap settings are the problem here. Steve is right in that you definitely do not want to go to 128GB. In fact, I wouldn't recommend going above 32GB with Java 8, since past roughly that point the JVM can no longer use compressed object pointers and the larger heap buys you little. He's also correct in that heap config changes with Cassandra 4 should be made in the jvm.options file, and not cassandra-env.sh.

The problem is this:

SELECT col1,col2,col3..col75 FROM keyspace.table
WHERE "ID" in (65893388252433)
AND "EndTime" >= 1688511600000 AND "EndTime" <= 1688597999999
LIMIT 20000 ;

We have 200 columns in total in the table, and it is taking around 45 seconds to fetch 20k records. If we select multiple IDs inside the "IN" clause, it takes a lot of time and sometimes hangs.

Judging by the SELECT statement above, I'm going to guess that the PRIMARY KEY definition looks something like this: PRIMARY KEY (("ID"),"EndTime"). It also looks to me like you're just trying to pull too much data back at once. I'd recommend the following:

  • Only query for one "ID" at a time.
  • Be judicious about the number of columns specified in the SELECT.
  • Decrease the time window. It looks like this query is for 24 hours; see if you can lower that.
  • Remodel the table to use "hour" as a partition bucket; ex: PRIMARY KEY (("ID","hour_bucket"),"EndTime"). This will result in smaller partitions; partition size also looks like part of the problem here (see the sketch after this list).
  • Use solid state drives. Not sure of the disk hardware behind this cluster, but I'd be willing to bet that disk IO latency is too high. If the cluster isn't backed by solid state drives, this is where the biggest bang-for-buck will be found. If the cluster is in the cloud, look at migrating to the next disk tier. If the cluster disks are abstracted by a NetApp or some other disk array device, check to make sure that the disks for the nodes in the cluster aren't being placed in the same hardware array. It's quite possible that the query is hitting a disk bottleneck that can't be affected by Cassandra configs.
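
As a sketch of the remodel from the fourth bullet (the keyspace/table name, column types, and hourly bucket granularity here are all assumptions based on the query above, not your actual schema):

CREATE TABLE ks.table_by_id_hour (
    "ID" bigint,
    "hour_bucket" bigint,  -- e.g., the EndTime epoch-millisecond value truncated to the hour
    "EndTime" bigint,
    col1 text,
    col2 text,
    -- ...remaining columns...
    PRIMARY KEY (("ID", "hour_bucket"), "EndTime")
) WITH CLUSTERING ORDER BY ("EndTime" DESC);

-- One ID and one hour bucket per query; fan out across buckets in the application
SELECT col1, col2, col3
FROM ks.table_by_id_hour
WHERE "ID" = 65893388252433
  AND "hour_bucket" = 1688511600000
  AND "EndTime" >= 1688511600000 AND "EndTime" <= 1688515199999
LIMIT 20000;

Each partition then holds at most one hour of data for one ID, and the application issues one small query per (ID, hour) pair instead of a single multi-ID IN query.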