Cassandra, optimizing the in clause

57 Views Asked by Behzad Pirvali At 07 June 2025 at 06:46

I was thinking about a way to optimize an IN clause such as id IN (1, 2, 3, ...):

1. Get hold of the Murmur3Partitioner hashing function.
2. Group the IN-clause values by hash, e.g. "id IN (x1, x3, ...)" where x1 and x3 have the same hash.
3. Pass each grouped query to the driver, which should then be able to go straight to the node that owns the partition.

So, how do I get a hold of Cassandra's Murmur3Partitioner hashing function so that I can calculate the hash in my code?

Is this theory going to work with Cassandra?


There is 1 best solution below

Chris Lohfink On 03 October 2018 at 01:08

Drivers already do this when configured with a token-aware load balancing policy. Note that it's unlikely that multiple ids will share the same token, although they may well share the same coordinator.
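To illustrate the point about tokens versus coordinators, here is a minimal toy sketch of a token ring. Everything in it is an assumption made for illustration: the three-node `RING`, the node names, and especially `toy_token`, which uses the first 8 bytes of an MD5 digest as a stand-in and is not Cassandra's Murmur3Partitioner hash. It only shows the shape of the lookup: distinct ids get distinct tokens, yet many of them land on the same owner.

```python
import bisect
import hashlib

# Hypothetical ring: each node owns the range (previous token, its token].
# Real clusters usually have many virtual-node tokens per node.
RING = [
    (-6000000000000000000, "node-a"),
    (0, "node-b"),
    (6000000000000000000, "node-c"),
]
TOKENS = [token for token, _ in RING]


def toy_token(key):
    """Stand-in hash giving a signed 64-bit token (NOT Cassandra's Murmur3)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner(key):
    """Return the node whose token is the first one >= the key's token."""
    idx = bisect.bisect_left(TOKENS, toy_token(key))
    # Tokens above the largest ring token wrap around to the first node.
    return RING[idx % len(RING)][1]


def group_by_owner(keys):
    """Group keys by the node that owns them on the toy ring."""
    groups = {}
    for key in keys:
        groups.setdefault(owner(key), []).append(key)
    return groups


if __name__ == "__main__":
    for node, ids in sorted(group_by_owner(range(20)).items()):
        print(node, ids)
```

Grouping by owner like this is what a token-aware policy effectively does per request, which is why doing it by hand in application code is rarely worth the effort.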

In general it's a bad idea to try to batch up requests like this. Unless you have an unusual scenario, it's better to just call executeAsync for each id and then do a get on all of the resulting futures. This distributes and parallelizes the coordination load across the cluster better and requires less custom work. I highly recommend not optimizing prematurely and instead focusing on having a correct data model. If you need to do work in batch, use the Spark loader/reader, or look at it for a good example of doing it efficiently.
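The "one async request per id, then get on all of them" pattern can be sketched roughly as follows. This is a standard-library simulation, not driver code: `fetch_row` is a hypothetical stub standing in for a single-partition query, and the thread pool stands in for the driver's asynchronous execution. With a real driver, the stub would be replaced by the driver's own async call (executeAsync in the answer above).

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_row(row_id):
    """Hypothetical stand-in for one single-partition query by id."""
    return {"id": row_id, "value": f"row-{row_id}"}


def fetch_all(ids, max_in_flight=16):
    """Issue one request per id concurrently, then wait on every future."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Submit everything first so the requests run in parallel.
        futures = [pool.submit(fetch_row, row_id) for row_id in ids]
        # Collect results in the same order the ids were given.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for row in fetch_all([1, 2, 3]):
        print(row)
```

Capping the number of in-flight requests (the assumed `max_in_flight` parameter here) keeps a large id list from overwhelming the cluster.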
