Cassandra: optimizing the IN clause

I was thinking about a way of optimizing an IN clause such as id in (1, 2, 3, ...):

  • Get hold of the Murmur3Partitioner hashing function.
  • Group the IN-clause values that produce the same hash into one query, e.g. "id in (x1, x3, ...)" where x1 and x3 have the same hash.
  • Pass each grouped query to the driver. Should the driver then be able to send it straight to the node that owns the partition?

So, how do I get hold of Cassandra's Murmur3Partitioner hashing function so that I can calculate the hash in my own code?

Will this approach work with Cassandra?

1 Answer

Drivers already do this if you use a token-aware load balancing policy. It's also worth noting that you are unlikely to have multiple ids with the same token, although several ids may well map to the same coordinator nodes.
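To make that concrete, here is a minimal Python sketch of roughly what a token-aware driver computes, assuming Murmur3Partitioner semantics as I understand them: the token is the first 64-bit half of MurmurHash3 x64_128 (seed 0) over the serialized partition key, tail bytes are sign-extended (a quirk of Cassandra's Java port), and Long.MIN_VALUE is mapped to Long.MAX_VALUE. The ring (node names and tokens) is hypothetical, each node is assumed to own the range ending at its own token, and replication and vnodes are ignored. Grouping 100 ids this way gives 100 distinct tokens, but only as many owners as there are nodes, which is the point made above.

```python
import bisect
from collections import defaultdict

MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k):
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    return k ^ (k >> 33)


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK64
    return (_rotl64(k1, 31) * C2) & MASK64


def _mix_k2(k2):
    k2 = (k2 * C2) & MASK64
    return (_rotl64(k2, 33) * C1) & MASK64


def murmur3_token(key: bytes) -> int:
    """Assumed Murmur3Partitioner token: signed first half of MurmurHash3 x64_128."""
    length, nblocks = len(key), len(key) // 16
    h1 = h2 = 0
    for i in range(nblocks):  # 16-byte blocks, read as two little-endian longs
        block = key[i * 16:(i + 1) * 16]
        h1 ^= _mix_k1(int.from_bytes(block[:8], "little"))
        h1 = (((_rotl64(h1, 27) + h2) & MASK64) * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(int.from_bytes(block[8:], "little"))
        h2 = (((_rotl64(h2, 31) + h1) & MASK64) * 5 + 0x38495AB5) & MASK64
    tail = key[nblocks * 16:]
    # Java's (long) cast of a byte sign-extends, so bytes >= 0x80 act as negative.
    signed = [b - 256 if b >= 128 else b for b in tail]
    k1 = k2 = 0
    for i, b in enumerate(signed):
        if i < 8:
            k1 ^= (b << (8 * i)) & MASK64
        else:
            k2 ^= (b << (8 * (i - 8))) & MASK64
    if len(tail) > 8:
        h2 ^= _mix_k2(k2)
    if tail:
        h1 ^= _mix_k1(k1)
    h1 ^= length
    h2 ^= length
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = (_fmix64(h1) + _fmix64(h2)) & MASK64
    token = h1 - (1 << 64) if h1 >> 63 else h1
    return (1 << 63) - 1 if token == -(1 << 63) else token


def owner(token, ring):
    """ring: sorted (node_token, node) pairs; a node owns (previous token, its token]."""
    idx = bisect.bisect_left([t for t, _ in ring], token)
    return ring[idx % len(ring)][1]  # past the last token: wrap to the first node


def group_ids(ids, ring):
    """Group int ids by token and by (hypothetical) owning node."""
    by_token, by_node = defaultdict(list), defaultdict(list)
    for i in ids:
        t = murmur3_token(i.to_bytes(4, "big", signed=True))  # CQL int: 4-byte big-endian
        by_token[t].append(i)
        by_node[owner(t, ring)].append(i)
    return dict(by_token), dict(by_node)


if __name__ == "__main__":
    ring = [(-3 * 2**61, "node-a"), (0, "node-b"), (3 * 2**61, "node-c")]
    by_token, by_node = group_ids(range(1, 11), ring)
    print(len(by_token), "distinct tokens for 10 ids")
    for node, ids in sorted(by_node.items()):
        print(node, ids)
```

In a real application you would not maintain this yourself: the driver reads the actual token ring from cluster metadata and does the routing per statement.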

In general, it's a bad idea to try to batch up requests like this. Unless you have an unusual scenario, it's better to just call executeAsync for each id and then call get() on all of the returned futures. That distributes and parallelizes the coordination load across the cluster better and requires less custom work. I highly recommend not optimizing prematurely; focus instead on having a correct data model. If you need to do work in bulk, use the Spark loader/reader, or look at it as a good example of how to do it efficiently.
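A minimal sketch of the one-query-per-id approach, written against the DataStax Python driver, where Session.execute_async is the counterpart of the Java driver's executeAsync and ResponseFuture.result() plays the role of get(). The function name fetch_by_ids and the statement SELECT * FROM ks.users WHERE id = ? are hypothetical; the function only relies on session having an execute_async(statement, params) method whose return value has a result() method.

```python
from typing import Any, Dict, Iterable, List


def fetch_by_ids(session: Any, prepared: Any, ids: Iterable[Any]) -> Dict[Any, List[Any]]:
    """Run one single-partition SELECT per id concurrently and collect the rows.

    Assumes `session` behaves like a cassandra-driver Session and `prepared` is
    a statement such as "SELECT * FROM ks.users WHERE id = ?" (hypothetical).
    """
    ids = list(ids)
    # Send every request before waiting on any, so they are all in flight at once.
    # With token-aware routing each one goes to a replica for its own id, which
    # spreads coordination across the cluster instead of one coordinator doing
    # the whole IN list.
    futures = [session.execute_async(prepared, (i,)) for i in ids]
    # Then block on each result; result() raises if that particular query failed.
    return {i: list(f.result()) for i, f in zip(ids, futures)}
```

With a real cluster this would be called as something like session = Cluster(["127.0.0.1"]).connect(), stmt = session.prepare("SELECT * FROM ks.users WHERE id = ?"), rows = fetch_by_ids(session, stmt, [1, 2, 3]). For very long id lists you would want to cap the number of in-flight requests; the Python driver's cassandra.concurrent.execute_concurrent_with_args helper does that.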