Say I have the following Cassandra table:
CREATE TABLE experiment (
    bin text,
    x double,
    y double,
    PRIMARY KEY ((bin), x)
);
I want to get the y values where x is greater than 10. Which of the following two approaches would be more efficient?
Enabling allow filtering:
SELECT x, y FROM experiment WHERE x > 10 ALLOW FILTERING;
Avoiding ALLOW FILTERING by first fetching the list of partition keys and then running multiple concurrent queries, one per partition:
SELECT DISTINCT bin FROM experiment;
SELECT x, y FROM experiment WHERE bin = <bin> AND x > 10; // for each bin
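To make the second approach concrete, here is a minimal sketch of the client-side fan-out I have in mind. It is not tied to any particular driver: `fan_out`, `fake_query_partition` and `FAKE_TABLE` are hypothetical names, and the in-memory dictionary only stands in for the table. In a real client, the per-partition function would presumably execute the prepared per-bin query through the driver.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[float, float]  # (x, y)

# Hypothetical in-memory stand-in for the table: bin -> rows sorted by x.
FAKE_TABLE: Dict[str, List[Row]] = {
    "a": [(5.0, 1.0), (12.0, 2.0)],
    "b": [(11.0, 3.0), (20.0, 4.0)],
    "c": [(1.0, 5.0)],
}


def fake_query_partition(bin_key: str, threshold: float) -> List[Row]:
    """Stands in for: SELECT x, y FROM experiment WHERE bin = ? AND x > ?"""
    return [(x, y) for x, y in FAKE_TABLE[bin_key] if x > threshold]


def fan_out(
    bins: Iterable[str],
    query_partition: Callable[[str, float], List[Row]],
    threshold: float,
    max_workers: int = 8,
) -> List[Row]:
    """Run one query per partition key concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input bins.
        per_bin = pool.map(lambda b: query_partition(b, threshold), bins)
        return [row for rows in per_bin for row in rows]


if __name__ == "__main__":
    # The sorted keys stand in for: SELECT DISTINCT bin FROM experiment
    print(fan_out(sorted(FAKE_TABLE), fake_query_partition, 10.0))
```

The `max_workers` cap is there because with many partitions I would not want to issue every query at once; the value 8 is arbitrary.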
Is there an alternative approach (perhaps with a different table design) that would be more efficient?
Thanks in advance!
Edit, in response to a comment:
This is a constructed example. The real data will be a list of various sensor measurements:
time-stamp, subject-id, temperature, noise ...
The query we want to run is: "give me the noise levels (over all measurements, on all subjects) where the temperature is below 10 degrees." I don't mind switching to a different table design with a different partitioning strategy if that would benefit the query.