Running DSE 4.7
So say I have a 4 node DSE Cassandra/Spark cluster...
I have a Cassandra table with say 4,000,000 records in it.
On Spark, I run the following Spark SQL: "select * from table where email = ? or mobile = ?"
Will Spark load all the data into an RDD and then filter based on the where clause? Will each Spark node end up with 1,000,000 records loaded into memory?
It depends on your database schema. If your query explicitly restricts the scan to a single C* partition, Spark will load only that part of the data. Ours,
where email = ? or mobile = ?
definitely does not, so in your case Spark will have to scan all the data.
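For contrast, here is a sketch of a schema where the predicate could be pushed down to Cassandra (the table name and columns are hypothetical, not from your cluster):

```sql
-- Hypothetical lookup table: email is the partition key
CREATE TABLE users_by_email (
    email  text PRIMARY KEY,
    mobile text,
    name   text
);

-- An equality restriction on the partition key alone can be pushed
-- down to C*, so Spark reads only the matching partition:
SELECT * FROM users_by_email WHERE email = ?;
```

The `OR` in your query is exactly what breaks this: neither branch alone identifies a partition, so the connector cannot push the predicate down and falls back to a full scan with filtering on the Spark side. A common C* pattern is to maintain two denormalized lookup tables (one keyed by email, one by mobile) and issue two single-partition queries instead.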
As for memory, it depends on your dataset size and the amount of RAM on the worker nodes. Spark RDDs are not always fully loaded into RAM; in your case each node's share can be processed in smaller pieces (e.g. 100k rows at a time): each piece is loaded into memory, filtered according to your query, and then released before the next one is read, one by one.
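That chunk-at-a-time behavior can be illustrated in plain Python (no Spark needed); `read_chunk` is a hypothetical stand-in for reading one Cassandra token range, and the toy rows are made up for the example:

```python
# Sketch: process a table scan chunk by chunk, the way Spark walks
# Cassandra token ranges, so only one chunk is resident in RAM at a time.

def scan_in_chunks(read_chunk, num_chunks, predicate):
    """read_chunk(i) returns the rows of chunk i. Only that chunk is
    held in memory while it is filtered; matches are accumulated."""
    matches = []
    for i in range(num_chunks):
        rows = read_chunk(i)                               # load one chunk
        matches.extend(r for r in rows if predicate(r))    # filter it
        # 'rows' is dropped here, freeing memory before the next chunk
    return matches

# Toy data standing in for the 4M-row Cassandra table (hypothetical rows).
table = [{"email": f"user{i}@x.com", "mobile": str(1000 + i)}
         for i in range(10)]

def read_chunk(i, size=3):
    return table[i * size:(i + 1) * size]

# Same shape as the question's predicate: email = ? OR mobile = ?
hits = scan_in_chunks(
    read_chunk, 4,
    lambda r: r["email"] == "user7@x.com" or r["mobile"] == "1002",
)
print(len(hits))  # 2: one row matched by email, one by mobile
```

Every row is still examined (it is a full scan), but peak memory is bounded by the chunk size rather than the table size.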