Pyspark cassandra connector generates tombstones during writing

31 Views Asked by danmo41 At 02 February 2024 at 20:58

I understand that when inserting data, tombstones might be created because of existing null values in the columns of the dataframe. To mitigate this issue and minimize tombstones, insertion queries should exclude columns with null values.

Currently, I'm working with the spark-cassandra-connector in a PySpark Jupyter notebook environment, and I've come across the "com.datastax.spark.connector.types.CassandraOption" trait for Scala. How can I leverage this trait, or any other method, to address the tombstone problem?


There is 1 best solution below

Erick Ramirez On 05 February 2024 at 05:40 BEST ANSWER

WriteConf has a parameter ignoreNulls which you can set to true so that null values are not inserted when writing to Cassandra.

You can also configure the SparkConf object by setting spark.cassandra.output.ignoreNulls to true.
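Since the question is about PySpark, here is a minimal sketch of how that setting might be passed on a DataFrame write. The helper names (`cassandra_write_options`, `append_to_cassandra`) are made up for illustration, and passing the full `spark.cassandra.output.ignoreNulls` key as a per-write option is an assumption about how your connector version reads options; if it has no effect, set the same key on the Spark configuration instead.

```python
# Data source name used by the spark-cassandra-connector for DataFrames.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_write_options(keyspace, table, ignore_nulls=True):
    """Build the options for a Cassandra write (hypothetical helper)."""
    return {
        "keyspace": keyspace,
        "table": table,
        # Setting named in the answer above; Spark options are strings.
        "spark.cassandra.output.ignoreNulls": str(ignore_nulls).lower(),
    }


def append_to_cassandra(df, keyspace, table):
    """Append df to keyspace.table, asking the connector to skip nulls."""
    (
        df.write.format(CASSANDRA_FORMAT)
        .options(**cassandra_write_options(keyspace, table))
        .mode("append")
        .save()
    )
```

With a real DataFrame this would be called as `append_to_cassandra(df, "my_keyspace", "my_table")`, where both names are placeholders. To apply the setting to every write in the session, `spark.conf.set("spark.cassandra.output.ignoreNulls", "true")` or the equivalent `.config(...)` call on the session builder should do the same thing.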

For details, see the Globally treating all nulls as Unset section and the Configuration Reference in the docs. Cheers!
