I understand that when inserting data, tombstones might be created because of existing null values in the columns of the dataframe. To mitigate this issue and minimize tombstones, insertion queries should exclude columns with null values.
Currently, I'm working with the spark-cassandra-connector in a PySpark Jupyter notebook environment, and I've come across the "com.datastax.spark.connector.types.CassandraOption" trait for Scala. How can I leverage this trait, or any other method, to address the tombstone problem?
WriteConf has a parameter ignoreNulls which you can set to true so that null values are not inserted when writing to Cassandra. You can also configure the SparkConf object by setting the spark.cassandra.output.ignoreNulls property to true.
For details, see the "Globally treating all nulls as Unset" section and the Configuration Reference in the docs. Cheers!
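
Since you are on PySpark, where WriteConf and CassandraOption are not directly available, here is a minimal sketch of one way to apply the setting per write. It assumes the connector accepts spark.cassandra.output.ignoreNulls as a DataFrame writer option, as it does for its other configuration properties. The helper name write_ignoring_nulls and the keyspace/table arguments are hypothetical placeholders, not part of the connector.

```python
def write_ignoring_nulls(df, keyspace, table):
    """Append a Spark DataFrame to a Cassandra table, leaving null columns unset.

    Assumption: the connector honours spark.cassandra.output.ignoreNulls when it
    is passed as a writer option. Setting the same property on the SparkConf
    (as described above) applies it to every write instead.
    """
    (
        df.write
        .format("org.apache.spark.sql.cassandra")
        # Treat nulls as unset so no tombstone is written for them.
        .option("spark.cassandra.output.ignoreNulls", "true")
        .options(keyspace=keyspace, table=table)
        .mode("append")
        .save()
    )
```

You would call it as write_ignoring_nulls(df, "my_keyspace", "my_table"), with your own keyspace and table names. Keep in mind that with this setting a null in the dataframe no longer overwrites an existing value in Cassandra, so it should not be used where nulls are meant to delete data.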