Is it possible to estimate the percentage of tombstones in a table?

88 Views Asked by At

I am currently working on a script to calculate percentage of tombstones in target table, based on the threshold percentage, I want to provide user option to run compaction. So is it possible to estimate the percentage of tombstone on a table and how is that possible ? During my research I found system.compaction_history table with table_name, bytes_in and bytes_out columns with which I can make decisions but not sure my logic is correct or not.

Thanks for help in advance.

Regards, Jay

2

There are 2 best solutions below

0
stevenlacerda On BEST ANSWER

Use sstablemetadata instead. You'll get a value for the column:

Estimated droppable tombstones: 0.9188263888888889

But, you have to be careful with this because it's not a percentage of droppable. It's an estimation of the ratio of droppable tombstones to non-droppable columns within an SSTable. It indicates the proportion of tombstones that can be removed during compaction, relative to the total number of non-droppable columns present.

0
Erick Ramirez On

Determining the percentage tombstones requires a full table scan which is a terrible idea in Cassandra since such operation doesn't scale.

SSTable tools such as sstablemetadata can provide an estimate ratio of droppable tombstones but it's not the same as getting the number of tombstones. In any case, the ratio it provides is a very rough estimate since the algorithm used to calculate the ratio uses estimated column counts, not the actual number of columns in the partitions or row as I've explained in my response to this question on DBA Stack Exchange.

More importantly, the answer to the question you didn't ask is that it is never a good idea to manually trigger a major compaction. In most cases, it will cause more problems than you're trying to solve.

I've explained this in my post on why major compactions are a bad idea. What you need to do instead is address the underlying root cause. Cheers!