Introduction:
I am facing a challenge with deleting duplicated records in a Cassandra table using CQL. The table, myschema.transactions, has a composite primary key consisting of account_id, transaction_id, year, and txn_date. Despite fixing an insertion issue, some duplicate records remain due to differences in the transaction_id values.
Table Structure:
| account_id | transaction_id | year | txn_date | amount | type | currency |
|------------|--------------------------|------|------------|--------|---------|----------|
| 123456 | 20231129-002-123456-123 | 2023 | 2023-11-29 | 300 | payment | USD |
| 123456 | 20231129--123456-123 | 2023 | 2023-11-29 | 300 | payment | USD |
Issue:
I identified duplicate records where the all the other values are similar except for a missing segment for transaction_id. For example:
transaction_id 1: "20231129-002-123456-123"
transaction_id 2: "20231129--123456-123"
In this case, the 002 part is missing in the duplicated transaction, and I aim to delete this entry.
Objective:
I want to delete all duplicated records with missing segments in the transaction_id.
Research:
Despite thorough research, I couldn't find a suitable solution to address this specific scenario.
Request for Assistance:
Can anyone provide guidance on how to delete duplicated records in Cassandra with a composite primary key, considering the specific case of missing segments in the transaction_id?
Thank you in advance for your assistance!