Deleting Duplicated Records in Cassandra Table Using CQL with Composite Primary Key

44 Views Asked by At

Introduction:

I am facing a challenge with deleting duplicated records in a Cassandra table using CQL. The table, myschema.transactions, has a composite primary key consisting of account_id, transaction_id, year, and txn_date. Despite fixing an insertion issue, some duplicate records remain due to differences in the transaction_id values.

Table Structure:

| account_id | transaction_id           | year | txn_date   | amount | type    | currency |
|------------|--------------------------|------|------------|--------|---------|----------|
| 123456     | 20231129-002-123456-123  | 2023 | 2023-11-29 | 300    | payment | USD      |
| 123456     | 20231129--123456-123     | 2023 | 2023-11-29 | 300    | payment | USD      |

Issue:

I identified duplicate records where the all the other values are similar except for a missing segment for transaction_id. For example:

transaction_id 1: "20231129-002-123456-123"
transaction_id 2: "20231129--123456-123"

In this case, the 002 part is missing in the duplicated transaction, and I aim to delete this entry.

Objective:

I want to delete all duplicated records with missing segments in the transaction_id.

Research:

Despite thorough research, I couldn't find a suitable solution to address this specific scenario.

Request for Assistance:

Can anyone provide guidance on how to delete duplicated records in Cassandra with a composite primary key, considering the specific case of missing segments in the transaction_id?

Thank you in advance for your assistance!

0

There are 0 best solutions below