Set Partition Key Column Size


I have a Recent History table. It should only keep the last 100 rows inserted within each partition (user_id).

If records keep being inserted past 100, the oldest data should effectively be overwritten (dropped).

My table looks something like this:

        CREATE TABLE IF NOT EXISTS user_history (
            user_id text, 
            ts timeuuid,
            history_data text,              
            PRIMARY KEY (user_id, ts)
        )
        WITH CLUSTERING ORDER BY (ts DESC);

I would love to have Cassandra take care of this for me. Perhaps there is a way to cap the partition size so that it wraps around once the limit is reached? Can Cassandra do this?

Which approach would you recommend?

The only idea I can think of right now: use LIMIT 100 on the SELECT and then delete the records that fall outside that range. Mind you, this seems tedious...
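For concreteness, here is roughly what that manual trim would look like with the DataStax Python driver; the contact point, keyspace, and the trim_history helper are all illustrative, not an existing implementation:

    # Rough sketch of the manual trim, assuming the DataStax Python driver;
    # contact point and keyspace are illustrative.
    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')

    def trim_history(user_id, keep=100):
        # Clustering order is ts DESC, so LIMIT returns the newest rows.
        rows = session.execute(
            "SELECT ts FROM user_history WHERE user_id = %s LIMIT %s",
            (user_id, keep),
        )
        ts_kept = [row.ts for row in rows]
        if len(ts_kept) < keep:
            return  # fewer than `keep` rows, nothing to trim yet
        # Range-delete everything older than the oldest row we keep
        # (range deletes on a clustering column need Cassandra 3.0+).
        session.execute(
            "DELETE FROM user_history WHERE user_id = %s AND ts < %s",
            (user_id, ts_kept[-1]),
        )

Every call produces a range tombstone, so running this after each insert adds read-path overhead on top of being tedious.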

1 Answer
There is no built-in way in Cassandra to roll up or cap a partition like that.

Just an idea: you could use bucketing for this purpose:

    CREATE TABLE IF NOT EXISTS user_history (
        user_id text, 
        bucket int,
        ts timeuuid,
        history_data text,              
        PRIMARY KEY ((user_id, bucket), ts)
    )
    WITH CLUSTERING ORDER BY (ts DESC);

This is a reverse time-series layout where you partition on a custom bucket id. The idea is to have N buckets, each holding at most 100 items.
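For example, reading the latest items from one bucket is a single-partition query (the user and bucket values here are illustrative):

    SELECT history_data FROM user_history
    WHERE user_id = 'some_user' AND bucket = 42
    LIMIT 100;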

This requires a bit of application coding, however (a sketch follows the list):

  • You need to track how many items there are in the latest bucket.
  • When the latest bucket contains 100 items, you increment your "current" bucket.
  • You need to track the "current" bucket number.
  • When you query, you fetch at most 100 items from the "current" bucket; if you get fewer than 100, you fetch the remainder from the "current - 1" bucket.
  • When you delete old data (all buckets below the current bucket minus 2), you delete at the partition level, which is cheap.
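
A minimal sketch of that bookkeeping, assuming the DataStax Python driver and the bucketed schema above. How the (current_bucket, count) state is persisted is up to you (a separate table, a cache, ...); here it is simply passed in and returned, and the contact point and keyspace are illustrative:

    # Minimal sketch of the bucketing logic described above, assuming the
    # DataStax Python driver and the bucketed user_history schema.
    import uuid
    from cassandra.cluster import Cluster

    BUCKET_SIZE = 100  # at most 100 items per bucket

    cluster = Cluster(['127.0.0.1'])          # illustrative contact point
    session = cluster.connect('my_keyspace')  # hypothetical keyspace

    def append_history(user_id, data, state):
        # `state` is the tracked (current_bucket, count) for this user.
        bucket, count = state
        if count >= BUCKET_SIZE:
            # Latest bucket is full: roll over to a new one.
            bucket, count = bucket + 1, 0
            if bucket >= 2:
                # Buckets older than current-1 are no longer needed;
                # this is a cheap partition-level delete.
                session.execute(
                    "DELETE FROM user_history WHERE user_id = %s AND bucket = %s",
                    (user_id, bucket - 2),
                )
        session.execute(
            "INSERT INTO user_history (user_id, bucket, ts, history_data) "
            "VALUES (%s, %s, %s, %s)",
            (user_id, bucket, uuid.uuid1(), data),  # timeuuid = v1 UUID
        )
        return bucket, count + 1

    def read_history(user_id, current_bucket):
        # Fetch up to 100 items from the current bucket, then top up
        # from the previous bucket if needed.
        rows = list(session.execute(
            "SELECT history_data FROM user_history "
            "WHERE user_id = %s AND bucket = %s LIMIT %s",
            (user_id, current_bucket, BUCKET_SIZE),
        ))
        if len(rows) < BUCKET_SIZE and current_bucket > 0:
            rows += list(session.execute(
                "SELECT history_data FROM user_history "
                "WHERE user_id = %s AND bucket = %s LIMIT %s",
                (user_id, current_bucket - 1, BUCKET_SIZE - len(rows)),
            ))
        return rows

Since the previous bucket always holds a full 100 items once a rollover has happened, the read path returns exactly the 100 most recent entries whenever that many exist.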