Save array of objects in cassandra

825 Views Asked by At

How can I save array of objects in cassandra?
I'm using a nodeJS application and using cassandra-driver to connect to Cassandra DB. I wanted to save records like below in my db:

{
"id" : "5f1811029c82a61da4a44c05",
"logs" : [ 
    {
        "conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667",
        "source" : "source1",
        "destination" : "destination1",
        "url" : "https://asdasdas.com",
        "data" : "data1"
    }, 
    {
        "conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667",
        "source" : "source2",
        "destination" : "destination2",
        "url" : "https://afdvfbwadvsffd.com",
        "data" : "data2"
    }
],
"conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667"
}  

In the above record, I can use type "text" to save values of the columns "id" and "conversationId". But not sure how can I define the schema and save data for the field "logs".

2

There are 2 best solutions below

0
On

You have a few options depending on how you want to query this data.

The first is to stringify the json in logs field and save that to the database and then convert it back to JSON after querying the data.

The second option is similar to the first, but instead of stringifying the array, you store the data as a list in the database.

The third option is to define a new table for the logs with a primary key of the conversation and clustering keys for each element of the logs. This will allow you to lookup either by the full key or query by just the primary key and retrieve all the rows that match those criteria.

CREATE TABLE conversationlogs (
  conversationid uuid,
  logid timeuuid,
  ...
  PRIMARY KEY ((conversationid), logid));
0
On

With Cassandra, you'll want to store the data in the same way that you want to query it. As you mentioned querying by conversatonid, that's going to influence how the PRIMARY KEY definition should look. Given this, conversationid, should make a good partition key. As for the clustering columns, I had to make some guesses as to cardinality. So, sourceid looked like it could be used to uniquely identify a log entry within a conversation, so I went with that next.

I thought about using id as the final clustering column, but it looks like all entries with the same conversationid would also have the same id. It might be a good idea to give each entry its own unique identifier, to help ensure uniqueness:

{
    "uniqueid": "e53723ca-2ab5-441f-b360-c60eacc2c854",
    "conversationId" : "e9b55229-f20c-4453-9c18-a1f4442eb667",
    "source" : "source1",
    "destination" : "destination1",
    "url" : "https://asdasdas.com",
    "data" : "data1"
}, 

This makes the final table definition look like this:

CREATE TABLE conversationlogs (
  id TEXT,
  conversationid TEXT,
  uniqueid UUID,
  source TEXT,
  destination TEXT,
  url TEXT,
  data TEXT,
  PRIMARY KEY (conversationid,sourceid,uniqueid));