How should I store nested JSON object inside Cassandra?

Question

How should I store nested JSON object inside Cassandra?

261 Views Asked by best_of_man At 21 March 2023 at 15:44

This is the first time I am working with Cassandra and I have a data structure like following, want to save it inside the Cassandra:

{"user_id": "123",
 "user_cards": {
  "first_card": {
    "card_number": 456
  }
 }
}

I searched through the internet and found an example like following:

use json;
CREATE type json.sale ( id int, item text, amount int );
CREATE TABLE json.customers ( id int  PRIMARY KEY, name text, balance int, sales list> );

INSERT INTO json.customers (id, name, balance, sales) 
VALUES (123, 'Greenville Hardware', 700,
[{ id: 5544, item : 'tape', amount : 100},
{ id: 5545, item : 'wire', amount : 200}]) ;

But I am not sure if this is the best way of doing that? I remember from my experience with MySQL or MongoDB/Mongoose, we were defining separate tables for nested JSON objects and put the Foreign key of that table inside a column of the parent table(Or do population in Mongo).

As I heard Cassandra is against normalization and it's better to do denormalization on it, I don't know what approach should I take?

Also I have another question about data compression, Will cassandra take care of it if I do something like the example I put above?

Original Q&A

There are 2 best solutions below

**Aaron** · Answer 1 · 2023-03-22T14:45:17.147000

On the surface, I think this is a good approach. My only concern would be if each customer has many sales...like millions. But if the business use isn't going to yield much more than a few thousand or so, this is probably ok.

If not, it might make sense to add a date/time component to the partition key like year, or something:

PRIMARY KEY ((id, year_of_sale))

That'll make sure that sales will be capped per partition by year.

The other thing to consider, is the query pattern this needs to support. Right now, it only supports querying by id. But if that's ok, then you should be good-to-go!

**Erick Ramirez** · Answer 2 · 2023-03-31T07:54:17.017000

It is possible to insert data in JSON format using the INSERT ... JSON CQL command. For example:

INSERT INTO table_name JSON '{
    "column_name": "value"
}'

But it is a bit more nuanced than that so allow me to explain.

Data modeling in Cassandra is the complete opposite of data modeling in traditional relational databases. Instead of working out how to store the data into tables, we first start by listing all the application queries THEN designing a table for EACH of the app queries. We do this so the tables are optimised for reads.

For example, assuming the application needs to "retrieve all cards for a user", we need to design a table such that:

the data is partitioned by users, AND
the cards are "clustered" (grouped together).

The table schema would look something like:

CREATE TABLE cards_by_user (
    user_id int,
    card_number int,
    card_type text,
    card_expiry text,
    ...
    PRIMARY KEY (user_id, card_number)
) WITH CLUSTERING ORDER BY (card_number ASC)

In contrast to two-dimensional tables in RDBMS, this Cassandra table is multi-dimensional such that each partition (user) can have one or more rows (cards).

To create a new card entry for a user, the CQL statement to insert JSON formatted data looks like:

INSERT INTO cards_by_user
JSON '{
    "user_id": "123",
    "card_number": "456",
    "card_type": "visa",
    "card_expiry": "07/2028"
}'

You can insert multiple rows of cards using the same INSERT format. For example:

INSERT INTO cards_by_user
JSON '{
    "user_id": "123",
    "card_number": "789",
    "card_type": "mastercard",
    "card_expiry": "04/2025"
}'

To retrieve all the cards for a user:

SELECT * FROM cards_by_user WHERE user_id = 123;

 user_id | card_number | card_expiry | card_type
---------+-------------+-------------+------------
     123 |         456 |     07/2028 |       visa
     123 |         789 |     04/2025 | mastercard

As you can see, it is not necessary to use user-defined types (UDTs) to store the data. We recommend mapping the data to native CQL columns instead of UDTs whenever possible to reduce the level of complexity required to maintain your data and code.

As a side note, if you prefer to work with JSON documents then have a look at Stargate.io -- an open-source data API gateway which has a Document API which allows you to store and retrieve JSON documents similar to MongoDB.

Have a look at the free tutorial on datastax.com/dev where you can try it out in a hands-on lab. Cheers!

How should I store nested JSON object inside Cassandra?

There are 2 best solutions below

Related Questions in CASSANDRA

Related Questions in BLOB

Related Questions in DATA-MODELING

Related Questions in DENORMALIZATION

Related Questions in NESTED-JSON

Trending Questions

Popular # Hahtags

Popular Questions