I have a question regarding data modeling in cassandra. I want to create a cassandra database with movies, actors, ratings etc. I already defined what my tables should look like and I want to create tables like movies_by_genre, movies_by_year, actors_by_movie, movies_by_actor and so on. My movie data looks like this for example:
| ID | title | year | genres |
|---|---|---|---|
| 1 | Top Gun: Maverick | 2022 | action, drama, adventure |
| 2 | Black Panther: Wakanda Forever | 2019 | action, sci-fi, superhero |
| 3 | Thor: Love and Thunder | 2015 | comedy, action, superhero, sci-fi |
| 4 | The Hobbit | 2012 | fantasy, roman |
| 5 | Mary Poppins | 1964 | children, musical |
So when I want to create the table movies_by_genre I want the genre as partition key and for genre "Action" an output like this for example:
| genre | title | year |
|---|---|---|
| action | Top Gun: Maverick | 2022 |
| action | Black Panther: Wakanda Forever | 2019 |
| action | Thor: Love and Thunder | 2015 |
So the genre is part of a list as you can see in the first table so every movie can have several genres. How do I have to define my data input for my tables to get the result I require? Do I have to make a single dataset for every genre a movie belongs to?
I'm glad if you can help me!
If your application needs to retrieve movies by genre then you should design a table that is (1) partitioned by genre and (2) rows clustered by the movie title.
For example, here's a simple example table that is designed for that specific app query:
With this
PRIMARY KEY (genre, title)definition, each genre will have one or more rows of movies.Here's an example query which retrieves 5 action films:
And a sample output: