How do I model data containing movie titles to retrieve a list of movies by genre?

73 Views Asked by vrnvlk At 11 April 2023 at 09:25

I have a question regarding data modeling in cassandra. I want to create a cassandra database with movies, actors, ratings etc. I already defined what my tables should look like and I want to create tables like movies_by_genre, movies_by_year, actors_by_movie, movies_by_actor and so on. My movie data looks like this for example:

ID	title	year	genres
1	Top Gun: Maverick	2022	action, drama, adventure
2	Black Panther: Wakanda Forever	2019	action, sci-fi, superhero
3	Thor: Love and Thunder	2015	comedy, action, superhero, sci-fi
4	The Hobbit	2012	fantasy, roman
5	Mary Poppins	1964	children, musical

So when I want to create the table movies_by_genre I want the genre as partition key and for genre "Action" an output like this for example:

genre	title	year
action	Top Gun: Maverick	2022
action	Black Panther: Wakanda Forever	2019
action	Thor: Love and Thunder	2015

So the genre is part of a list as you can see in the first table so every movie can have several genres. How do I have to define my data input for my tables to get the result I require? Do I have to make a single dataset for every genre a movie belongs to?

I'm glad if you can help me!

Original Q&A

There are 1 best solutions below

Erick Ramirez On 12 April 2023 at 12:51

If your application needs to retrieve movies by genre then you should design a table that is (1) partitioned by genre and (2) rows clustered by the movie title.

For example, here's a simple example table that is designed for that specific app query:

CREATE TABLE movies_by_genre (
    genre text,
    title text,
    release_year int,
    duration int,
    director text,
    ...
    PRIMARY KEY (genre, title)
)

With this PRIMARY KEY (genre, title) definition, each genre will have one or more rows of movies.

Here's an example query which retrieves 5 action films:

SELECT genre, title FROM movies_by_genre
  WHERE genre = 'action'
  LIMIT 5

And a sample output:

 genre  | title
--------|---------------------------------------------
 action | Black Panther: Wakanda Forever
 action | Doctor Strange in the Multiverse of Madness
 action | The Batman
 action | Thor: Love and Thunder
 action | Top Gun: Maverick

How do I model data containing movie titles to retrieve a list of movies by genre?

There are 1 best solutions below

Related Questions in CASSANDRA

Related Questions in DATA-MODELING

Related Questions in DENORMALIZATION

Trending Questions

Popular # Hahtags

Popular Questions