IMDBpy - Get Genres from the Top 20 movies


I'm trying to extract a dataset containing the top 20 movies along with each movie's genres and actors. I'm using the following code:

top250 = ia.get_top250_movies()
limit = 20;
index = 0;
output = []
for item in top250:
    for genre in top250['genres']:
        index += 1;
        if index <= limit:
            print(item['long imdb canonical title'], ": ", genre);
        else:
            break;

I'm getting the following error:

Traceback (most recent call last):
  File "C:/Users/avilares/PycharmProjects/IMDB/IMDB.py", line 21, in <module>
    for genre in top250['genres']:
TypeError: list indices must be integers or slices, not str

I think the top250 object doesn't contain the genres...

Does anyone know how to get the genres of each movie?

Many thanks!


There are 2 best solutions below

Best answer

From the IMDbPY docs:

"It’s possible to retrieve the list of top 250 and bottom 100 movies:"

>>> top = ia.get_top250_movies()
>>> top[0]
<Movie id:0111161[http] title:_The Shawshank Redemption (1994)_>
>>> bottom = ia.get_bottom100_movies()
>>> bottom[0]
<Movie id:4458206[http] title:_Code Name: K.O.Z. (2015)_>

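get_top250_movies() returns a list of Movie objects, so indexing it with a string such as top250['genres'] raises the TypeError shown in the question. The entries in that list also carry only basic information, so the genres have to be fetched separately for each movie.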
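To make the cause of the traceback concrete, here is a minimal sketch that reproduces it without any network access. The `fake_top250` list and its dictionaries are stand-ins invented for illustration, not real IMDbPY objects; the only point is that a list accepts integer indices, while a string key belongs on an individual item.

```python
# Stand-in data (hypothetical): a plain list of dicts playing the role of
# the Movie objects returned by get_top250_movies().
fake_top250 = [
    {"title": "The Shawshank Redemption", "year": 1994},
    {"title": "The Godfather", "year": 1972},
]

# Indexing the list itself with a string reproduces the error from the question.
error_message = None
try:
    fake_top250["genres"]
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Indexing a single item with a string works, but a key that was never
# loaded for that item is simply not there.
first = fake_top250[0]
print(first["title"])
print("genres" in first)
```

The same distinction applies to the real objects: the string key goes on one movie, not on the list that holds them.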

Here's a solution:

# Iterate through the movies in the top 250
for topmovie in top250:
    # First, retrieve the movie object using its ID
    movie = ia.get_movie(topmovie.movieID)
    # Print the movie's genres
    for genre in movie['genres']:
        print(genre)  

Full working code:

import imdb

ia = imdb.IMDb()
top250 = ia.get_top250_movies()

# Iterate through the first 20 movies in the top 250
for top_movie in top250[:20]:
    # First, retrieve the full movie object using its ID
    movie = ia.get_movie(top_movie.movieID)
    # Print movie title and genres
    print(movie['title'])
    print(*movie['genres'], sep=", ")

Output:

The Shawshank Redemption
Drama
The Godfather
Crime, Drama
The Godfather: Part II
Crime, Drama
The Dark Knight
Action, Crime, Drama, Thriller
12 Angry Men
Crime, Drama
Schindler's List
Biography, Drama, History
The Lord of the Rings: The Return of the King
Action, Adventure, Drama, Fantasy
Pulp Fiction
Crime, Drama
The Good, the Bad and the Ugly
Western
Fight Club
Drama
The Lord of the Rings: The Fellowship of the Ring
Adventure, Drama, Fantasy
Forrest Gump
Drama, Romance
Star Wars: Episode V - The Empire Strikes Back
Action, Adventure, Fantasy, Sci-Fi
Inception
Action, Adventure, Sci-Fi, Thriller
The Lord of the Rings: The Two Towers
Adventure, Drama, Fantasy
One Flew Over the Cuckoo's Nest
Drama
Goodfellas
Crime, Drama
The Matrix
Action, Sci-Fi
Seven Samurai
Adventure, Drama
City of God
Crime, Drama

Here is a shorter, more Pythonic version; the notebook can be accessed here.

Python provides some cleaner ways to write this kind of code. In this script, I have used two such techniques.

Technique-1: List comprehension

A list comprehension loops through an iterable and produces a list as its output. It can also include computation and conditionals. The other technique, Technique-2: Dictionary comprehension, is very similar; you can read about it here.

For example, code without a list comprehension:

numbers = []
for i in range(10):
  numbers.append(i)
print(numbers)

#Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Code using a list comprehension:

numbers = [i for i in range(10)]
print(numbers)

#Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
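As a small illustration of the computation and conditionals mentioned above, and of the dictionary variant, the following sketch uses made-up variable names (`even_squares`, `square_by_number`) that are not part of the original answer.

```python
# List comprehension with a computation (i * i) and a condition (even i only).
even_squares = [i * i for i in range(10) if i % 2 == 0]
print(even_squares)

#Output:
# [0, 4, 16, 36, 64]

# Dictionary comprehension: the same idea, but each iteration yields a
# key: value pair instead of a single element.
square_by_number = {i: i * i for i in range(5)}
print(square_by_number)

#Output:
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```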

Coming to the OP's problem: the get_top250_movies() function returns a list of movies with very few details. The exact fields it returns can be checked as follows. As the output shows, the movie details do not include genres.

from imdb import IMDb
ia = IMDb()
top250Movies = ia.get_top250_movies()
top250Movies[0].items()

#output:
[('rating', 9.2),
 ('title', 'The Shawshank Redemption'),
 ('year', 1994),
 ('votes', 2222548),
 ('top 250 rank', 1),
 ('kind', 'movie'),
 ('canonical title', 'Shawshank Redemption, The'),
 ('long imdb title', 'The Shawshank Redemption (1994)'),
 ('long imdb canonical title', 'Shawshank Redemption, The (1994)'),
 ('smart canonical title', 'Shawshank Redemption, The'),
 ('smart long imdb canonical title', 'Shawshank Redemption, The (1994)')]

However, the get_movie() function returns much more information about a movie, including its genres.

We combine the two functions to get the genres of the top 20 movies. First we call get_top250_movies(), which returns a list of the top 250 movies with few details (we are only interested in the movieID). Then we call get_movie() for each movieID from that list, which gives us the genres.

Program:

from imdb import IMDb

# Initialize and get the top 250 movies; the movies in this list carry only
# a few details and do not include genres
ia = IMDb()
top250Movies = ia.get_top250_movies()

# TECHNIQUE-1: List comprehension
# Fetch the full record for each of the top 20 movies, which includes genres
top20Movies = [ia.get_movie(movie.movieID) for movie in top250Movies[:20]]

# TECHNIQUE-2: Dictionary comprehension
# Build a dictionary mapping each movie title to its list of genres
{movie['title']: movie['genres'] for movie in top20Movies}

Output:

{'12 Angry Men': ['Drama'],
 'Fight Club': ['Drama'],
 'Forrest Gump': ['Drama', 'Romance'],
 'Goodfellas': ['Biography', 'Crime', 'Drama'],
 'Inception': ['Action', 'Adventure', 'Sci-Fi', 'Thriller'],
 "One Flew Over the Cuckoo's Nest": ['Drama'],
 'Pulp Fiction': ['Crime', 'Drama'],
 "Schindler's List": ['Biography', 'Drama', 'History'],
 'Se7en': ['Crime', 'Drama', 'Mystery', 'Thriller'],
 'Seven Samurai': ['Action', 'Adventure', 'Drama'],
 'Star Wars: Episode V - The Empire Strikes Back': ['Action',
  'Adventure',
  'Fantasy',
  'Sci-Fi'],
 'The Dark Knight': ['Action', 'Crime', 'Drama', 'Thriller'],
 'The Godfather': ['Crime', 'Drama'],
 'The Godfather: Part II': ['Crime', 'Drama'],
 'The Good, the Bad and the Ugly': ['Western'],
 'The Lord of the Rings: The Fellowship of the Ring': ['Action',
  'Adventure',
  'Drama',
  'Fantasy'],
 'The Lord of the Rings: The Return of the King': ['Adventure',
  'Drama',
  'Fantasy'],
 'The Lord of the Rings: The Two Towers': ['Adventure', 'Drama', 'Fantasy'],
 'The Matrix': ['Action', 'Sci-Fi'],
 'The Shawshank Redemption': ['Drama']}
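
Since the question also asks for actors, the same two-step pattern can be wrapped in a function. This is only a sketch under stated assumptions: `genres_and_cast`, `FakeIMDb` and `FakeSummary` are hypothetical names introduced here, and the fake client exists only so the example runs offline. The function assumes the full movie record offers a 'cast' entry whose items expose a 'name' key, alongside the 'genres' entry used above.

```python
def genres_and_cast(ia, limit=20, cast_size=3):
    """Return {title: {"genres": [...], "cast": [...]}} for the first `limit` movies."""
    result = {}
    for summary in ia.get_top250_movies()[:limit]:
        # The chart entry is only a summary; fetch the full record by ID.
        movie = ia.get_movie(summary.movieID)
        result[movie["title"]] = {
            # .get() avoids a KeyError when a field is missing for a title.
            "genres": list(movie.get("genres", [])),
            "cast": [person["name"] for person in movie.get("cast", [])[:cast_size]],
        }
    return result


class FakeSummary:
    """Hypothetical stand-in for a chart entry; only movieID is used."""
    def __init__(self, movie_id):
        self.movieID = movie_id


class FakeIMDb:
    """Hypothetical offline stand-in exposing the two methods used above."""
    _records = {
        "0111161": {"title": "The Shawshank Redemption", "genres": ["Drama"],
                    "cast": [{"name": "Tim Robbins"}, {"name": "Morgan Freeman"}]},
        "0068646": {"title": "The Godfather", "genres": ["Crime", "Drama"],
                    "cast": [{"name": "Marlon Brando"}, {"name": "Al Pacino"}]},
    }

    def get_top250_movies(self):
        return [FakeSummary(movie_id) for movie_id in self._records]

    def get_movie(self, movie_id):
        return self._records[movie_id]


demo = genres_and_cast(FakeIMDb(), limit=2, cast_size=1)
print(demo)
```

With the real library, the function would presumably be called with the IMDb() instance created in the program above instead of the fake client. Each get_movie() call fetches one movie, so a limit of 20 means 20 separate lookups.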