I have a movie database that I found online. It has an ID column (just an identifier for the interaction; it carries no other meaning), a User column, and a MovieID column. Each separate row represents a given user watching a given movie, and I am trying to write a movie recommendation system on top of it: given a user, it should output a list of movies that they might like.
Here is the database (almost 90,000 rows and a lot of different movies):
          ID     User  MovieID
0      17556     2591    88879
1      17557     3101    88879
2      17598     3101    88879
3      17598     3101    88879
4      17604  9459937    88879
...      ...      ...      ...
88085  73266  9468430     9948
88086  73310  9467397   112749
88087  73371  9468018   109281
88088  73371  9468018   109281
88089  73381  9468360   109508
So I searched online and found the following code:
import implicit
from scipy.sparse import coo_matrix
# Drop any duplicate rows from the DataFrame
df = df.drop_duplicates(subset=["User", "MovieID"])
# Sort the DataFrame by the User column
df = df.sort_values("User")
# Create a pivot table with the User as the index, the MovieID as the columns, and the ID as the values
bookings = df.pivot_table(index="User", columns="MovieID", values="ID", aggfunc=len, fill_value=0)
# Convert the pivot table to a sparse matrix in the COOrdinate format
M = coo_matrix(bookings)
# Convert the sparse matrix to the CSR format
M = M.tocsr()
# Create an ALS model
model = implicit.als.AlternatingLeastSquares(factors=10)
# Fit the model to the data
model.fit(M)
def recommend_movies(user):
    # Make sure the user is in the index of the pivot table
    if user not in bookings.index:
        return []
    # Get the user index in the matrix
    user_index = list(bookings.index).index(user)
    # Get the recommendations for the user
    recommendations = model.recommend(user_index, M, N=10)
    # Get the movie IDs of the recommended movies
    recommended_movies = [bookings.columns[index] for index, _ in recommendations]
    return recommended_movies
# Example usage:
recommendations = recommend_movies(3101)
# Print the recommendations
print(recommendations)
But this error keeps coming up on this line:
recommendations = recommend_movies(3101)
47 user_count = 1 if np.isscalar(userid) else len(userid)
48 if user_items.shape[0] != user_count:
---> 49 raise ValueError("user_items must contain 1 row for every user in userids")
50
51 user = self._user_factor(userid, user_items, recalculate_user)
ValueError: user_items must contain 1 row for every user in userids
I tried using ChatGPT, but it could not give me a solution, and I also looked online without finding anything. As can be seen, there are some duplicate User and MovieID values in the dataset, because users watch multiple movies and sometimes rewatch the same movie.
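For reference, the check shown in the traceback compares the number of rows in `user_items` with the number of user ids passed in. With a single user id, it expects exactly one row, while the code above passes the full matrix `M`. The following is a minimal sketch of that check on a made-up toy matrix. The helper `rows_match` is hypothetical and only mirrors the two lines visible in the traceback; it is not part of `implicit`.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy user-item matrix (made-up data): 3 users (rows) x 4 movies (columns).
M = csr_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]))

def rows_match(userid, user_items):
    """Hypothetical helper mirroring the shape check shown in the traceback."""
    user_count = 1 if np.isscalar(userid) else len(userid)
    return user_items.shape[0] == user_count

user_index = 2  # row position of one user in the matrix, not the raw User value

# Passing the whole matrix: 3 rows for 1 user id, so the check fails.
full_ok = rows_match(user_index, M)

# Passing only that user's row: 1 row for 1 user id, so the check passes.
single_ok = rows_match(user_index, M[user_index])

print(M.shape, M[user_index].shape)
print(full_ok, single_ok)
```

If this reading of the traceback is right, the call in `recommend_movies` would likely need to be `model.recommend(user_index, M[user_index], N=10)`. In the `implicit` versions that contain this check, `recommend` also appears to return a pair of arrays `(ids, scores)` rather than a list of `(index, score)` tuples, so the list comprehension would need to iterate over `ids`. Both points are worth confirming against the documentation of the installed version.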