Value Error - Recommender System with ALS Model


I have a dataset of movie interactions that I found online. It has three columns: ID (an identifier for the interaction with the movie; it carries no other meaning), User, and MovieID. Each separate row represents a given user watching a given movie. I am trying to write a movie recommendation system: given a user, it should output a list of movies that they might like.

Here is the data (almost 90,000 rows and many different movies):

          ID     User  MovieID
0      17556     2591    88879
1      17557     3101    88879
2      17598     3101    88879
3      17598     3101    88879
4      17604  9459937    88879
...      ...      ...      ...
88085  73266  9468430     9948
88086  73310  9467397   112749
88087  73371  9468018   109281
88088  73371  9468018   109281
88089  73381  9468360   109508

I searched online and found the following code:

import implicit
from scipy.sparse import coo_matrix

# Drop any duplicate rows from the DataFrame
df = df.drop_duplicates(subset=["User", "MovieID"])

# Sort the DataFrame by the User column
df = df.sort_values("User")

# Create a pivot table with the User as the index, the MovieID as the columns, and the ID as the values
bookings = df.pivot_table(index="User", columns="MovieID", values="ID", aggfunc=len, fill_value=0)

# Convert the pivot table to a sparse matrix in the COOrdinate format
M = coo_matrix(bookings)

# Convert the sparse matrix to the CSR format
M = M.tocsr()

# Create an ALS model
model = implicit.als.AlternatingLeastSquares(factors=10)

# Fit the model to the data
model.fit(M)

def recommend_movies(user):
    # Make sure the user is in the index of the pivot table
    if user not in bookings.index:
        return []
    
    # Get the user index in the matrix
    user_index = list(bookings.index).index(user)

    # Get the recommendations for the user
    recommendations = model.recommend(user_index, M, N=10)

    # Get the movie IDs of the recommended movies
    recommended_movies = [bookings.columns[index] for index, _ in recommendations]

    return recommended_movies

# Example usage:
recommendations = recommend_movies(3101)

# Print the recommendations
print(recommendations)

But this error keeps coming up on this line:

recommendations = recommend_movies(3101)

     47             user_count = 1 if np.isscalar(userid) else len(userid)
     48             if user_items.shape[0] != user_count:
---> 49                 raise ValueError("user_items must contain 1 row for every user in userids")
     50 
     51         user = self._user_factor(userid, user_items, recalculate_user)

ValueError: user_items must contain 1 row for every user in userids
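
The check in the traceback compares `user_items.shape[0]` with the number of user ids passed in. In the code above, `model.recommend(user_index, M, N=10)` passes a single user id together with the full matrix `M`, so the row count cannot match. Below is a minimal sketch of one possible fix, assuming implicit 0.5 or later, where `recommend` expects only the rows of the requested users and returns a pair of arrays `(ids, scores)` rather than a list of tuples. The helper `recommend_for_user` and the toy matrix are hypothetical names used only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix


def recommend_for_user(model, user_items, user_labels, item_labels, user, n=10):
    """Return up to n recommended item labels for one user.

    Assumption: implicit >= 0.5, where model.recommend(userid, user_items, N=...)
    wants only the rows of the requested users and returns (ids, scores).
    """
    user_labels = list(user_labels)
    if user not in user_labels:
        return []
    user_index = user_labels.index(user)

    # Slice a single row: shape (1, n_items), which satisfies the
    # `user_items.shape[0] != user_count` check shown in the traceback.
    user_row = user_items[user_index]

    ids, scores = model.recommend(user_index, user_row, N=n)
    return [item_labels[i] for i in ids]


# Tiny illustration of the shape difference (toy data, not the real dataset).
toy = csr_matrix(np.array([[1, 0, 1], [0, 1, 0]], dtype=np.float32))
print(toy.shape)     # (2, 3): one row per user, rejected for a scalar userid
print(toy[0].shape)  # (1, 3): one row for the one requested user
```

With the variables from the question, this would be called as `recommend_for_user(model, M, bookings.index, bookings.columns, 3101)`.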

I tried using ChatGPT, but it was not able to give me a solution, and I also looked online without finding anything. As can be seen, the dataset contains duplicate User and MovieID values, because users watch multiple movies and sometimes rewatch the same movie.
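
Because the code above calls `drop_duplicates` before pivoting, rewatches are discarded and every stored entry is 1. If the rewatch counts should be kept as interaction strength, one option is to build the sparse matrix directly from the (User, MovieID) pairs. This is only a sketch: the rows are a few pairs copied from the sample above, the variable names are hypothetical, and whether raw counts are a good confidence weight for this dataset is an assumption that would need checking.

```python
import numpy as np
from scipy.sparse import coo_matrix

# A few (User, MovieID) pairs from the sample above, repeats included.
users = np.array([2591, 3101, 3101, 3101, 9459937, 9468018, 9468018])
movies = np.array([88879, 88879, 88879, 88879, 88879, 109281, 109281])

# Map raw ids to contiguous row/column indices; the sorted unique labels
# are kept so that indices can be translated back to ids later.
user_labels, user_idx = np.unique(users, return_inverse=True)
movie_labels, movie_idx = np.unique(movies, return_inverse=True)

# One entry of 1.0 per interaction; duplicate (row, col) pairs are summed
# when the COO matrix is converted to CSR, giving a watch count per pair.
counts = coo_matrix(
    (np.ones(len(users), dtype=np.float32), (user_idx, movie_idx)),
    shape=(len(user_labels), len(movie_labels)),
).tocsr()

print(counts.toarray())
```

This also avoids materialising a dense users-by-movies pivot table, which may matter as the number of distinct users and movies grows.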
