How to load a CSV file instead of the built-in dataset in the "Surprise" Python recommender system?

I don't know how to write code that loads a CSV file or .inter file instead of the built-in dataset in this example, which evaluates a dataset with a recommender algorithm:

from surprise import SVD
from surprise import KNNBasic
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Use the KNNBasic algorithm.
algo = KNNBasic()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

What would the full line of code look like if I only need to supply a data path and a filename? I have looked at the Surprise website, but I didn't find anything. I don't want the MovieLens code from the example; I want a line that loads a file from a data path.

There is 1 best solution below

First, you need to create an instance of Reader:

reader = Reader(line_format=u'rating user item', sep=',', rating_scale=(1, 6), skip_lines=1)

Note that the line_format parameter may contain only the field names 'user', 'item' and 'rating' (optionally 'timestamp'), separated by spaces, and these names have nothing to do with the column names in your custom_rating.csv. That's why skip_lines=1 is set: it skips the first line of the CSV file, where the column names are usually defined. What line_format does determine is the order of the columns. So, just to be clear, my custom_rating.csv looks like this:

rating,userId,movieId
4,1,1
6,1,2
1,1,3
. . .
. . .
. . .

Now you can create your data instance:

data = Dataset.load_from_file("custom_rating.csv", reader=reader)
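Since the question asks for something that only needs a data path and a filename, here is a minimal sketch of how the two steps above could be wrapped into helpers. The function names reader_kwargs_from_csv and load_ratings and the column_roles mapping are hypothetical, not part of Surprise. The sketch assumes the file has exactly one header row and that the smallest and largest ratings found in the file are the ends of the rating scale, which may not hold for your data.

```python
import csv
import os


def reader_kwargs_from_csv(path, column_roles, sep=","):
    """Derive keyword arguments for surprise.Reader from a CSV with a header row.

    column_roles (hypothetical helper argument) maps every header name in the
    file to one of Surprise's field names: 'user', 'item', 'rating' or
    'timestamp'.
    """
    with open(path, newline="") as f:
        rows = csv.reader(f, delimiter=sep)
        header = next(rows)
        # The order of the header columns becomes the order in line_format.
        roles = [column_roles[name] for name in header]
        rating_col = roles.index("rating")
        ratings = [float(row[rating_col]) for row in rows if row]
    return {
        "line_format": " ".join(roles),
        "sep": sep,
        "skip_lines": 1,  # skip the header row
        # Assumption: the observed min/max are the real ends of the scale.
        "rating_scale": (min(ratings), max(ratings)),
    }


def load_ratings(datapath, filename, column_roles, sep=","):
    """Load datapath/filename as a Surprise Dataset."""
    # Imported here so the helper above can be used without Surprise installed.
    from surprise import Dataset, Reader

    path = os.path.join(datapath, filename)
    reader = Reader(**reader_kwargs_from_csv(path, column_roles, sep))
    return Dataset.load_from_file(path, reader=reader)
```

For the file shown above, the call might look like data = load_ratings("path/to/folder", "custom_rating.csv", {"rating": "rating", "userId": "user", "movieId": "item"}), where the folder name is a placeholder. The resulting data object can then be passed to cross_validate from the question in place of the built-in dataset.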

Finally, you can proceed with creating an SVD model, as shown in the examples:

# sample random trainset and testset
# test set is made of 20% of the ratings.
trainset, testset = train_test_split(data, test_size=.2)

# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

PS: Also, don't forget to import the libraries at the beginning of your code :)

from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from surprise.model_selection import train_test_split