How to handle new data for a recommendation system?


Here's a theoretical question. Let's assume that I have implemented two types of collaborative filtering: user-based CF and item-based CF (in the form of Slope One).

I have a nice data set for these algorithms to run on. But then I want to do two things:

  1. I'd like to add a new rating to the data set.
  2. I'd like to edit an existing rating.

How should my algorithms handle these changes (without doing a lot of unnecessary work)? Can anyone help me with that?


There are 2 best solutions below

0
On

For both cases, the strategy is very similar:

user-based CF:

  • update all similarities for the affected user (that is, one row and one column in the similarity matrix)
  • if your neighbors are precomputed, recompute the neighbors for the affected user (for a complete update, you may have to recompute all neighbors, since the affected user may now enter or leave other users' neighbor lists, but I would stick with the approximate solution)
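The two steps above can be sketched roughly as follows. This is only a minimal illustration, assuming cosine similarity over sparse rating dictionaries and a plain dict as the similarity matrix; the names `UserSimilarity`, `set_rating` and `neighbors` are made up for this example and are not taken from any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


class UserSimilarity:
    """Hypothetical user-user similarity store with per-user refresh."""

    def __init__(self, ratings):
        self.ratings = {u: dict(r) for u, r in ratings.items()}
        self.sim = {}  # (user, other) -> similarity
        for user in self.ratings:
            self._refresh_user(user)

    def _refresh_user(self, user):
        # Recompute one row; the measure is symmetric, so the
        # matching column entry is written at the same time.
        for other in self.ratings:
            if other == user:
                continue
            s = cosine(self.ratings[user], self.ratings[other])
            self.sim[(user, other)] = s
            self.sim[(other, user)] = s

    def set_rating(self, user, item, value):
        """Add a new rating or overwrite an existing one."""
        self.ratings.setdefault(user, {})[item] = value
        self._refresh_user(user)  # only this user's row/column changes

    def neighbors(self, user, k):
        """Top-k neighbors of `user`, computed on demand from the row."""
        others = [o for o in self.ratings if o != user]
        others.sort(key=lambda o: self.sim[(user, o)], reverse=True)
        return others[:k]
```

Adding and editing go through the same `set_rating` call here, because the similarity row is recomputed from the user's current ratings in either case. If neighbor lists were cached instead of computed on demand, only the affected user's list would be refreshed in the approximate variant.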

Slope-One:

  • update the frequency (only in the 'add' case) and the diff matrix entries for the affected item (again, one row and one column)

Remark: If your 'similarity' is asymmetric, you need to update one row and one column. If it is symmetric, updating one row automatically results in updating the corresponding column. For Slope-One, the matrices are symmetric (frequency) and skew-symmetric (diffs), so here too you only need to update one row or column, and you get the other one for free (if your matrix storage works like this).
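For Slope-One, the incremental update might look like the sketch below. It assumes weighted Slope One with the diff matrix stored as sums of rating differences and the frequency matrix as co-rating counts; the class and method names are hypothetical. Both halves of each matrix are stored explicitly here for readability, so the symmetric entry is written by hand rather than obtained for free.

```python
from collections import defaultdict


class SlopeOne:
    """Hypothetical weighted Slope One with incremental add/edit."""

    def __init__(self):
        self.ratings = defaultdict(dict)                       # user -> {item: rating}
        self.diff = defaultdict(lambda: defaultdict(float))    # sum of (r_i - r_j)
        self.freq = defaultdict(lambda: defaultdict(int))      # number of co-ratings

    def set_rating(self, user, item, value):
        """Add a new rating, or edit it if the user already rated `item`."""
        user_ratings = self.ratings[user]
        old = user_ratings.get(item)
        for other, other_value in user_ratings.items():
            if other == item:
                continue
            if old is None:
                # 'add' case: a new co-rating pair appears
                self.freq[item][other] += 1
                self.freq[other][item] += 1      # symmetric
                change = value - other_value
            else:
                # 'edit' case: frequencies stay, diffs shift by the delta
                change = value - old
            self.diff[item][other] += change
            self.diff[other][item] -= change     # skew-symmetric
        user_ratings[item] = value

    def predict(self, user, item):
        """Weighted Slope One prediction, or None if no co-ratings exist."""
        num, den = 0.0, 0
        for other, other_value in self.ratings.get(user, {}).items():
            if other == item:
                continue
            n = self.freq.get(item, {}).get(other, 0)
            if n:
                num += self.diff[item][other] + other_value * n
                den += n
        return num / den if den else None


model = SlopeOne()
for u, i, r in [("A", "x", 5), ("A", "y", 3), ("B", "x", 3), ("B", "y", 4), ("C", "y", 2)]:
    model.set_rating(u, i, r)
print(model.predict("C", "x"))   # 2.5
model.set_rating("A", "x", 4)    # edit an existing rating
print(model.predict("C", "x"))   # 2.0
```

Each call only touches the entries that pair the affected item with the other items rated by the same user, which is the "one row and one column" update described above.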

If you want to see an example of how this could be implemented, have a look at MyMediaLite (disclaimer: I am the main author):
https://github.com/zenogantner/MyMediaLite/blob/master/src/MyMediaLite/RatingPrediction/ItemKNN.cs
The interesting code is in the method RetrainItem(), which is called from AddRatings() and UpdateRatings().

0
On

The general term for this is online algorithms.

Instead of retraining the whole predictor, it can be updated "online" (while remaining usable) with the new data only.
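As a toy illustration of the idea (deliberately much simpler than Slope One), here is a per-item mean predictor that is updated from each new or edited rating without revisiting the rest of the data. The class name and methods are invented for this sketch.

```python
class OnlineItemMean:
    """Hypothetical baseline: predicts an item's mean rating, updated online."""

    def __init__(self):
        self.total = {}  # item -> sum of ratings
        self.count = {}  # item -> number of ratings

    def add(self, item, rating):
        # A new rating changes both the sum and the count.
        self.total[item] = self.total.get(item, 0.0) + rating
        self.count[item] = self.count.get(item, 0) + 1

    def edit(self, item, old_rating, new_rating):
        # An edited rating only shifts the sum by the difference.
        self.total[item] += new_rating - old_rating

    def predict(self, item):
        return self.total[item] / self.count[item]
```

The predictor stays usable between updates, and each update costs constant time regardless of how many ratings have been seen so far.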

If you google for "online slope one predictor", you should be able to find some relevant approaches in the literature.