Here's a theoretical question. Let's assume that I have implemented two types of collaborative filtering: user-based CF and item-based CF (in the form of Slope One).
I have a nice data set for these algorithms to run on. But then I want to do two things:
- I'd like to add a new rating to the data set.
- I'd like to edit an existing rating.
How should my algorithms handle these changes (without doing a lot of unnecessary work)? Can anyone help me with that?
For both cases (adding a rating and editing one), the strategy is very similar:
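user-based CF: a new or changed rating by user u alters only u's rating vector, so only the similarities between u and the other users (row u and column u of the user-user similarity matrix) need to be recomputed. All other entries stay valid.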
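The user-based case can be sketched in Python as follows. This is a minimal illustration, assuming cosine similarity over sparse rating vectors (missing ratings treated as 0) and an in-memory cache of all pairwise user similarities. The class and method names (`IncrementalUserCF`, `add_rating`, `update_rating`, `_retrain_user`) are hypothetical and are not taken from MyMediaLite.

```python
import math
from collections import defaultdict


class IncrementalUserCF:
    """Hypothetical sketch: user-based CF with a cached, symmetric
    user-user cosine similarity matrix that is patched incrementally."""

    def __init__(self):
        self.ratings = defaultdict(dict)  # user -> {item: rating}
        self.sim = defaultdict(dict)      # user -> {other_user: similarity}

    def _cosine(self, u, v):
        # Cosine over sparse vectors: unrated items count as 0.
        ru, rv = self.ratings[u], self.ratings[v]
        common = ru.keys() & rv.keys()
        if not common:
            return 0.0
        dot = sum(ru[i] * rv[i] for i in common)
        norm_u = math.sqrt(sum(r * r for r in ru.values()))
        norm_v = math.sqrt(sum(r * r for r in rv.values()))
        if norm_u == 0 or norm_v == 0:
            return 0.0
        return dot / (norm_u * norm_v)

    def _retrain_user(self, u):
        # Only similarities involving u can have changed: recompute row u
        # and mirror it into column u (cosine similarity is symmetric).
        for v in self.ratings:
            if v == u:
                continue
            s = self._cosine(u, v)
            self.sim[u][v] = s
            self.sim[v][u] = s

    def add_rating(self, user, item, rating):
        self.ratings[user][item] = rating
        self._retrain_user(user)

    def update_rating(self, user, item, rating):
        # Same work as adding: the user's rating vector changed.
        self.ratings[user][item] = rating
        self._retrain_user(user)
```

Each call costs one pass over the other users instead of a full recomputation of the similarity matrix.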
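Slope-One: a new or changed rating by user u for item i only affects the deviations and co-rating frequencies between i and the other items u has rated, i.e. row i and column i of the diff and frequency matrices.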
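For Slope One, here is a minimal sketch of the weighted variant, assuming the model stores, for every item pair (i, j), the sum of deviations r_ui - r_uj and the number of users who rated both items, rather than precomputed averages. Storing sums makes both operations cheap in-place patches. All names (`IncrementalSlopeOne`, `diff_sum`, `freq`, `predict`) are hypothetical.

```python
from collections import defaultdict


class IncrementalSlopeOne:
    """Hypothetical sketch: weighted Slope One keeping, per item pair,
    the sum of deviations and the co-rating count, patched in place."""

    def __init__(self):
        self.ratings = defaultdict(dict)                          # user -> {item: rating}
        self.diff_sum = defaultdict(lambda: defaultdict(float))   # i -> j -> sum(r_ui - r_uj)
        self.freq = defaultdict(lambda: defaultdict(int))         # i -> j -> co-rating count

    def add_rating(self, user, item, rating):
        if item in self.ratings[user]:
            self.update_rating(user, item, rating)
            return
        for j, r_j in self.ratings[user].items():
            # Row `item` and column `item` change: diffs are skew-symmetric,
            # frequencies are symmetric.
            self.diff_sum[item][j] += rating - r_j
            self.diff_sum[j][item] -= rating - r_j
            self.freq[item][j] += 1
            self.freq[j][item] += 1
        self.ratings[user][item] = rating

    def update_rating(self, user, item, rating):
        delta = rating - self.ratings[user][item]
        for j in self.ratings[user]:
            if j == item:
                continue
            # Co-rating counts are unchanged; only the deviation sums shift.
            self.diff_sum[item][j] += delta
            self.diff_sum[j][item] -= delta
        self.ratings[user][item] = rating

    def predict(self, user, item):
        # Weighted Slope One: average of (dev(item, j) + r_uj), weighted by freq.
        num = den = 0.0
        for j, r_j in self.ratings[user].items():
            n = self.freq[item].get(j, 0)
            if j == item or n == 0:
                continue
            num += self.diff_sum[item][j] + r_j * n
            den += n
        return num / den if den else None
```

Adding a rating touches one row and one column of both `diff_sum` and `freq`; editing a rating only touches `diff_sum`, because the set of users who co-rated each pair does not change.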
Remark: If your similarity measure is asymmetric, you need to update one row and one column. If it is symmetric, updating one row automatically updates the corresponding column. For Slope-One, the frequency matrix is symmetric and the deviation (diff) matrix is skew-symmetric, so you only need to update one row or column and get the other one for free (if your matrix storage works like this).
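One way for matrix storage to "work like this" is to keep only the entries with i < j and derive the mirrored entry on read: unchanged for a symmetric matrix, negated for a skew-symmetric one. The sketch below assumes item IDs are mutually comparable (e.g. all ints or all strings) and does not store the diagonal; the class name `TriangularMatrix` is hypothetical.

```python
class TriangularMatrix:
    """Hypothetical sketch: stores only entries (i, j) with i < j.
    skew=False: (j, i) reads as +(i, j), e.g. frequencies or symmetric similarities.
    skew=True:  (j, i) reads as -(i, j), e.g. Slope One deviations."""

    def __init__(self, skew=False):
        self.skew = skew
        self.data = {}  # (i, j) with i < j -> value

    def __getitem__(self, key):
        i, j = key
        if i == j:
            raise KeyError("diagonal is not stored")
        if i < j:
            return self.data.get((i, j), 0.0)
        v = self.data.get((j, i), 0.0)
        return -v if self.skew else v

    def __setitem__(self, key, value):
        i, j = key
        if i == j:
            raise KeyError("diagonal is not stored")
        if i < j:
            self.data[(i, j)] = value
        else:
            # Writing (j, i) updates the single stored (i, j) entry.
            self.data[(j, i)] = -value if self.skew else value
```

With this layout, updating row i implicitly updates column i as well, and memory use is roughly halved.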
If you want to see an example of how this could be implemented, have a look at MyMediaLite (disclaimer: I am the main author): https://github.com/zenogantner/MyMediaLite/blob/master/src/MyMediaLite/RatingPrediction/ItemKNN.cs The interesting code is in the method RetrainItem(), which is called from AddRatings() and UpdateRatings().