I'm trying to apply sklearn.decomposition.NMF to a matrix R of user ratings of items, in order to predict ratings for items a user has not yet seen.
The matrix's rows are users, its columns are items, and its values are scores, where a score of 0 means that the user has not rated that item yet.
With the code below I have only managed to get two matrices that, when multiplied together, give back the original matrix.
import numpy
from sklearn.decomposition import NMF

R = numpy.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

model = NMF(n_components=4)
A = model.fit_transform(R)   # user factor matrix, shape (5, 4)
B = model.components_        # item factor matrix, shape (4, 4)
n = numpy.dot(A, B)          # reconstruction of R
print(n)
The problem is that the model does not predict new values in place of the 0s (which would be the predicted scores), but instead recreates the matrix as it was.
How do I get the model to predict user scores in place of the zeros in my original matrix?
That is what is supposed to happen: with n_components=4 (as many components as items) the factorization has enough capacity to reproduce R almost exactly.
However, in most cases you are not going to pick a number of components so close to the number of products and/or customers.
So, for instance, consider 2 components.
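Here is a minimal sketch of such a fit (the init='random', random_state=0 and max_iter=500 settings are my own explicit choices, and the exact reconstructed numbers will vary with them):

import numpy
from sklearn.decomposition import NMF

R = numpy.array([[5, 3, 0, 1],
                 [4, 0, 0, 1],
                 [1, 1, 0, 5],
                 [1, 0, 0, 4],
                 [0, 1, 5, 4]], dtype=float)

# A rank-2 approximation cannot reproduce R exactly, so the reconstruction
# also fills the zero cells with estimated scores.
model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
A = model.fit_transform(R)                # (5, 2) user factors
B = model.components_                     # (2, 4) item factors
print(numpy.round(numpy.dot(A, B), 2))    # low-rank reconstruction of R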
You can see that in this case many of the previous zeros are now other numbers you could use. For a bit of context, see https://en.wikipedia.org/wiki/Matrix_factorization_(recommender_systems).
How to select n_components?
I think the question above is answered, but just in case, the complete procedure could be something like the following.
For that we need to know which values in R are real observations and which ones we actually want to predict.
In many cases the 0s in R are exactly those new cases / scenarios. It is common to fill R with the averages for products or customers, then calculate the decomposition for several values of n_components and select the ideal one using one criterion or more that measures the advantage on a test sample.
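A rough sketch of that selection loop (my own illustration rather than code from this answer; the 3-rating hold-out, the per-item averages and the RMSE criterion are arbitrary choices for the example):

import numpy
from sklearn.decomposition import NMF

R = numpy.array([[5, 3, 0, 1],
                 [4, 0, 0, 1],
                 [1, 1, 0, 5],
                 [1, 0, 0, 4],
                 [0, 1, 5, 4]], dtype=float)

rng = numpy.random.default_rng(0)
known = numpy.argwhere(R > 0)                         # positions of real ratings
test = known[rng.choice(len(known), size=3, replace=False)]   # held-out (user, item) pairs

R_train = R.copy()
R_train[test[:, 0], test[:, 1]] = 0                   # hide the held-out ratings

# Fill every missing entry with that item's average observed rating.
counts = numpy.maximum((R_train > 0).sum(axis=0), 1)
col_means = R_train.sum(axis=0) / counts
R_filled = numpy.where(R_train > 0, R_train, col_means)

for k in range(1, 5):
    model = NMF(n_components=k, init='random', random_state=0, max_iter=500)
    approx = numpy.dot(model.fit_transform(R_filled), model.components_)
    err = approx[test[:, 0], test[:, 1]] - R[test[:, 0], test[:, 1]]
    print(k, numpy.sqrt((err ** 2).mean()))           # RMSE on held-out ratings

The k with the lowest held-out error would then be the n_components to keep.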
Perhaps good to see: