An ALS (alternating least squares) recommendation model performs matrix factorization: it factorizes a users-vs-items rating matrix into latent factors.
A matrix of 3 users and 3 items would look like this:
users | item_1 | item_2 | item_3 |
---|---|---|---|
user_1 | NA | 4 | 1 |
user_2 | 4 | 3 | 0 |
user_3 | NA | 1 | NA |
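For reference, a common explicit-feedback ALS objective sums the squared error only over the set $\Omega$ of observed (user, item) pairs, so the NA cells never enter the loss. Here $\mathbf{x}_u$ and $\mathbf{y}_i$ are the latent factor vectors of user $u$ and item $i$, and $\lambda$ is the regularization weight (Spark's `regParam`; Spark additionally scales it by the number of ratings of each user or item):

$$
\min_{X,Y} \sum_{(u,i)\in\Omega} \left(r_{ui} - \mathbf{x}_u^\top \mathbf{y}_i\right)^2 + \lambda\left(\sum_u \lVert \mathbf{x}_u \rVert^2 + \sum_i \lVert \mathbf{y}_i \rVert^2\right)
$$

ALS alternates between fixing the item factors and solving a ridge regression for each $\mathbf{x}_u$, and fixing the user factors and solving for each $\mathbf{y}_i$. For the matrix above, $\Omega = \{(1,2), (1,3), (2,1), (2,2), (2,3), (3,2)\}$: six of the nine cells.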
My dataframe initially looks like this:
users | items | rating |
---|---|---|
user_1 | item_2 | 4 |
user_1 | item_3 | 1 |
user_2 | item_1 | 4 |
user_2 | item_2 | 3 |
user_2 | item_3 | 0 |
user_3 | item_2 | 1 |
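The long format above, one row per observed rating, is already the shape ALS consumes. A preparation step that is usually needed in Spark is a different one: `ALS` expects integer user and item ids, so string labels such as `user_1` have to be indexed first (in Spark this is commonly done with `StringIndexer`). The plain-Python sketch below only illustrates the idea; the sorted-label ordering is an assumption for readability, and `StringIndexer` orders labels by frequency by default, so its ids may differ. Note that the result still has six rows: nothing is added for the missing pairs.

```python
# Observed ratings only, one row per interaction, as in the dataframe above.
ratings = [
    ("user_1", "item_2", 4),
    ("user_1", "item_3", 1),
    ("user_2", "item_1", 4),
    ("user_2", "item_2", 3),
    ("user_2", "item_3", 0),
    ("user_3", "item_2", 1),
]


def index_ids(rows):
    """Map string user/item labels to integer ids (sorted-label order, an assumption)."""
    user_ids = {u: k for k, u in enumerate(sorted({row[0] for row in rows}))}
    item_ids = {i: k for k, i in enumerate(sorted({row[1] for row in rows}))}
    # Same rows as before, just with integer ids and float ratings.
    indexed = [(user_ids[u], item_ids[i], float(r)) for u, i, r in rows]
    return indexed, user_ids, item_ids


indexed, user_ids, item_ids = index_ids(ratings)
print(indexed)
# [(0, 1, 4.0), (0, 2, 1.0), (1, 0, 4.0), (1, 1, 3.0), (1, 2, 0.0), (2, 1, 1.0)]
```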
My question is: before passing my dataframe to the ALS module, do I need to transform it so that I end up with a structure such as:
users | items | rating |
---|---|---|
user_1 | item_1 | NA |
user_1 | item_2 | 4 |
user_1 | item_3 | 1 |
user_2 | item_1 | 4 |
user_2 | item_2 | 3 |
user_2 | item_3 | 0 |
user_3 | item_1 | NA |
user_3 | item_2 | 1 |
user_3 | item_3 | NA |
Or will the ml.recommendation.ALS function create, under the hood, the observations for the user-item pairs without interactions, such as:
users | items | rating |
---|---|---|
user_1 | item_1 | NA |
If it does not, one way to produce the expected table would be to pivot the dataframe and then unpivot it, but that would produce a huge users-vs-items matrix. However, judging from the examples in the documentation, this pivot-then-unpivot step does not seem to be necessary.
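The size concern can be made concrete: pivoting and then unpivoting materializes one row per (user, item) pair, so the row count is the number of users times the number of items, no matter how few ratings exist. A quick sketch; the 1,000,000 users × 100,000 items scale is a hypothetical figure for illustration only:

```python
def dense_row_count(n_users, n_items):
    """Rows produced by pivot + unpivot: one row per (user, item) pair."""
    return n_users * n_items


n_users, n_items, n_observed = 3, 3, 6  # the example tables above
dense = dense_row_count(n_users, n_items)
print(dense, dense - n_observed)  # 9 rows in total, 3 of them NA

# Hypothetical catalogue size, for illustration only.
print(dense_row_count(1_000_000, 100_000))  # 100000000000 (10**11) rows
```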
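Yes, it is not necessary. ALS takes only the observed (user, item, rating) rows as input, so you do not need to add rows for the missing user-item pairs.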
After you train the ALS model, use the fitted model to predict the "missing interactions".
Thus, the term "fill" (in your sentence "ml.recommendation.ALS module fill those missing interactions") is not appropriate; you should use the term "predict".
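To illustrate fit-then-predict without materializing the missing rows, here is a toy explicit-feedback ALS written in NumPy. It is not Spark's implementation (Spark distributes the solves and scales the regularization by the number of ratings, among other differences); it only sketches the same alternating ridge-regression idea, and the rank, regularization, and iteration count are arbitrary assumptions. The input holds only the six observed ratings, and the NA cells are predicted afterwards from the learned factors. In Spark, the corresponding steps would be `ALS(...).fit(df)` followed by `model.transform(...)` on the pairs you want scored, or `model.recommendForAllUsers(k)`.

```python
import numpy as np

# Observed (user, item, rating) triples only; ids follow user_k -> k-1, item_k -> k-1.
# No rows are added for the missing pairs.
observed = [(0, 1, 4.0), (0, 2, 1.0), (1, 0, 4.0), (1, 1, 3.0), (1, 2, 0.0), (2, 1, 1.0)]
n_users, n_items = 3, 3


def als_fit(triples, n_users, n_items, rank=2, reg=0.1, iters=20, seed=0):
    """Toy explicit-feedback ALS: every least-squares solve uses observed ratings only."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in triples:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    eye = np.eye(rank)
    for _ in range(iters):
        # Fix Y; ridge regression for each user over the items that user rated.
        for u, rated in by_user.items():
            idx = [i for i, _ in rated]
            vals = np.array([v for _, v in rated])
            Yu = Y[idx]
            X[u] = np.linalg.solve(Yu.T @ Yu + reg * eye, Yu.T @ vals)
        # Fix X; ridge regression for each item over the users who rated it.
        for i, raters in by_item.items():
            idx = [u for u, _ in raters]
            vals = np.array([v for _, v in raters])
            Xi = X[idx]
            Y[i] = np.linalg.solve(Xi.T @ Xi + reg * eye, Xi.T @ vals)
    return X, Y


X, Y = als_fit(observed, n_users, n_items)
pred = X @ Y.T  # predicted rating for every (user, item) pair

# Score only the pairs that had no rating (the NA cells of the matrix).
seen = {(u, i) for u, i, _ in observed}
missing = [(u, i) for u in range(n_users) for i in range(n_items) if (u, i) not in seen]
for u, i in missing:
    print(f"user_{u + 1}, item_{i + 1}: predicted {pred[u, i]:.2f}")
```

The printed values are not stored data: they are dot products of the learned user and item factor vectors, which is why "predict" describes them better than "fill".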