Does pyspark.ml.recommendation.ALS create a pivot table under the hood?


An ALS recommendation model performs a matrix factorization: it factorizes a users-vs-items matrix into latent factors.

A matrix of 3 users and 3 items would look like this:

users item_1 item_2 item_3
user_1 NA 4 1
user_2 4 3 0
user_3 NA 1 NA

My dataframe starts out like this:

users items rating
user_1 item_2 4
user_1 item_3 1
user_2 item_1 4
user_2 item_2 3
user_2 item_3 0
user_3 item_2 1

My question is: before passing my dataframe to the ALS module, do I need to transform it so that it ends up with a structure such as:

users items rating
user_1 item_1 NA
user_1 item_2 4
user_1 item_3 1
user_2 item_1 4
user_2 item_2 3
user_2 item_3 0
user_3 item_1 NA
user_3 item_2 1
user_3 item_3 NA

Or will the ml.recommendation.ALS function create, under the hood, the observations for the user-item pairs without interactions? Such as:

users items rating
user_1 item_1 NA

If it does not, one way to produce the expected table would be to pivot it and then unpivot it, but that would produce a very large users-vs-items matrix. However, from the examples in the documentation, it seems that this process (pivot, then unpivot) is not necessary.


There is 1 best solution below

lanenok On

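Yes, you are right: the pivot/unpivot step is not necessary. You can pass the dataframe containing only the observed interactions to ALS.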
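To see why the NA rows are not needed, it may help to write down the usual explicit-feedback objective (explicit feedback is the default in pyspark, implicitPrefs=False). This is a textbook-style sketch, not a statement of Spark's exact internals: the symbols x_u, y_i, \lambda and \Omega are notation introduced here, and Spark's actual regularization scaling may differ.

```latex
\min_{X,\,Y} \sum_{(u,i) \in \Omega} \left( r_{ui} - x_u^\top y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```

Here \Omega is the set of observed (user, item) pairs, x_u and y_i are the latent factor vectors, and r_{ui} is the observed rating. For the dataframe in the question, \Omega has 6 elements, so the sum has 6 squared-error terms rather than 9. The pairs with NA simply do not appear in it, which is why they do not have to be materialized as rows.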

After you train the ALS model, use the fitted model to predict the "missing interactions".

Thus, the term "fill" (as in your sentence "ml.recommendation.ALS module fill those missing interactions") is not appropriate; the right term is "predict".