Does pyspark.ml.recommendation.ALS create a pivot table under the hood?

An ALS recommendation model performs a matrix factorization: it factorizes a users-by-items rating matrix into latent factors.

A matrix of 3 users and 3 items would look like this:

users   item_1  item_2  item_3
user_1  NA      4       1
user_2  4       3       0
user_3  NA      1       NA

My DataFrame starts out like this:

users items rating
user_1 item_2 4
user_1 item_3 1
user_2 item_1 4
user_2 item_2 3
user_2 item_3 0
user_3 item_2 1

My question is: before passing my DataFrame to the ALS module, do I need to transform it so that it ends up with a structure like this?

users items rating
user_1 item_1 NA
user_1 item_2 4
user_1 item_3 1
user_2 item_1 4
user_2 item_2 3
user_2 item_3 0
user_3 item_1 NA
user_3 item_2 1
user_3 item_3 NA

Or will ml.recommendation.ALS, under the hood, create the observations for the user-item pairs that have no interaction? For example:

users items rating
user_1 item_1 NA

If it does not, one way to produce the expected table would be to pivot the DataFrame and then unpivot it, but that would produce a huge users-by-items matrix. However, judging from the examples in the documentation, this pivot-then-unpivot step does not seem to be necessary.

1 Answer

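Yes, you are right: the pivot-then-unpivot step is not necessary. ALS takes the long-format (user, item, rating) rows directly and fits the latent factors on the observed ratings only, so you do not need to add rows for pairs without an interaction.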

After you train the ALS model, use the fitted model to predict the "missing interactions".

Thus, the term "fill" (in your sentence "ml.recommendation.ALS module fill those missing interactions") is not appropriate; you should use the term "predict".
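As a rough illustration, here is a minimal sketch that fits ALS on only the six observed rows from the question and then scores the three pairs that have no rating. It makes a few assumptions that are not in the question: the ids are mapped to integers (ALS needs numeric user and item columns, so `user_1` becomes `1`, and so on), the helper names `missing_pairs` and `fit_and_predict` are made up for this example, and the hyperparameters (`rank`, `maxIter`, `regParam`, `seed`) are arbitrary.

```python
# Observed ratings only, in long format: (user, item, rating).
# Assumption: string ids are mapped to integers, since ALS needs numeric ids.
observed = [
    (1, 2, 4.0),
    (1, 3, 1.0),
    (2, 1, 4.0),
    (2, 2, 3.0),
    (2, 3, 0.0),
    (3, 2, 1.0),
]


def missing_pairs(rows):
    """Return the (user, item) pairs that have no observed rating."""
    users = sorted({u for u, _, _ in rows})
    items = sorted({i for _, i, _ in rows})
    seen = {(u, i) for u, i, _ in rows}
    return [(u, i) for u in users for i in items if (u, i) not in seen]


def fit_and_predict(spark, rows):
    """Fit ALS on the observed rows, then predict the unobserved pairs."""
    # Imported here so the pure-Python helper above works without Spark.
    from pyspark.ml.recommendation import ALS

    ratings = spark.createDataFrame(rows, ["user", "item", "rating"])
    als = ALS(
        userCol="user",
        itemCol="item",
        ratingCol="rating",
        rank=2,        # arbitrary values chosen for this tiny example
        maxIter=5,
        regParam=0.1,
        seed=42,
    )
    model = als.fit(ratings)  # no NA rows were added to the training data

    # Only now do we build the pairs without interactions, to score them.
    to_score = spark.createDataFrame(missing_pairs(rows), ["user", "item"])
    return model.transform(to_score)  # adds a "prediction" column
```

With an existing `SparkSession` named `spark`, calling `fit_and_predict(spark, observed).show()` should print one predicted rating for each of the pairs (user_1, item_1), (user_3, item_1) and (user_3, item_3). The training DataFrame never contains those rows; they are only built afterwards as input to `transform`.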