Apache Spark: the ALS recommendation example in the documentation has an extra column and I don't know its use

885 Views Asked by George C At 05 June 2025 at 10:15

The ALS example in the Spark documentation contains the following code:

(http://spark.apache.org/docs/latest/ml-collaborative-filtering.html)

from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row

lines = spark.read.text("data/mllib/als/sample_movielens_ratings.txt").rdd
parts = lines.map(lambda row: row.value.split("::"))
# int() replaces long(), which does not exist in Python 3
ratingsRDD = parts.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                                     rating=float(p[2]), timestamp=int(p[3])))
ratings = spark.createDataFrame(ratingsRDD)
(training, test) = ratings.randomSplit([0.8, 0.2])

# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating")
model = als.fit(training)

# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

As you can see, it creates a Row with a timestamp attribute, but the ALS constructor never uses it.

What is the purpose of the attribute timestamp in the Row?


There is 1 best solution below

user7367299 On 03 January 2017 at 00:37 BEST ANSWER

None. It is just one of the fields that come with MovieLens data. For ALS it has no use and you can ignore it.
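To make this concrete, here is a minimal plain-Python sketch (no Spark needed) of what the parsing step produces and which fields ALS actually reads. The helper names `parse_rating` and `als_view` and the sample line are hypothetical, chosen only for illustration. The date conversion assumes the timestamp is seconds since the Unix epoch, as in the MovieLens data sets.

```python
from datetime import datetime, timezone

# Columns the ALS estimator in the question is configured to read
# (userCol, itemCol, ratingCol); any other column is ignored.
ALS_COLUMNS = ("userId", "movieId", "rating")


def parse_rating(line):
    """Split one 'userId::movieId::rating::timestamp' line into a dict."""
    user_id, movie_id, rating, timestamp = line.split("::")
    return {
        "userId": int(user_id),
        "movieId": int(movie_id),
        "rating": float(rating),
        "timestamp": int(timestamp),  # int() works where Python 2 used long()
    }


def als_view(record):
    """Keep only the fields that ALS would use (hypothetical helper)."""
    return {name: record[name] for name in ALS_COLUMNS}


sample_line = "0::2::3::1424380312"  # illustrative line in the file's format
record = parse_rating(sample_line)
print(als_view(record))

# Assumption: the timestamp is seconds since the Unix epoch (UTC).
rated_at = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
print(rated_at.isoformat())
```

In the Spark version, the equivalent would be to leave the column in place (ALS ignores it) or to remove it explicitly with `ratings.drop("timestamp")`. The field can still be handy outside ALS, for example to split training and test data by time instead of using `randomSplit`.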
