Data looks different after converting to realRatingMatrix


I am working on a recommendation system in R. The data set is available here: https://drive.google.com/file/d/1FVh-Xg3NBtzKgZHnDTi7IjaATW_fPmW9/view?usp=sharing

beer_data <- read.csv("beer_data.csv", stringsAsFactors = FALSE)
library(recommenderlab)
r <- as(beer_data, "realRatingMatrix")

When I compare the number of reviews in the two objects, they do not match:

nrow(beer_data)  # 475984
length(getRatings(r)) # 474560

The range of the ratings does not match either:

> range(beer_data$review_overall)

[1] 0 5

> range(getRatings(r))

[1] 0 15

I have checked other data sets too, and the issue does not appear there.

Best answer

I found the answer:

Some users in the data have rated the same beer more than once (twice, three times, and so on). When recommenderlab coerces the data into a realRatingMatrix, it adds up the ratings of such rows. That is why some ratings are greater than 5 and why length(getRatings(r)) is smaller than nrow(beer_data).

E.g. sample beer_data

beer_beerid, review_profilename, review_overall

19667, 57md, 3.5
19667, 57md, 4.0

So in the realRatingMatrix, the rating for user = "57md" and item = "19667" becomes 3.5 + 4.0 = 7.5, and the two rows collapse into a single entry.

For the same reason, every non-unique combination of beer_beerid and review_profilename is merged into a single entry, which causes the mismatch in the number of ratings between the data frame and the realRatingMatrix.
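
To check for this before coercing, a minimal base R sketch along the following lines may help. It uses a small toy data frame rather than the real file: the first two rows mirror the sample above, and the third row is made up for illustration. Averaging the duplicates is only one possible choice; keeping the latest review would be another.

```r
# Toy data: rows 1-2 mirror the sample above, row 3 is hypothetical
beer_data <- data.frame(
  beer_beerid        = c(19667, 19667, 20000),
  review_profilename = c("57md", "57md", "57md"),
  review_overall     = c(3.5, 4.0, 5.0),
  stringsAsFactors   = FALSE
)

# Flag every row whose (beer, user) pair occurs more than once
key    <- beer_data[, c("beer_beerid", "review_profilename")]
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
beer_data[is_dup, ]

# Summing per pair mimics the effect described above (3.5 + 4.0 = 7.5)
summed <- aggregate(review_overall ~ beer_beerid + review_profilename,
                    data = beer_data, FUN = sum)

# One possible fix: keep the mean rating per pair instead of the sum
deduped <- aggregate(review_overall ~ beer_beerid + review_profilename,
                     data = beer_data, FUN = mean)
```

After this step, deduped has one row per (beer, user) pair, so coercing it with as(deduped, "realRatingMatrix") should give a rating count equal to nrow(deduped) and ratings that stay within the original 0 to 5 range.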