Compare average - SPARQL

625 Views Asked by At

There is a dataset of users ranking movies. Need to find the users with similar taste to user1. Similar taste defined as follows: consider the average rank for genre from user1 as avgr1 and for the same genre from user2 as avgr2, then user1 and user2 have similar taste is abs(avgr1-avgr2)<1 . So far I was able to get the names, the genre and the absolute value between averages, but the filtering for comparison is not working.

SELECT ?p ?p1 ?genre (abs (AVG(?rating)-AVG(?ratingp1)) AS ?RDiff)
WHERE{
?p movies:hasRated ?rate.
?p1 foaf:knows ?p.
?rate movies:ratedMovie ?mov.
?rate movies:hasRating ?rating.
?mov movies:hasGenre ?genre.
?p1 movies:hasRated ?ratep1.
?ratep1 movies:ratedMovie ?movp1.
?ratep1 movies:hasRating ?ratingp1.
?movp1 movies:hasGenre ?genre.
FILTER (xsd:float(?Rdiff)<1.0 && ?p=movies:user1)
}
GROUP BY ?p ?p1 ?genre
2

There are 2 best solutions below

1
On BEST ANSWER

It's very hard to answer these kinds of questions without some sample data to work with. Here's some sample data that has two users who have similar rankings on comedy, but different rankings on romance:

@prefix : <urn:ex:>

:a :ranks [ :genre :comedy ; :value 2 ],
          [ :genre :comedy ; :value 3 ],
          [ :genre :comedy ; :value 3 ],
          [ :genre :romance ; :value 7 ],
          [ :genre :romance ; :value 8 ],
          [ :genre :romance ; :value 9 ].

:b :ranks [ :genre :comedy ; :value 3 ],
          [ :genre :comedy ; :value 3 ],
          [ :genre :comedy ; :value 4 ],
          [ :genre :romance ; :value 0 ],
          [ :genre :romance ; :value 1 ],
          [ :genre :romance ; :value 0 ].

Here's a query that computes the difference of their average rankings on genres:

prefix : <urn:ex:>

select ?user1 ?user2 ?genre (abs(avg(?value1)-avg(?value2)) as ?diff) {
  ?user1 :ranks [ :genre ?genre ; :value ?value1 ].
  ?user2 :ranks [ :genre ?genre ; :value ?value2 ].
  filter (str(?user1) < str(?user2)) #-- avoid duplicate user1/user2, user2/user1 results
}
group by ?user1 ?user2 ?genre
order by ?diff
---------------------------------------------------------
| user1 | user2 | genre    | diff                       |
=========================================================
| :a    | :b    | :comedy  | 0.666666666666666666666667 |
| :a    | :b    | :romance | 7.666666666666666666666667 |
---------------------------------------------------------

Now, you can't filter on aggregate results, you have to use having, so to only take values where diff is less than some particular value, you'd do this:

prefix : <urn:ex:>

select ?user1 ?user2 ?genre (abs(avg(?value1)-avg(?value2)) as ?diff) {
  ?user1 :ranks [ :genre ?genre ; :value ?value1 ].
  ?user2 :ranks [ :genre ?genre ; :value ?value2 ].
  filter (str(?user1) < str(?user2))
}
group by ?user1 ?user2 ?genre
having (?diff < 1)
order by ?diff
--------------------------------------------------------
| user1 | user2 | genre   | diff                       |
========================================================
| :a    | :b    | :comedy | 0.666666666666666666666667 |
--------------------------------------------------------

If you don't care about the actual diff, except that it's below the threshold, you can put the expression in the having directly, and do:

select ?user1 ?user2 ?genre {
  #-- ...
}
group by ?user1 ?user2 ?genre
having (abs(avg(?value1)-avg(?value2)) < 1)
0
On

?Rdiff is not defined in the FILTER - the happens before the AS in the SELECT.

Try using HAVING which goes after the GROUP BY.