How can I list the users with the most similar entries with ArangoDB


I started a new project today. I have a users table, a tags table, and a user_tags edge collection for graph results.

I attached some tags to users in the graph. How can I list the users whose tags are most similar to a given user's in ArangoDB?

For example:

  • user id: 112 has 4 tags (tags ids: 50, 51, 52, 53)
  • user id: 113 has 5 tags (tags ids: 52, 53, 54, 55, 56)
  • user id: 114 has 4 tags (tags ids: 51, 52, 53, 54)
  • user id: 115 has 2 tags (tags ids: 48, 49)

When I search for user id 112, the results should look like this:

  1. user id: 114 (3 matches, 51, 52, 53)
  2. user id: 113 (2 matches, 52, 53)

Users with no tags in common, such as user id 115, should not appear in the results.

If no one knows an ArangoDB solution, I can use Neo4j if there is a solution for it.

Thanks.

2

There are 2 best solutions below

1
On BEST ANSWER

In Cypher, the query for user 112 is:

// Find users who share at least one tag with user 112, ranked by overlap
MATCH (u1:User {id: 112})-[:HAS_TAG]->(tag:Tag)<-[:HAS_TAG]-(u:User)
WHERE u <> u1
WITH u, collect(tag.id) AS tags
RETURN u, tags, size(tags) AS score
ORDER BY score DESC

Cheers

0
On

In ArangoDB, this query will work as long as you create a graph (named user_tags_graph here) with users and tags as vertex collections and user_tags as the edge collection:

// Look up the document _id of the user we are searching for
LET active_user = FIRST(
    FOR u IN users
        FILTER u.id == @user_id
        RETURN u._id
)

// Tags are direct neighbours of a user, so a depth of 1 is enough
LET active_tags = (
    FOR v IN 1..1 OUTBOUND active_user GRAPH 'user_tags_graph'
        RETURN v.id
)

FOR u IN users
    FILTER u._id != active_user
    LET tags_in_use = (
        FOR v IN 1..1 OUTBOUND u._id GRAPH 'user_tags_graph'
            RETURN v.id
    )
    LET common_tags = INTERSECTION(active_tags, tags_in_use)
    // Drop users with no tags in common, then rank by overlap size
    FILTER LENGTH(common_tags) > 0
    SORT LENGTH(common_tags) DESC
    RETURN {
        [u.id]: common_tags
    }

It can probably be optimised heavily, but breaking it out like this makes it easier to understand.
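
To sanity-check the ranking logic without a database, here is a minimal Python sketch of the same idea: intersect the searched user's tags with every other user's tags, drop empty overlaps, and sort by overlap size. The user_tags dictionary and the similar_users function are hypothetical names for illustration. The data is taken from the example in the question.

```python
# Sample data from the question: user id -> set of tag ids.
user_tags = {
    112: {50, 51, 52, 53},
    113: {52, 53, 54, 55, 56},
    114: {51, 52, 53, 54},
    115: {48, 49},
}


def similar_users(user_id, user_tags):
    """Rank other users by the number of tags they share with user_id."""
    active_tags = user_tags[user_id]
    results = []
    for other_id, tags in user_tags.items():
        if other_id == user_id:
            continue  # never compare the user with themselves
        common = sorted(active_tags & tags)
        if common:  # skip users with no tags in common
            results.append((other_id, common))
    # Users with the most shared tags come first
    results.sort(key=lambda item: len(item[1]), reverse=True)
    return results


print(similar_users(112, user_tags))
# prints [(114, [51, 52, 53]), (113, [52, 53])]
```

This matches the expected output in the question: user 114 first with three matches, user 113 second with two, and user 115 excluded.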