tf-idf to user preference vectors

I'm fairly new here, and I'd like to thank in advance everybody who takes the time to read this question.

We're building a recommender system that uses tf-idf to generate normalised document vectors. Based on the users' interactions with the documents (like, dislike, time spent on a document, etc.), we want to generate user profiles that follow the same structure as the documents themselves.

While there is a lot of literature about recommender systems and content-based filtering on the 'product' side, there is very little about how to structure the user preferences themselves. I'm not asking for a complete solution, but rather for a pointer in the right direction (or simply a direction). We might work something out ourselves, but there's no need to reinvent the wheel if well-developed solutions already exist.

Many thanks all! Daniel

There is 1 best solution below

Your question is a little difficult to follow, but based on what I understood, here is a simple idea that might point you in the right direction:

First, you can think of your tf-idf vectors as points in a high-dimensional vector space. Assuming the documents group into clusters, you could project your users into the same space, find the cluster closest to each user, and recommend documents from it. To do this, I would recommend not using multiple interaction labels but only a single 'user liked' signal.

A user vector could be the average of the tf-idf vectors of the documents the user liked. This, however, only works well if the user has homogeneous preferences (ideally from a single cluster): a user who likes many documents from clusters that are far apart ends up somewhere between those clusters, which does not necessarily reflect their interests. If the structure of the preferences plays along, though, this could work well.
You then determine the cluster closest to the user vector and recommend other documents from that cluster.
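
To make the averaging idea and its caveat concrete, here is a minimal, dependency-free sketch. The five toy vectors, the document ids and the helper names (`normalize`, `cosine`, `user_profile`) are made up for illustration and are not taken from your system; in practice the vectors would come from your tf-idf pipeline.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def user_profile(doc_vectors, liked_ids):
    """Average the vectors of the liked documents, then re-normalise."""
    liked = [doc_vectors[i] for i in liked_ids]
    mean = [sum(column) / len(liked) for column in zip(*liked)]
    return normalize(mean)

# Hypothetical tf-idf-like vectors over a 4-term vocabulary:
# d1-d3 share one topic, d4-d5 share another.
DOCS = {
    "d1": normalize([0.9, 0.4, 0.0, 0.0]),
    "d2": normalize([0.8, 0.6, 0.0, 0.0]),
    "d3": normalize([0.7, 0.5, 0.1, 0.0]),
    "d4": normalize([0.0, 0.0, 0.9, 0.4]),
    "d5": normalize([0.0, 0.1, 0.6, 0.8]),
}

# A user with homogeneous taste: both likes come from the same topic.
focused = user_profile(DOCS, ["d1", "d2"])
# A user with mixed taste: the likes come from two unrelated topics.
mixed = user_profile(DOCS, ["d1", "d4"])

print("focused vs d1:", round(cosine(focused, DOCS["d1"]), 3))
print("mixed   vs d1:", round(cosine(mixed, DOCS["d1"]), 3))
print("mixed   vs d4:", round(cosine(mixed, DOCS["d4"]), 3))
```

The mixed profile lands between the two topics, so it is less similar to each liked document than the focused profile is to its own likes. That is the failure mode described above. One possible workaround (my assumption, not something I have tested on your data) is to keep one profile vector per cluster of liked documents instead of a single average.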

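For distances you could start with cosine distance, and you can find the clusters with a simple clustering algorithm such as k-means (see scikit-learn). Note that K-Nearest Neighbors is not a clustering algorithm, although a nearest-neighbour search is useful for looking up the documents closest to a user vector.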
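
As a rough sketch of the whole pipeline (cluster the documents, find the cluster closest to the user vector, recommend from it), the following uses a small hand-written k-means on unit-length vectors with cosine similarity, sometimes called spherical k-means. Note that this is k-means, a clustering algorithm, and not a nearest-neighbour search. The toy vectors, the ids and the deterministic farthest-first initialisation are my own assumptions to keep the example reproducible and dependency-free; with real data you would more likely use scikit-learn's `KMeans` on your tf-idf matrix.

```python
import math

def unit(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest(vec, centroids):
    """Index of the centroid with the highest cosine similarity to vec."""
    sims = [dot(vec, c) for c in centroids]
    return sims.index(max(sims))

def spherical_kmeans(vectors, k, iterations=20):
    """Cluster unit-length vectors by cosine similarity."""
    # Farthest-first initialisation (an assumption made for determinism):
    # start from the first vector, then repeatedly add the vector that is
    # least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = unit(mean(members))
    return labels, centroids

# Hypothetical tf-idf-like vectors: d1-d3 share one topic, d4-d5 another.
ids = ["d1", "d2", "d3", "d4", "d5"]
vectors = [unit(v) for v in [
    [0.9, 0.4, 0.0, 0.0],
    [0.8, 0.6, 0.0, 0.0],
    [0.7, 0.5, 0.1, 0.0],
    [0.0, 0.0, 0.9, 0.4],
    [0.0, 0.1, 0.6, 0.8],
]]

labels, centroids = spherical_kmeans(vectors, k=2)

# User profile: normalised average of the liked documents' vectors.
liked = ["d1", "d2"]
profile = unit(mean([vectors[ids.index(i)] for i in liked]))

# Recommend the not-yet-liked documents of the closest cluster.
cluster = nearest(profile, centroids)
recommendations = [
    doc_id for doc_id, lab in zip(ids, labels)
    if lab == cluster and doc_id not in liked
]
print(labels, recommendations)
```

Because all vectors are normalised to unit length, ranking by Euclidean distance and ranking by cosine similarity give the same order, which is why ordinary k-means on normalised tf-idf vectors is a common shortcut.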