Sense similarity matrix using WordNet

I have a vocabulary of unique words (excluding stopwords) used over the entire document collection, and I want to perform query expansion. In some approaches I have found, the top k synonyms (usually k=3) of every query word are added to the query. However, I am using a vector space model based on a TF-IDF document representation, so any added words that are not in the vocabulary are simply dropped. Also, since these approaches do not use word sense disambiguation, there is no guarantee that the added synonyms retain the sense in which the query words are used, which leads to query drift.

Hence I am thinking of creating a Sense Similarity Matrix, which would hold the similarity scores between the query and all possible senses in which the vocabulary words are used over the entire corpus. The similarity score would be calculated with either an information-theoretic or a path-based approach.
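To make the idea concrete, here is a minimal sketch of a path-based sense similarity matrix. Everything in it is an assumption made for illustration: the tiny hypernym graph, the sense labels, and the helper names are hypothetical, and the query sense is taken as already known. The score is the usual path-based one, 1 / (1 + shortest path length). A real system would presumably take the senses and hypernym links from WordNet rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical toy hypernym links (child sense -> parent sense).
# In practice these would come from WordNet, not be written by hand.
HYPERNYM = {
    "land.n.01": "entity.n.01",
    "institution.n.01": "entity.n.01",
    "bank.n.01": "land.n.01",         # sloping land beside water
    "shore.n.01": "land.n.01",
    "bank.n.02": "institution.n.01",  # financial institution
}

# Hypothetical sense inventory: vocabulary word -> its candidate senses.
SENSES = {
    "bank": ["bank.n.01", "bank.n.02"],
    "shore": ["shore.n.01"],
}


def build_neighbors(hypernym):
    """Turn child -> parent links into an undirected adjacency map."""
    neighbors = {}
    for child, parent in hypernym.items():
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)
    return neighbors


def shortest_path_length(neighbors, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(neighbors, a, b):
    """Path-based score: 1 / (1 + shortest path length), 0.0 if unreachable."""
    dist = shortest_path_length(neighbors, a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)


def sense_similarity_matrix(query_senses, vocab_senses, neighbors):
    """Rows are query senses; columns are all senses of the vocabulary words."""
    columns = [s for word in sorted(vocab_senses) for s in vocab_senses[word]]
    rows = [[path_similarity(neighbors, q, s) for s in columns]
            for q in query_senses]
    return columns, rows


NEIGHBORS = build_neighbors(HYPERNYM)
# Assumption: the query term has already been resolved to the sense shore.n.01.
COLUMNS, MATRIX = sense_similarity_matrix(["shore.n.01"], SENSES, NEIGHBORS)
for sense, score in zip(COLUMNS, MATRIX[0]):
    print(sense, round(score, 3))
```

In this toy graph the river-bank sense scores higher against the query sense than the financial sense does, which is the kind of signal that could keep expansion terms from drifting. Because the columns are built only from vocabulary words, nothing outside the TF-IDF vocabulary can enter the expanded query.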

However, I am unable to work out how to find all the senses in which the words in the vocabulary have been used. Also, is my approach sound? Can someone please guide me by pointing to some relevant resources?

There is 1 best solution below

If you are looking for words with similar meanings, I think you should look into word2vec and related embedding models such as GloVe (https://nlp.stanford.edu/projects/glove/) and fastText (https://fasttext.cc/). They represent words as vectors, so you can compute the similarity between any two words (typically cosine similarity) and build a full similarity matrix. You can also query the models for the top N most similar words.
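As a rough illustration of the matrix and the top-N lookup, here is a small self-contained sketch. The three-dimensional vectors and the function names are made up for the example; real vectors would be loaded from a pretrained word2vec, GloVe, or fastText model and would have far more dimensions.

```python
import math

# Hypothetical toy embeddings keyed by vocabulary word.
# Real vectors would come from a pretrained embedding model.
VECTORS = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "loan": [0.1, 0.9, 0.2],
    "credit": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Full word-by-word cosine similarity matrix, words in sorted order."""
    words = sorted(vectors)
    matrix = [[cosine(vectors[a], vectors[b]) for b in words] for a in words]
    return words, matrix


def top_n(word, vectors, n=3):
    """The n words most similar to `word`, excluding the word itself."""
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


WORDS, MATRIX = similarity_matrix(VECTORS)
print(WORDS)
print(top_n("river", VECTORS, n=2))
```

If the dictionary only contains words from the collection's vocabulary, every expansion candidate returned by `top_n` is guaranteed to exist in the TF-IDF space.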