Python Text Matching - Synonyms

I have two columns in a pandas DataFrame, A and B, each of which contains strings of terms. My objective is to find the entry in column B that is most similar to each entry in column A. I am already using TF-IDF to do this, but sometimes there are synonyms that do not obviously match, e.g. money and currency.

How can I find matches which also include synonyms?

There is 1 solution below

I'm not sure how TF-IDF would be of use here if you are working with individual word pairs.

Anyways, there are two obvious solutions to this.

The first option is to use a traditional knowledge base. I would recommend WordNet for this use case; it's widely considered a standard in the industry.
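As a rough sketch of the WordNet route, the snippet below scores two strings of terms by how close their words are in WordNet, using NLTK's WordNet interface. It assumes NLTK is installed and the corpus has been fetched once with `nltk.download("wordnet")`. The function names (`wordnet_similarity`, `soft_overlap`, `most_similar_entry`), the whitespace tokenization, and the choice of Wu-Palmer similarity are my own illustrative assumptions, not something the question prescribes.

```python
def wordnet_similarity(word_a, word_b):
    """Highest Wu-Palmer similarity between any senses of the two words."""
    # Lazy import: only needed when WordNet is actually queried.
    # Assumes nltk.download("wordnet") has been run once beforehand.
    from nltk.corpus import wordnet as wn

    scores = [
        s1.wup_similarity(s2) or 0.0  # wup_similarity may return None
        for s1 in wn.synsets(word_a)
        for s2 in wn.synsets(word_b)
    ]
    # Words unknown to WordNet have no synsets, so fall back to 0.0.
    return max(scores, default=0.0)


def soft_overlap(terms_a, terms_b, word_similarity=wordnet_similarity):
    """Average, over the terms in A, of the best similarity to any term in B."""
    tokens_a = terms_a.lower().split()  # naive whitespace tokenization (assumption)
    tokens_b = terms_b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(word_similarity(a, b) for b in tokens_b) for a in tokens_a]
    return sum(best) / len(best)


def most_similar_entry(entry_a, column_b, word_similarity=wordnet_similarity):
    """Return the entry of column_b with the highest soft overlap with entry_a."""
    return max(column_b, key=lambda entry_b: soft_overlap(entry_a, entry_b, word_similarity))
```

With a DataFrame `df`, something like `df["A"].apply(lambda a: most_similar_entry(a, df["B"].tolist()))` would give one match per row. This compares every pair of words, so it will be slow on large columns; caching the word-level scores would be the first thing to try.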

The second option would be to use the machine learning algorithm Word2Vec (or a variant like GloVe). I would say this is the easiest solution if you use a model that is already trained, like the Google News one. Look into Gensim's implementation to load the model and compute similarities.
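A minimal sketch of the Word2Vec route with Gensim might look like the following. It assumes the pretrained Google News vectors have already been downloaded to a local file; the path you pass in, the helper names (`load_vector_similarity`, `rank_candidates`), and the whitespace tokenization are illustrative assumptions. Words missing from the model's vocabulary are simply skipped.

```python
def load_vector_similarity(model_path):
    """Load pretrained word vectors and return a similarity function over token lists."""
    # Lazy import so the ranking helper below works without Gensim installed.
    from gensim.models import KeyedVectors

    # binary=True matches the format of the Google News .bin file.
    vectors = KeyedVectors.load_word2vec_format(model_path, binary=True)

    def similarity(tokens_a, tokens_b):
        # Drop out-of-vocabulary tokens; n_similarity cannot handle them.
        known_a = [t for t in tokens_a if t in vectors]
        known_b = [t for t in tokens_b if t in vectors]
        if not known_a or not known_b:
            return 0.0
        # Cosine similarity between the mean vectors of the two token lists.
        return float(vectors.n_similarity(known_a, known_b))

    return similarity


def rank_candidates(entry_a, column_b, similarity):
    """Return (score, entry) pairs for column_b, best match first."""
    tokens_a = entry_a.split()  # naive whitespace tokenization (assumption)
    scored = [(similarity(tokens_a, entry_b.split()), entry_b) for entry_b in column_b]
    return sorted(scored, reverse=True)
```

Typical use would be `similarity = load_vector_similarity("GoogleNews-vectors-negative300.bin")` (adjust the path to wherever the file lives), followed by `rank_candidates(a, df["B"].tolist(), similarity)[0]` for each entry `a` in column A. Loading the full model takes a few gigabytes of memory, so load it once and reuse the returned function.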