Does Euclidean Distance measure the semantic similarity?


I want to measure the similarity between sentences. Can I use sklearn and Euclidean distance to measure the semantic similarity between sentences? I have also read about cosine similarity. Can someone explain the difference between these two measures and what the best approach is?




There are multiple options for calculating semantic similarity. Which one fits depends on what you want to achieve and which resources you want to use.

Do you mean semantic similarity as in "the boat swims in the sea" being similar to "the ship floats on the lake"?

Word embeddings such as word2vec map each word to a vector. Word vectors are positioned in the vector space such that "words that share common contexts in the corpus are located in close proximity to one another in the space" (Wikipedia).
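To make "close proximity" concrete, and to show how the two measures from the question differ, here is a minimal sketch in plain Python. The vectors a, b and c are hypothetical toy values, not real word2vec output. scikit-learn offers the same computations as cosine_similarity and euclidean_distances in sklearn.metrics.pairwise; plain Python is used here only so the arithmetic is visible.

```python
import math

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in u))

def euclidean_distance(u, v):
    """Straight-line distance; sensitive to both direction and length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; depends only on direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))

def unit(u):
    """Rescale u to length 1."""
    n = norm(u)
    return [x / n for x in u]

# Hypothetical toy vectors, NOT real word embeddings:
# b points in the same direction as a but is twice as long; c points elsewhere.
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]
c = [0.0, 1.0, 2.0]

print(euclidean_distance(a, b))  # ≈ 2.236 (sqrt 5)
print(euclidean_distance(a, c))  # ≈ 2.449 (sqrt 6): barely farther than b
print(cosine_similarity(a, b))   # ≈ 1.0: same direction
print(cosine_similarity(a, c))   # ≈ 0.4: quite different direction

# For unit-length vectors: ||u - v||^2 = 2 - 2 * cos(u, v)
print(euclidean_distance(unit(a), unit(b)))       # ≈ 0.0
print(euclidean_distance(unit(a), unit(c)) ** 2)  # ≈ 1.2 = 2 - 2 * 0.4
```

On raw vectors, Euclidean distance mixes direction with vector length, while cosine similarity ignores length. Once vectors are normalized to unit length, the identity in the last comment means both measures rank neighbours in the same order, which is why the choice between them matters less for normalized embeddings.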

Euclidean or cosine distance can measure how far apart two word vectors are, and a small distance (equivalently, a high cosine similarity) is often taken as the semantic similarity between words. To measure the distance or similarity between sentences, you could use Word Mover's Distance (WMD), which is implemented in gensim. WMD calculates the distance from one set of word vectors (a sentence) to another using the Earth Mover's Distance: the minimum total distance the words of one sentence have to "travel" in the embedding space to reach the words of the other.
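The idea behind WMD can be sketched for a deliberately narrow special case, assuming both sentences have the same number n of distinct content words, each carrying weight 1/n. In that case the optimal transport plan is a one-to-one matching of words, so the exact WMD can be found by trying every matching. The 2-d embeddings, the helper names word_distance and wmd_uniform, and the hand-removed stop words are all hypothetical illustrations, not gensim's implementation.

```python
import itertools
import math

# Hypothetical, hand-made 2-d "embeddings" for illustration only; real ones
# would come from a trained word2vec model with many more dimensions.
EMBEDDINGS = {
    "boat": (1.0, 0.0), "ship": (0.9, 0.1),
    "swims": (0.0, 1.0), "floats": (0.1, 0.9),
    "sea": (1.0, 1.0), "lake": (0.9, 1.0),
    "cat": (-1.0, -1.0), "sleeps": (-1.0, 0.0), "sofa": (0.0, -1.0),
}

def word_distance(w1, w2):
    """Euclidean distance between two word vectors (the WMD 'travel cost')."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(EMBEDDINGS[w1], EMBEDDINGS[w2])))

def wmd_uniform(doc1, doc2):
    """Exact WMD when both documents hold n distinct words of weight 1/n.
    The optimal flow is then a one-to-one matching, so we brute-force it
    over all permutations (exponential in n: toy sizes only)."""
    if len(doc1) != len(doc2) or len(set(doc1)) != len(doc1) or len(set(doc2)) != len(doc2):
        raise ValueError("sketch only handles equal-size sets of distinct words")
    n = len(doc1)
    best_total = min(
        sum(word_distance(w, doc2[j]) for w, j in zip(doc1, perm))
        for perm in itertools.permutations(range(n))
    )
    return best_total / n  # each word moves 1/n of the total mass

# Stop words ("the", "in", "on") removed by hand for this sketch.
s1 = ["boat", "swims", "sea"]
s2 = ["ship", "floats", "lake"]
s3 = ["cat", "sleeps", "sofa"]

print(wmd_uniform(s1, s2))  # ≈ 0.128: the boat/ship sentences are close
print(wmd_uniform(s1, s3))  # ≈ 1.886: an unrelated sentence is far away
```

In gensim, KeyedVectors.wmdistance handles the general case (different sentence lengths, repeated words weighted by frequency) with an optimal-transport solver; the brute force above only makes the "travel" idea concrete.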

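Another way to calculate sentence similarity is doc2vec, which learns a vector for a whole sentence or document rather than for individual words; those vectors can then be compared with cosine similarity. See also: How to calculate the sentence similarity using word2vec model of gensim with python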