I am making a project like the one in this video: https://www.youtube.com/watch?v=dovB8uSUUXE&feature=youtu.be but I am having trouble because I need to check the similarity between sentences. For example, if the user said 'the person wear red T-shirt' instead of 'the boy wear red T-shirt', I want a method to check the similarity between these two sentences without having to check the similarity between each pair of words. Is there a way to do this in Python?
I am trying to find a way to check the similarity between two sentences.
Most of the libraries below should be good choices for semantic similarity comparison. You can skip direct word comparison by generating word or sentence vectors with pretrained models from these libraries.
Sentence similarity with spaCy
Required models must be downloaded first. To use en_core_web_md, run python -m spacy download en_core_web_md. To use en_core_web_lg, run python -m spacy download en_core_web_lg. The large model is about 830 MB at the time of writing and is quite slow, so the medium one can be a good choice.
https://spacy.io/usage/vectors-similarity/
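A minimal sketch of the spaCy approach, assuming the medium model has already been downloaded. The helper names load_nlp and sentence_similarity are only illustrative and are not part of spaCy; the pipeline is passed in as an argument so that it is loaded once and reused.

```python
def load_nlp(model_name="en_core_web_md"):
    """Load a spaCy pipeline that ships with word vectors."""
    import spacy  # imported lazily: only needed when a model is actually loaded

    # Raises OSError if the model has not been downloaded yet.
    return spacy.load(model_name)


def sentence_similarity(nlp, text_a, text_b):
    """Compare two sentences with spaCy's Doc.similarity."""
    doc_a = nlp(text_a)
    doc_b = nlp(text_b)
    # By default this is the cosine similarity of the averaged word vectors.
    return doc_a.similarity(doc_b)
```

For example, sentence_similarity(load_nlp(), 'the person wear red T-shirt', 'the boy wear red T-shirt') should return a float, where values closer to 1 mean the sentences are more similar.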
Sentence similarity with Sentence Transformers
https://github.com/UKPLab/sentence-transformers
https://www.sbert.net/docs/usage/semantic_textual_similarity.html
Install with pip install -U sentence-transformers. This library generates sentence embeddings.
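Generating the embeddings might look roughly like this. The model name all-MiniLM-L6-v2 is an assumption on my part (any pretrained Sentence Transformers model should work the same way), and load_model and embed_sentences are hypothetical helper names.

```python
def load_model(model_name="all-MiniLM-L6-v2"):
    """Load a pretrained Sentence Transformers model (the name is an assumption)."""
    from sentence_transformers import SentenceTransformer  # lazy: heavy dependency

    return SentenceTransformer(model_name)


def embed_sentences(model, sentences):
    """Return one fixed-size embedding vector per input sentence."""
    # SentenceTransformer.encode accepts a list of strings and returns
    # a NumPy array of shape (number of sentences, embedding size) by default.
    return model.encode(sentences)
```

Loading the model once and passing it to embed_sentences avoids paying the load time for every comparison.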
The embedding vectors can now be used to calculate various similarity metrics.
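As a sketch of what those metrics can look like, here are three common ones written with plain NumPy. The function names are illustrative, and the inputs are assumed to be two 1-D embedding vectors of the same length.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_distance(u, v):
    """Straight-line distance between u and v (0.0 means identical)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))


def dot_product(u, v):
    """Unnormalized inner product; sensitive to vector length."""
    return float(np.dot(np.asarray(u, dtype=float), np.asarray(v, dtype=float)))
```

Cosine similarity is usually the one to reach for with sentence embeddings, because it ignores vector length and only compares direction.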
The same thing can be done with scipy and pytorch.
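A rough sketch of both variants, assuming the embeddings are 1-D vectors. Note that scipy.spatial.distance.cosine returns a distance, so it has to be subtracted from 1 to get a similarity. The wrapper function names are my own.

```python
from scipy.spatial.distance import cosine  # cosine *distance*, i.e. 1 - similarity


def cosine_sim_scipy(u, v):
    """Cosine similarity of two 1-D vectors via scipy."""
    return 1.0 - float(cosine(u, v))


def cosine_sim_torch(u, v):
    """Cosine similarity of two 1-D vectors via PyTorch."""
    import torch  # lazy: heavy dependency
    import torch.nn.functional as F

    a = torch.as_tensor(u, dtype=torch.float32)
    b = torch.as_tensor(v, dtype=torch.float32)
    # dim=0 because both inputs are 1-D vectors rather than batches.
    return F.cosine_similarity(a, b, dim=0).item()
```

Both functions should agree up to floating point precision, so the choice mostly depends on which library the rest of the project already uses.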
Sentence similarity with TFHub Universal Sentence Encoder
https://tfhub.dev/google/universal-sentence-encoder/4
https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb
The model for this one is very large, around 1 GB, and seems slower than the others. It also generates embeddings for sentences.
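A minimal sketch of using the encoder, assuming tensorflow and tensorflow_hub are installed and the model URL above is reachable. load_encoder and similarity_matrix are hypothetical helper names; the encoder is treated as a callable that maps a list of sentences to one vector per sentence.

```python
import numpy as np

USE_URL = "https://tfhub.dev/google/universal-sentence-encoder/4"


def load_encoder(url=USE_URL):
    """Download (on first use) and load the Universal Sentence Encoder."""
    import tensorflow_hub as hub  # lazy: pulls in TensorFlow

    return hub.load(url)


def similarity_matrix(encoder, sentences):
    """Pairwise cosine similarities between all sentences."""
    vectors = np.asarray(encoder(sentences), dtype=float)
    # Normalize each row so the inner product equals the cosine similarity.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors @ vectors.T
```

Entry [i][j] of the returned matrix is the similarity between sentence i and sentence j, so the diagonal is always 1.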
Other Sentence Embedding Libraries
https://github.com/facebookresearch/InferSent
https://github.com/Tiiiger/bert_score
This illustration shows the method,
Resources
How to compute the similarity between two text documents?
https://en.wikipedia.org/wiki/Cosine_similarity#Angular_distance_and_similarity
https://towardsdatascience.com/word-distance-between-word-embeddings-cc3e9cf1d632
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.cosine.html
https://www.tensorflow.org/api_docs/python/tf/keras/losses/CosineSimilarity
https://nlp.town/blog/sentence-similarity/