For example:
Question: What is the capital of the USA?
Expected answer: Washington D.C. is the capital of the USA.
Actual answer: The USA is the capital of Washington D.C.
The answers are lexically similar, but they are semantically different because the subject and object are swapped.
I'm new to NLP and have read a few articles on Doc2Vec, but the examples they provide are not close enough to my problem. Please advise which methods I should try, and point me to any references.
Relatively-shallow & word-order-oblivious algorithms – like word2vec & 'paragraph vectors' (aka Doc2Vec in many implementations) – can't tell the semantic difference between those two sentences. You'd have to use deeper models that have some understanding of how grammar & word-order affect meaning.
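To make the word-order point concrete, here is a minimal sketch using only the Python standard library. It is not how word2vec or Doc2Vec are actually trained: `toy_vector` is a hypothetical stand-in that assigns each word a fixed pseudo-random vector, and `tokenize` is a deliberately crude tokenizer. The assumption being illustrated is only that any representation built by averaging per-word vectors depends on which words occur, not on their order.

```python
from collections import Counter
import random

def tokenize(text):
    """Lowercase and strip surrounding punctuation (a deliberately crude tokenizer)."""
    tokens = (w.strip(".,?!").lower() for w in text.split())
    return [t for t in tokens if t]

def toy_vector(word, dim=8):
    """Hypothetical stand-in for a learned word vector: pseudo-random but fixed per word."""
    rng = random.Random(word)  # seeding with the word makes the vector repeatable
    return [rng.uniform(-1, 1) for _ in range(dim)]

def mean_vector(text, dim=8):
    """Average the word vectors of a text.

    Words are visited in sorted order so that floating-point rounding is
    identical for any two texts containing the same words.
    """
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    vec = [0.0] * dim
    for word in sorted(counts):
        for i, x in enumerate(toy_vector(word, dim)):
            vec[i] += counts[word] * x
    return [v / total for v in vec]

expected = "Washington D.C. is the capital of USA."
actual = "USA is the capital of Washington D.C."

# Same words, different order: the bag of words and the averaged vector coincide.
same_bag = Counter(tokenize(expected)) == Counter(tokenize(actual))
same_vector = mean_vector(expected) == mean_vector(actual)
print(same_bag, same_vector)  # prints: True True
```

Because the two answers contain exactly the same words, an order-oblivious representation cannot separate them, which is why the subject-object swap goes unnoticed.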
Look at approaches that use deeper neural networks (recurrent or, more commonly now, transformer-based) to summarize sentences/paragraphs, like BERT & related/follow-up work, or text-vectorizers related to LLMs.