Unexpected results when creating embeddings using OpenAI


I am sharing an example below and would like to know whether anyone has insight into best practices for creating embeddings. I use OpenAI's "text-embedding-ada-002" model.

So I created embeddings for the following inputs:

  • "Dog"

  • "Cat"

  • "Monkey"

  • "Peanut butter"

Now I would think that the following would be bucketed close together as they are animals:

  • "Dog"

  • "Cat"

  • "Monkey"

and if I created an embedding for another animal and ran a similarity search against my vector db, in most cases the top results returned would be animals; likewise, if I created an embedding for a food, the top result would be "Peanut butter."

However, I found that in some cases I would not get what I expect. For example, I created an embedding for the input "Tomato" and ran a similarity search. While I would have expected the top result to be another food like "Peanut butter", the top result was "Cat," and I am not sure why. Can someone help explain or advise what best practices to follow when creating embeddings?
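To make the comparison concrete, here is a minimal sketch of how a similarity search might rank stored items, assuming cosine similarity as the metric (the vector db in use may apply a different one). The three-dimensional vectors are invented toy values, not real "text-embedding-ada-002" output, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, stored):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in stored.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# ASSUMPTION: toy vectors made up for this sketch. Real embeddings from
# the model have many more dimensions and will produce different scores.
stored = {
    "Dog": [0.9, 0.1, 0.2],
    "Cat": [0.8, 0.2, 0.3],
    "Monkey": [0.7, 0.3, 0.1],
    "Peanut butter": [0.1, 0.9, 0.4],
}

# Stand-in for a query embedding (for example, the one for "Tomato").
query_vec = [0.2, 0.8, 0.5]

ranking = rank_by_similarity(query_vec, stored)

# Print every score, not only the top hit, so the margins are visible.
for label, score in ranking:
    print(f"{label}: {score:.4f}")
```

Listing the score for every stored item, rather than only the top result, shows how far apart the candidates are. If the scores are tightly bunched, the top result can change on very small differences.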

