Iterative fine-tuning of the embedding models


Context:

I learned that to improve the retrieval results of a RAG-based approach, we can fine-tune an open-source embedding model.

Right now I have one PDF file (my private dataset). From it I will create the train/eval dataset, fine-tune my embedding model, and store the resulting embeddings in a vector DB. Suppose n embeddings are stored in the DB.
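For concreteness, here is a minimal sketch of what I mean by this initial workflow, assuming sentence-transformers for fine-tuning and FAISS as the vector DB; the base model name, the training pairs, the save path, and the hyperparameters are all placeholders I made up for illustration:

```python
# Sketch of the initial workflow: fine-tune an open-source embedding model
# on pairs mined from the first PDF, then index the chunk embeddings.
# All names/values below are placeholders, not recommendations.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader
import faiss

# Hypothetical (question, passage) pairs created from pdf_1.
train_examples = [
    InputExample(texts=["What is the refund policy?",
                        "Refunds are issued within 30 days of purchase..."]),
    # ... more pairs from pdf_1
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # open-source base model (placeholder)
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_loader, loss)], epochs=1)
model.save("embedder-v1")

# Embed the n chunks of pdf_1 and store them in the vector DB.
chunks_pdf1 = ["chunk 1 text ...", "chunk 2 text ..."]   # the n chunks
emb = model.encode(chunks_pdf1, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)                                            # n vectors stored
```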

Question:

Now suppose tomorrow a new PDF file arrives.

  1. Should I re-fine-tune the earlier fine-tuned embedding model on the new data?

  2. Because this re-fine-tuned model will generate slightly different embeddings, will the earlier n embeddings stored in my DB become outdated? Will I have to delete them?

  3. So if there are m new embeddings from the second PDF file, will I have to re-embed everything and store a total of (n + m) fresh embeddings in the DB?

If so, then as new data keeps arriving, the total re-embedding work grows roughly quadratically over time.
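To make point 3 concrete, this is the step I am worried about, sketched under the same assumptions as above (`embedder-v2` and the chunk lists are placeholders): once the model weights change, the old vectors are no longer comparable with the new ones, so the whole corpus is re-encoded and the index rebuilt from scratch.

```python
# Sketch of the re-embedding step after re-fine-tuning on pdf_1 + pdf_2.
# Model path and chunk lists are placeholders for illustration only.
from sentence_transformers import SentenceTransformer
import faiss

chunks_pdf1 = ["chunk 1 text ...", "chunk 2 text ..."]    # the original n chunks
chunks_pdf2 = ["new chunk 1 ...", "new chunk 2 ..."]      # the m new chunks

model_v2 = SentenceTransformer("embedder-v2")             # re-fine-tuned model

# Old vectors from embedder-v1 live in a different embedding space,
# so every chunk is re-encoded with the new model.
all_chunks = chunks_pdf1 + chunks_pdf2                    # n + m chunks
emb = model_v2.encode(all_chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(emb.shape[1])                   # rebuild the index
index.add(emb)                                            # stores n + m vectors
```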
