Context:
I learned that to improve the retrieval quality of a RAG-based approach, we can fine-tune an open-source embedding model.
Now I have one PDF file (my private dataset) from which I will create the train/eval dataset, fine-tune my embedding model, and store the resulting embeddings in a vector DB. Suppose n embeddings are stored in the DB.
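For concreteness, here is a minimal sketch of the workflow I have in mind, using sentence-transformers with MultipleNegativesRankingLoss and a FAISS index; the base model, the (query, passage) pairs, and the chunk texts are all placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
import faiss

# Start from a general-purpose open-source embedder
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# (query, relevant passage) pairs mined from the PDF -- placeholder examples
train_examples = [
    InputExample(texts=["what is the refund window?",
                        "Refunds are accepted within 30 days of purchase."]),
    InputExample(texts=["how do I contact support?",
                        "Support can be reached at support@example.com."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

# Fine-tune on the PDF-derived pairs and save the adapted model
model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)
model.save("finetuned-embedder-v1")

# Embed the n chunks from the PDF and store them in a vector index
chunks = ["chunk 1 text ...", "chunk 2 text ..."]          # the n chunks
embeddings = model.encode(chunks, convert_to_numpy=True)   # shape (n, dim)
faiss.normalize_L2(embeddings)                             # cosine sim via inner product
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
```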
Question:
Now suppose a new PDF file arrives tomorrow.
Should I then re-fine-tune the previously fine-tuned embedding model on the new data?
Because the re-fine-tuned model will generate slightly different embeddings, will the n embeddings already stored in my DB become outdated? Will I have to delete them? And if there are m new embeddings from the second PDF file to store, will I have to re-embed everything and store a total of (n + m) fresh embeddings in the DB?
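If the answer is yes, I imagine the update step would look roughly like this full rebuild (a minimal sketch; the model name "finetuned-embedder-v2" and the chunk lists are hypothetical):

```python
from sentence_transformers import SentenceTransformer
import faiss

# "finetuned-embedder-v2" = the re-fine-tuned model;
# old_chunks = the n chunks from the first PDF, new_chunks = the m new ones
model_v2 = SentenceTransformer("finetuned-embedder-v2")

old_chunks = ["chunk 1 text ...", "chunk 2 text ..."]   # n chunks
new_chunks = ["new chunk 1 ...", "new chunk 2 ..."]     # m chunks

# Every stored vector must come from the same model, so the old index is
# discarded and all (n + m) chunks are re-embedded with the new model
all_chunks = old_chunks + new_chunks
embeddings = model_v2.encode(all_chunks, convert_to_numpy=True)
faiss.normalize_L2(embeddings)

index = faiss.IndexFlatIP(embeddings.shape[1])          # fresh index
index.add(embeddings)
```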
So as new data keeps arriving, this turns into a quadratic problem: after k files of roughly m chunks each, the cumulative embedding work is m + 2m + ... + km = m·k(k+1)/2, i.e. O(k²) in the number of files.
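A quick back-of-the-envelope check of that growth (CHUNKS_PER_FILE is a made-up figure):

```python
# Cumulative embedding work if every new file forces a full re-index
CHUNKS_PER_FILE = 1_000    # hypothetical: each PDF yields ~1,000 chunks

corpus_size = 0
total_work = 0
for k in range(1, 11):                 # ten files arriving one by one
    corpus_size += CHUNKS_PER_FILE     # corpus grows by m chunks
    total_work += corpus_size          # re-embed the whole corpus
    print(f"file {k}: corpus={corpus_size}, cumulative embeddings={total_work}")

# total_work == CHUNKS_PER_FILE * k * (k + 1) // 2, i.e. O(k^2) in file count
```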