I'm currently using Redis Vector Store in conjunction with the Langchain framework. My application is configured to retrieve four distinct chunks, but I've noticed that sometimes all four chunks are identical. This is causing some inefficiencies and isn't the expected behavior. Does anyone know why this might be happening and have any recommendations on how to resolve it?
```python
import os

from langchain.vectorstores.redis import Redis

# `vectorstore` (the list of supported backends) and `embedding()` are
# defined elsewhere in utils.py.

def getVectorStore(database: str, index_name: str = "KU_RULE_05") -> Redis:
    if database not in vectorstore:
        raise ValueError(f"{database} does not exist in vectorstore list in utils.py")
    if database == "Redis":
        return Redis.from_existing_index(
            embedding=embedding(),
            redis_url=os.getenv("REDIS_URL"),
            index_name=index_name,
        )

def getRelatedDocs(content: str, database="Redis", index_name: str = "KU_RULE_05"):
    VectorStore = getVectorStore(database=database, index_name=index_name)
    RelatedDocs = []
    # similarity_search returns k=4 documents by default
    for index, documents in enumerate(VectorStore.similarity_search(query=content)):
        RelatedDocs.append("{}: {}".format(index + 1, documents.page_content))
    return RelatedDocs
```
We've thoroughly checked for any duplicate documents in the database to see if that could be the cause of the issue, but we found no duplicates.
Ok so most likely you are continuing to use the `from_documents` method in the `getVectorStore` function when you should actually be using the `from_existing_index` method. You're likely re-generating and uploading the embeddings each time, each with a unique UUID, hence the duplicates.

The flow for reusing an index once created (as it is in `from_documents`) is:

1. `from_existing_index` (make sure to pass schema if using metadata)
2. `as_retriever`, or use the search methods directly, i.e. `similarity_search`

Example:
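A sketch of the one-time creation step (here `docs`, `embeddings`, the index name, and the Redis URL are placeholders for your own values):

```python
from langchain.vectorstores.redis import Redis

# One-time setup: embeds `docs` and uploads them, creating the index.
rds = Redis.from_documents(
    docs,        # your list of Documents (placeholder)
    embeddings,  # your embedding model (placeholder)
    redis_url="redis://localhost:6379",
    index_name="link",
)

# Persist the index schema so it can be passed back in when reconnecting.
rds.write_schema("redis_schema.yaml")
```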
Then, to init an index that already exists, you can do:
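(Same placeholder names as above; the `schema` argument points at the file written by `write_schema`.)

```python
# Reconnect to the existing index: nothing is re-embedded or re-uploaded.
rds = Redis.from_existing_index(
    embeddings,
    index_name="link",
    redis_url="redis://localhost:6379",
    schema="redis_schema.yaml",  # needed so metadata fields are recognized
)
```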
Notice, I'm passing the schema above. If you're using metadata, you can write out the schema file using the `write_schema` method (as in the first snippet above).

I highly recommend going through the documentation for the newer release of the redis integrations as well:
https://python.langchain.com/docs/integrations/vectorstores/redis
Ok now given your code, I'm not positive what's actually causing this error since you're positive you've curated your database contents, but could you try the following?
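The snippet that originally went here didn't survive, so as a stand-in, one thing I'd try is hitting the index directly and printing what comes back, to see whether the four hits really are the same underlying Redis entries (this reuses the `embedding()` helper and `REDIS_URL` from your utils.py):

```python
import os

from langchain.vectorstores.redis import Redis

rds = Redis.from_existing_index(
    embedding=embedding(),  # your embedding factory from utils.py
    redis_url=os.getenv("REDIS_URL"),
    index_name="KU_RULE_05",
)

for doc in rds.similarity_search("some test query", k=4):
    # Identical metadata (Redis keys) => the duplicates live in the index;
    # distinct keys with identical text => duplicated page_content.
    print(doc.metadata, "|", doc.page_content[:80])
```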
If this doesn't work, I would try running simpler examples with your codebase and see if a more trivial example works.
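For instance, a trivial end-to-end check might look something like this (throwaway index name; `FakeEmbeddings` just produces random vectors, which is enough to confirm that distinct documents come back):

```python
from langchain.embeddings import FakeEmbeddings
from langchain.vectorstores.redis import Redis

# Three clearly distinct texts in a throwaway index.
rds = Redis.from_texts(
    ["alpha", "beta", "gamma"],
    FakeEmbeddings(size=1536),
    redis_url="redis://localhost:6379",
    index_name="sanity_check",
)

for doc in rds.similarity_search("alpha", k=3):
    print(doc.page_content)  # expect three different strings
```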