I created two dbs like this (same embeddings) using langchain 0.0.143:
from langchain.vectorstores import Chroma

db1 = Chroma.from_documents(
    documents=texts1,
    embedding=embeddings,
    persist_directory=persist_directory1,
)
db1.persist()
db2 = Chroma.from_documents(
    documents=texts2,
    embedding=embeddings,
    persist_directory=persist_directory2,
)
db2.persist()
Later, I load them again with:
db1 = Chroma(
    persist_directory=persist_directory1,
    embedding_function=embeddings,
)
db2 = Chroma(
    persist_directory=persist_directory2,
    embedding_function=embeddings,
)
How do I combine db1 and db2? I want to use the combined store in a ConversationalRetrievalChain by setting retriever=db.as_retriever().
I tried a couple of suggestions I found by searching, but I am missing something obvious.
The simpler option is to load the two documents into the same Chroma object. They'll retain separate metadata, so you can still tell which document each embedding came from.
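A minimal sketch of that approach, assuming texts1 and texts2 are the document lists from the question and that each document exposes a mutable metadata dict. The helper names tag_source and build_combined_db and the "corpus" metadata key are made up for illustration and are not part of langchain; the Chroma calls are the same ones used in the question.

```python
def tag_source(docs, label):
    """Record in each document's metadata which input list it came from."""
    for doc in docs:
        # "corpus" is a hypothetical key name; any unused metadata key works.
        doc.metadata["corpus"] = label
    return docs


def build_combined_db(texts1, texts2, embeddings, persist_directory):
    """Build one persisted Chroma store from both document lists."""
    # Imported lazily so tag_source can be used without langchain installed.
    from langchain.vectorstores import Chroma

    combined = tag_source(list(texts1), "texts1") + tag_source(list(texts2), "texts2")
    db = Chroma.from_documents(
        documents=combined,
        embedding=embeddings,
        persist_directory=persist_directory,
    )
    db.persist()
    return db
```

With something like this in place, db = build_combined_db(texts1, texts2, embeddings, persist_directory) should give a single store, and db.as_retriever() can then be passed to the chain as in the question. If the loader already sets a "source" entry in the metadata, the extra tagging step may be unnecessary.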
The more complicated option: by default, Chroma persists its data as two parquet files and an index. If you could guarantee no index conflicts, you could theoretically merge the respective parquet files and then merge the two index/ folders by copying the contents of each into a new index/ folder adjacent to the two new parquet files.