Does Per-User Retrieval support the open-source vector store ChromaDB?


From the LangChain documentation on Per-User Retrieval:

When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data not just for one user, but for many different users, and they should not be able to see each other’s data. This means that you need to be able to configure your retrieval chain to only retrieve certain information.

The documentation has an example implementation using PineconeVectorStore. Does ChromaDB support multiple users? If so, can anyone help with an example of how per-user retrieval can be implemented with the open-source ChromaDB?



chifu lin On BEST ANSWER

We can use a metadata filter to make ChromaDB support multiple users: store the owner in each document's metadata, then filter on it at query time.

from langchain_openai import OpenAIEmbeddings
# Older LangChain releases expose this as `from langchain.vectorstores import Chroma`
from langchain_community.vectorstores import Chroma

persist_directory = 'your_db'

embeddings = OpenAIEmbeddings()
vectordb = Chroma(embedding_function=embeddings,
                  persist_directory=persist_directory)

# Tag every document with its owner in the metadata
vectordb.add_texts(["i worked at kensho"], metadatas=[{"user": "harrison"}])
vectordb.add_texts(["i worked at facebook"], metadatas=[{"user": "ankush"}])

# This will only get documents for Ankush
# (newer LangChain releases prefer .invoke(...) over .get_relevant_documents(...))
vectordb.as_retriever(search_kwargs={'filter': {'user': 'ankush'}}).get_relevant_documents(
    "where did i work?"
)
[Document(page_content='i worked at facebook', metadata={'user': 'ankush'})]
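
To avoid writing the filter by hand for every request, the per-user search_kwargs can be built by a small helper. This is only a sketch: the function name user_search_kwargs and the default k=4 are hypothetical, and it assumes every document was stored with a "user" metadata key as in the code above.

```python
from typing import Any, Dict


def user_search_kwargs(user_id: str, k: int = 4) -> Dict[str, Any]:
    """Build search_kwargs that restrict retrieval to a single user's documents.

    Hypothetical helper: assumes documents carry a "user" metadata key.
    """
    if not user_id:
        # Refuse an empty id so a missing user never falls back to an unfiltered search
        raise ValueError("user_id must be a non-empty string")
    return {"k": k, "filter": {"user": user_id}}


kwargs = user_search_kwargs("ankush")
# With the vectordb from the answer above, this would be used as:
# retriever = vectordb.as_retriever(search_kwargs=kwargs)
```

Take user_id from your authenticated session rather than from user-supplied input, otherwise one user could request another user's documents.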
Yosua Wijaya On

In response to @chifu lin's answer: I don't think you should keep all users in one store and distinguish the owner of each document through metadata, because the Chroma documentation includes this caution:

Caution: Chroma makes a best-effort to automatically save data to disk, however multiple in-memory clients can stop each other’s work. As a best practice, only have one client per path running at any given time.

Instead, I think you can give each user a separate persist directory, specified through the persist_directory parameter when initializing the Chroma object, something like this:

username = 'Joe'
db = Chroma.from_documents(pages, embeddings, persist_directory=f"./chroma_db/{username}")

When you want to get the data for user Joe, you can load it from disk like this:

vectordb = Chroma(persist_directory=f"chroma_db/{username}", embedding_function=embeddings)
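
Because the username ends up in a file path, it is worth validating it before building the directory name. Here is a minimal sketch, assuming usernames are limited to letters, digits, underscores, and hyphens; the helper name user_persist_dir and that character rule are hypothetical, not part of Chroma or LangChain.

```python
import re
from pathlib import Path


def user_persist_dir(username: str, base_dir: str = "./chroma_db") -> Path:
    """Return the per-user persist directory, rejecting unsafe usernames.

    Hypothetical helper: the allowed character set is an assumption.
    """
    if not re.fullmatch(r"[A-Za-z0-9_-]+", username):
        # Blocks empty names and path tricks such as "../other_user"
        raise ValueError(f"unsafe username: {username!r}")
    return Path(base_dir) / username


persist_directory = str(user_persist_dir("Joe"))
# With the embeddings from the answer above, this would be used as:
# vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
```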

ADDITION

When using the store in LangChain as a retriever, you can call the as_retriever() function on it directly. If you also want to restrict the results to certain source documents, you can pass a filter in the search_kwargs parameter:

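pdf_paths = ['1.pdf', '2.pdf']
search_kwargs = {
    'k': 3,          # number of documents to return
    'fetch_k': 10,   # number of candidates fetched before MMR re-ranking
    'filter': {'source': {'$in': pdf_paths}},  # keep only chunks from these files
}

vectordb.as_retriever(
    search_type="mmr",
    search_kwargs=search_kwargs,
)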
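
If all users share one store, the owner filter and the source filter can be combined into a single condition. The sketch below assumes every document carries both a "user" and a "source" metadata key, and that the installed Chroma version supports the $and, $eq, and $in operators; the helper name user_and_sources_filter is hypothetical.

```python
from typing import Any, Dict, Iterable


def user_and_sources_filter(user_id: str, sources: Iterable[str]) -> Dict[str, Any]:
    """Build a filter matching one user's documents from the given source files.

    Hypothetical helper: assumes "user" and "source" metadata keys exist.
    """
    return {
        "$and": [
            {"user": {"$eq": user_id}},
            {"source": {"$in": list(sources)}},
        ]
    }


search_kwargs = {
    "k": 3,
    "fetch_k": 10,
    "filter": user_and_sources_filter("ankush", ["1.pdf", "2.pdf"]),
}
# With a vectordb as in the answers above, this would be used as:
# retriever = vectordb.as_retriever(search_type="mmr", search_kwargs=search_kwargs)
```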