Configure Multitenancy with LangChain and Qdrant


I'm building a Q&A chatbot using LangChain and Qdrant.

I'm trying to configure LangChain to use Qdrant in a multitenant environment. The Qdrant documentation says the best approach in my case is "partition by payload": store a group_id in the payload of every point in a collection, so that searches can filter on that group_id (which in my case identifies the client). Here is the doc: https://qdrant.tech/documentation/tutorials/multiple-partitions/
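For reference, the filter that tutorial describes can be sketched in plain Python (the dicts mirror Qdrant's JSON filter format; the helper name is my own):

```python
# Sketch of the "partition by payload" filter from the Qdrant multitenancy
# tutorial: every point stores a group_id in its payload, and every search
# adds a must-match condition on it. Plain dicts mirror the JSON API shape.
def tenant_filter(group_id):
    return {
        "must": [
            {"key": "group_id", "match": {"value": group_id}},
        ]
    }

f = tenant_filter("client_42")
```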

I'm using LangChain, and I have added a "group_id" metadata field to the documents that I'm saving into Qdrant.
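Concretely, that tagging step can be sketched like this (a hypothetical helper; dicts with "page_content"/"metadata" stand in for LangChain Document objects to keep the example self-contained):

```python
# Sketch: stamp each document's metadata with the tenant's group_id before
# indexing, so Qdrant can later filter on it. Plain dicts stand in for
# LangChain Document objects.
def tag_with_group_id(docs, group_id):
    for doc in docs:
        doc.setdefault("metadata", {})["group_id"] = group_id
    return docs

docs = [{"page_content": "refund policy ...", "metadata": {"source": "faq.md"}}]
tagged = tag_with_group_id(docs, "client_42")
```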

I'd like to understand how to filter on group_id when using LangChain. This is how I currently retrieve the answer to a question:

qdrant = Qdrant(
    client=QdrantClient(...),
    collection_name="collection1",
    embeddings=embeddings
)
prompt = ...
llm = ChatOpenAI(...) 
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    return_source_documents=True,
    retriever=qdrant.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
)
result = qa_chain({"question": question})

The group_id represents the client and is known before the question is asked.

Any help is much appreciated, Thanks.

2 Answers

BEST ANSWER

I have found the answer. Thanks for all the suggestions.

To filter on the "group_id" attribute (which holds the client id), I add a group_id metadata field when loading data with VectorStore.from_documents, and I pass a search filter through the as_retriever function so that only sources with that group_id are returned:

chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type=chain_type,
    max_tokens_limit=max_tokens_limit,
    return_source_documents=True,
    retriever=vectorstore.as_retriever(
        search_kwargs={'filter': {'group_id': client}}
    ),
    reduce_k_below_max_tokens=False,
    chain_type_kwargs={"prompt": prompt}
)
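Since the group_id is known before the question is asked, one convenient pattern (my own sketch, not part of the answer above) is to build the search_kwargs per request from the client id:

```python
# Sketch: derive the per-tenant retriever search_kwargs from the client id
# known at request time. The "group_id" key must match the metadata field
# written at indexing time.
def tenant_search_kwargs(client_id):
    return {"filter": {"group_id": client_id}}
```

These kwargs would then be passed as `vectorstore.as_retriever(search_kwargs=tenant_search_kwargs(client))` when the chain is built for a given client.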

The way I handled it (in LangChain.js) was to create a child class of QdrantVectorStore and override the addVectors method, where the actual payload is defined. I send the group_id in the document metadata and copy it into the payload inside addVectors:

const { QdrantVectorStore } = require('langchain/vectorstores/qdrant');
const { v4: uuidv4 } = require('uuid');

class CsQdrantVectorStore extends QdrantVectorStore {

    async addVectors(vectors, documents) {
        if (vectors.length === 0) {
            return;
        }
        await this.ensureCollection();
        // Build one point per embedding; copy group_id out of the document
        // metadata into a top-level payload field so Qdrant can filter on it.
        const points = vectors.map((embedding, idx) => ({
            id: uuidv4(),
            vector: embedding,
            payload: {
                content: documents[idx].pageContent,
                metadata: documents[idx].metadata,
                group_id: documents[idx].metadata.group_id,
            },
        }));
        await this.client.upsert(this.collectionName, {
            wait: true,
            points,
        });
    }

    async addDocuments(documents) {
        const texts = documents.map(({ pageContent }) => pageContent);
        await this.addVectors(await this.embeddings.embedDocuments(texts), documents);
    }

    static async fromDocuments(docs, embeddings, dbConfig) {
        const instance = new this(embeddings, dbConfig);
        await instance.addDocuments(docs);
        return instance;
    }
}

module.exports = {
    CsQdrantVectorStore
};

Then, wherever you would otherwise use QdrantVectorStore, import and use CsQdrantVectorStore instead.