I want to implement chat with multiple PDFs as well as with an individual PDF. For that, I would have to create a separate collection for every PDF and also a single collection containing the vectors of all PDFs. I discussed this with the Qdrant team, and they suggested using their multitenancy approach, where we create a single collection and partition it by payload; at retrieval time we can provide the payload value so that only the vectors with that payload are searched. The thing is, we are using LangChain JS in our company for creating collections and retrieving data, using:
langchain/vectorstores/qdrant
from LangChain's Qdrant integration. How can I partition a collection and search within a partition using LangChain?
Docs regarding multitenancy - https://qdrant.tech/documentation/guides/multiple-partitions/
Also their code for partitioning using payloads -
client.upsert("{collection_name}", {
  points: [
    {
      id: 1,
      payload: { group_id: "user_1" },
      vector: [0.9, 0.1, 0.1],
    },
    {
      id: 2,
      payload: { group_id: "user_1" },
      vector: [0.1, 0.9, 0.1],
    },
    {
      id: 3,
      payload: { group_id: "user_2" },
      vector: [0.1, 0.1, 0.9],
    },
  ],
});
But we are creating the collection using LangChain JS like this:
const vectorStore = await QdrantVectorStore.fromDocuments(
  docs,
  new OpenAIEmbeddings(),
  {
    url: process.env.QDRANT_URL,
    collectionName: "a_test_collection",
  }
);
How to solve this?
You can certainly use multitenancy with Qdrant and LangchainJS.
The dbConfig argument of the fromDocuments() method accepts a customPayload value. This can be any payload; in your case, a PDF ID to implement multitenancy. You can then use the filter argument when instantiating your retriever so that only the matching documents are searched.
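As a rough sketch of how the two pieces could fit together, the helpers below build a per-document payload and a matching Qdrant filter. The helper names (buildCustomPayload, buildPdfFilter), the pdf_id field, and the "pdf_123" value are hypothetical. The sketch also assumes that customPayload takes one entry per document and that LangChain stores it under a "customPayload" key in the point payload; both may differ between versions, so check your stored points before relying on the key path. The filter shape itself follows the must/match form from the Qdrant multitenancy docs linked in the question.

```typescript
// Shape of a Qdrant payload filter made of exact-match conditions.
interface MatchCondition {
  key: string;
  match: { value: string };
}

interface QdrantFilter {
  must: MatchCondition[];
}

// Hypothetical helper: one payload object per document chunk,
// each tagged with the ID of the PDF it came from.
function buildCustomPayload(
  chunkCount: number,
  pdfId: string
): Record<string, string>[] {
  return Array.from({ length: chunkCount }, () => ({ pdf_id: pdfId }));
}

// Hypothetical helper: a filter that restricts the search to one PDF.
// The default key path is an ASSUMPTION about how LangChain nests
// customPayload inside the Qdrant point payload.
function buildPdfFilter(
  pdfId: string,
  payloadKey: string = "customPayload.pdf_id"
): QdrantFilter {
  return { must: [{ key: payloadKey, match: { value: pdfId } }] };
}

// Intended usage with LangChain JS (not executed here):
//
// const vectorStore = await QdrantVectorStore.fromDocuments(
//   docs,
//   new OpenAIEmbeddings(),
//   {
//     url: process.env.QDRANT_URL,
//     collectionName: "a_test_collection",
//     customPayload: buildCustomPayload(docs.length, "pdf_123"),
//   }
// );
// const retriever = vectorStore.asRetriever({
//   filter: buildPdfFilter("pdf_123"),
// });

console.log(JSON.stringify(buildPdfFilter("pdf_123"), null, 2));
```

To chat across all PDFs, the same collection can be queried without a filter; to chat with a single PDF, pass the filter for that PDF's ID.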