Langchain query using Pinecone and ParentDocumentRetriever returns no results

I don't really understand how to retrieve the parent documents using LangChain's ParentDocumentRetriever together with Pinecone. The following code works for creating the embeddings and inserting them into Pinecone:

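// Import paths are assumptions and vary by LangChain JS version.
// HTMLSplitter is not imported here because its origin is not shown.
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { ParentDocumentRetriever } from "langchain/retrievers/parent_document";
import { InMemoryStore } from "langchain/storage/in_memory";

const pinecone = new Pinecone();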

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX);
const docstore = new InMemoryStore();

const vectorstore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex }
);

const retriever = new ParentDocumentRetriever({
  vectorstore,
  docstore,
  childSplitter: new HTMLSplitter(),
  parentK: 5,
});

// We must add the parent documents via the retriever's addDocuments method
await retriever.addDocuments(docs);

const retrievedDocs = await retriever.getRelevantDocuments("What is emptiness?");

console.log(retrievedDocs);

The retrievedDocs contains a few parent documents, as expected.

Now that my index is created, I would like to subsequently perform the same operation, but without the await retriever.addDocuments(docs):

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX);
const docstore = new InMemoryStore();

const vectorstore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex }
);

const retriever = new ParentDocumentRetriever({
  vectorstore,
  docstore,
  childSplitter: new HTMLSplitter(),
  parentK: 5,
});

const retrievedDocs = await retriever.getRelevantDocuments("What is emptiness?");

console.log(retrievedDocs);

This yields no results. The documentation is really rather unclear on this: am I expected to implement my own document store containing all of the parent documents with their accompanying IDs or something like that? Can I save the InMemoryStore to the filesystem, or use the LocalFileStore? Does this document store pertain just to the parent documents?

I am not sure how to use LocalFileStore since dropping it in as a replacement causes my IDE to become unhappy, because it extends BaseStore<string, Uint8Array>, whereas InMemoryStore extends BaseStore<string, T>.

In summary, how would I use Pinecone as a vector store in combination with ParentDocumentRetriever? What document store do I use?

It seems to me that this would be a pretty common use case; where might I find an example?

There is 1 solution below

Does this document store pertain just to the parent documents?

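Yes. The vector store persists the child chunks, while the docstore holds the full parent documents (or parent chunks). In your second snippet the InMemoryStore is newly created and therefore empty, so the child chunks found in Pinecone cannot be mapped back to any parent, which is why nothing comes back.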
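
To illustrate the split between the two stores, here is a simplified model of the lookup. It is only a sketch and not LangChain's actual implementation: ChildChunk, retrieveParents and the sample data are made up for illustration, and the vector search is replaced by a fixed list of hits.

```typescript
// Simplified model of the two-stage lookup. This is NOT LangChain's code;
// ChildChunk, retrieveParents and the sample data are hypothetical.
interface ChildChunk {
  text: string;
  parentId: string; // ID linking the chunk back to its parent document
}

// Stage 2 of retrieval: map matched child chunks to their parents.
function retrieveParents(
  matchedChildren: ChildChunk[], // stand-in for the vector store's search hits
  docstore: Record<string, string>, // parent ID -> full parent text
): string[] {
  const ids = matchedChildren.map((chunk) => chunk.parentId);
  const uniqueIds = ids.filter((id, i) => ids.indexOf(id) === i);
  // Parents missing from the docstore are silently dropped.
  return uniqueIds.filter((id) => id in docstore).map((id) => docstore[id]);
}

// Pretend these came back from the vector store for "What is emptiness?".
const matches: ChildChunk[] = [
  { text: "Emptiness is ...", parentId: "p1" },
  { text: "... form is emptiness", parentId: "p1" },
  { text: "Another chunk", parentId: "p2" },
];

// First run: addDocuments has filled the docstore.
const populatedStore: Record<string, string> = {
  p1: "Full parent document 1",
  p2: "Full parent document 2",
};
// Second run: a brand-new in-memory store with nothing in it.
const freshStore: Record<string, string> = {};

const withPopulated = retrieveParents(matches, populatedStore);
const withFresh = retrieveParents(matches, freshStore);

console.log(withPopulated.length); // 2
console.log(withFresh.length); // 0
```

The child hits are identical in both runs. Only the contents of the docstore differ, so the docstore is the thing that has to survive between runs.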

how would I use Pinecone as a vector store in combination with ParentDocumentRetriever?

You first have to save your index once, then load it whenever you want to use it. Faiss might be easier to save than Pinecone: in the Python API, for example, there is a vectorstore.save_local("faiss_index") function (https://python.langchain.com/docs/integrations/vectorstores/faiss#saving-and-loading). For easy remote persistence of your vector store you could also use Postgres with PGVector.

What document store do I use?

For the docstore you can use LocalFileStore, or you can persist it remotely with an SQL store. I wrote a Medium article that shows how to use SQL persistence with PGVector; if you want to go for remote persistence, you can find those details here.
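
Regarding the type mismatch mentioned in the question (LocalFileStore is a BaseStore<string, Uint8Array>, while the retriever wants documents): a byte store can only hold serialized documents, so something has to encode them on the way in and decode them on the way out. Below is a minimal sketch of that conversion. Doc, encodeDoc and decodeDoc are hypothetical names for illustration and are not part of LangChain. Recent LangChain JS releases appear to ship a ready-made wrapper for this (createDocumentStoreFromByteStore in langchain/storage/encoder_backed), but treat that as an assumption and check your installed version.

```typescript
// Hypothetical helpers, not part of LangChain: they show the conversion a
// document store layered on top of a byte store has to perform.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Document -> bytes, suitable as a value for a BaseStore<string, Uint8Array>.
function encodeDoc(doc: Doc): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(doc));
}

// Bytes read back from the store -> document.
function decodeDoc(bytes: Uint8Array): Doc {
  return JSON.parse(new TextDecoder().decode(bytes)) as Doc;
}

// Round trip: what would be written to disk, and what is read back later.
const parentDoc: Doc = {
  pageContent: "<p>Form is emptiness ...</p>",
  metadata: { source: "example.html" }, // made-up metadata
};
const storedBytes = encodeDoc(parentDoc);
const restoredDoc = decodeDoc(storedBytes);

console.log(restoredDoc.pageContent === parentDoc.pageContent); // true
```

With a wrapper like this around LocalFileStore, addDocuments only needs to run once at ingestion time. Later processes can point at the same directory and the same Pinecone index and query straight away.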