LangChain vectorStore: how to use search_kwargs filter?

I am following various tutorials on LangChain, and am now trying to figure out how to use a subset of the documents in the vectorstore instead of the whole database.

vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

Imagine a chat scenario.

  • User: I am looking for X.
  • Chatbot: (asks a deterministic question e.g.) From what date is this document?
  • User: 2023-11-16
  • (Backend filters the vectorstore somehow, e.g. retriever = vectorstore.as_retriever(filters="document_name matches '2023-11-16*'"))
  • Chatbot: here are some relevant documents: ...

In the documentation, they list an example:

docsearch.as_retriever(
    search_kwargs={'filter': {'paper_title':'GPT-4 Technical Report'}}
)

What isn't clear is:

  • What is paper_title? Is that metadata or text inside the document?
  • If this is metadata, then how to specify it?
There is 1 best solution below

What is paper_title? Is that metadata or text inside the document?

paper_title is a metadata key attached to each document, not text inside the document. The filter {'paper_title': 'GPT-4 Technical Report'} restricts the search to documents whose metadata has exactly that value.

ChromaDB uses SQLite as its storage backend. You can read about it here.
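
To make the metadata-versus-text distinction concrete, here is a rough pure-Python sketch of what an equality filter does conceptually. It is not Chroma's actual implementation; the matches helper and the two sample documents are made up for illustration.

```python
def matches(metadata, flt):
    """Return True if every key in the filter is present in the
    metadata with exactly the same value (plain equality)."""
    return all(key in metadata and metadata[key] == value
               for key, value in flt.items())


# Two made-up documents: only the first carries the metadata key.
docs = [
    {
        "page_content": "Some chunk of the report itself",
        "metadata": {"paper_title": "GPT-4 Technical Report"},
    },
    {
        "page_content": "A blog post that mentions the GPT-4 Technical Report",
        "metadata": {"source": "blog"},
    },
]

flt = {"paper_title": "GPT-4 Technical Report"}

# The filter looks at metadata only; page_content is never inspected.
selected = [d for d in docs if matches(d["metadata"], flt)]
```

Only the first document is selected, even though the second one mentions the title in its text.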

If this is metadata, then how to specify it?

Yes, that is metadata, and according to the docs this is how you specify it:

from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma

docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "thriller",
            "rating": 9.9,
        },
    ),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
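
Once the metadata is stored, the filter can be passed through search_kwargs, as in the documentation example quoted in the question. Below is a minimal sketch for the chat scenario, assuming each document was stored with a hypothetical date metadata key (e.g. metadata={"date": "2023-11-16"}). The helper names are mine, not part of LangChain. Note that this is an exact match on the value; as far as I know Chroma's metadata filter has no wildcard or prefix match like '2023-11-16*', so it is safer to store the date as its own field than to rely on the file name.

```python
def date_filter_kwargs(date, k=4):
    """Build search_kwargs that restrict retrieval to one date.

    "date" is a hypothetical metadata key; use whatever key you
    attached to your documents when building the vectorstore.
    """
    return {"k": k, "filter": {"date": date}}


def make_filtered_retriever(vectorstore, date, k=4):
    """Create a retriever that only searches documents whose
    metadata "date" equals the given value.

    `vectorstore` is expected to be a LangChain vector store such as
    the Chroma instance built above.
    """
    return vectorstore.as_retriever(search_kwargs=date_filter_kwargs(date, k))


# Usage in the chat backend, after the user answers "2023-11-16":
# retriever = make_filtered_retriever(vectorstore, "2023-11-16")
```

With the movie documents above, the same pattern would be vectorstore.as_retriever(search_kwargs={"filter": {"year": 2010}}), which limits the search to documents whose year metadata is 2010.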