Retrieve all documents related to a long text file from FAISS Vectorstore

Sorry if this question is too basic, but is it possible to retrieve all the documents in a vectorstore that are chunks of a larger text file split before embedding? Are the documents in the vectorstore related to each other through their metadata or something like that, or is it only the similarity between their vectors that relates them?
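
From what I can tell, each LangChain Document keeps a metadata dict next to its page_content, and PyPDFLoader fills it with the source file path and page number, which split_documents then copies onto every chunk. A minimal sketch to inspect this (the PDF path here is just a placeholder):

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("data2/example.pdf")  # placeholder path
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
chunks = text_splitter.split_documents(loader.load())

# Each chunk inherits the loader's metadata,
# e.g. {'source': 'data2/example.pdf', 'page': 0}
print(chunks[0].metadata)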

This is my code:

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Azure OpenAI embedding client (chunk_size=1 sends one text per embedding request)
embeddings = OpenAIEmbeddings(
    deployment="embedding",
    model="text-embedding-ada-002",
    openai_api_base="https://test.openai.azure.com/",
    openai_api_type="azure",
    chunk_size=1,
)

# Split the PDF into ~1000-character chunks with 50 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
loader = PyPDFLoader("data2/" + file_item)
documents = loader.load()
texts = text_splitter.split_documents(documents)

# Embed the chunks and index them in FAISS
db = FAISS.from_documents(
    documents=texts,
    embedding=embeddings,
)

...
What I'd like is something like:

document_ids("based on pdf file name")  # should return a list of ids
get_list_of_documents_from_faiss(document_ids)  # should return the full documents, with the goal of reconstructing some kind of text from the stored chunks
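
A sketch of what I imagine these helpers could look like, filtering on the source metadata that PyPDFLoader attaches to each chunk. Note that this reaches into the private _dict attribute of LangChain's InMemoryDocstore, so it's an assumption about internals rather than a supported API:

def document_ids(db, source_path):
    # Collect the docstore ids of every chunk whose 'source' metadata
    # matches the original PDF path (relies on the private _dict attribute)
    return [
        doc_id
        for doc_id, doc in db.docstore._dict.items()
        if doc.metadata.get("source") == source_path
    ]

def get_list_of_documents_from_faiss(db, doc_ids):
    # Fetch the stored Document objects and stitch their text back together
    docs = [db.docstore._dict[doc_id] for doc_id in doc_ids]
    # PyPDFLoader records a page number per chunk; a stable sort on it
    # restores page order while keeping insertion order within a page
    docs.sort(key=lambda d: d.metadata.get("page", 0))
    return "\n".join(d.page_content for d in docs)

ids = document_ids(db, "data2/" + file_item)
reconstructed = get_list_of_documents_from_faiss(db, ids)

Because the chunks overlap by 50 characters, joining them like this gives an approximation of the original text rather than an exact reconstruction.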