Retrieve all documents related to a long text file from FAISS Vectorstore

Sorry if this question is too basic, but is it possible to retrieve all the documents in a vectorstore that are chunks of a larger text file split before embedding? Are the documents in the vectorstore related to each other through their metadata or something like that, or is it only the similarity between their vectors that relates them?
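
From what I can tell, each LangChain Document keeps a metadata dict next to its page_content, and PyPDFLoader fills it with the source file path and page number, which split_documents then copies onto every chunk. A minimal sketch to inspect this (the PDF path here is just a placeholder):

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("data2/example.pdf")  # placeholder path
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
chunks = text_splitter.split_documents(loader.load())

# Each chunk inherits the loader's metadata,
# e.g. {'source': 'data2/example.pdf', 'page': 0}
print(chunks[0].metadata)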

This is my code:

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Azure OpenAI embedding client (chunk_size=1 sends one text per embedding request)
embeddings = OpenAIEmbeddings(
    deployment="embedding",
    model="text-embedding-ada-002",
    openai_api_base="https://test.openai.azure.com/",
    openai_api_type="azure",
    chunk_size=1,
)

# Split the PDF into ~1000-character chunks with 50 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
loader = PyPDFLoader("data2/" + file_item)
documents = loader.load()
texts = text_splitter.split_documents(documents)

# Embed the chunks and index them in FAISS
db = FAISS.from_documents(
    documents=texts,
    embedding=embeddings,
)

...
What I'd like is something like:

document_ids("based on pdf file name")  # should return a list of ids
get_list_of_documents_from_faiss(document_ids)  # should return the full documents, with the goal of reconstructing some kind of text from the stored chunks
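
A sketch of what I imagine these helpers could look like, filtering on the source metadata that PyPDFLoader attaches to each chunk. Note that this reaches into the private _dict attribute of LangChain's InMemoryDocstore, so it's an assumption about internals rather than a supported API:

def document_ids(db, source_path):
    # Collect the docstore ids of every chunk whose 'source' metadata
    # matches the original PDF path (relies on the private _dict attribute)
    return [
        doc_id
        for doc_id, doc in db.docstore._dict.items()
        if doc.metadata.get("source") == source_path
    ]

def get_list_of_documents_from_faiss(db, doc_ids):
    # Fetch the stored Document objects and stitch their text back together
    docs = [db.docstore._dict[doc_id] for doc_id in doc_ids]
    # PyPDFLoader records a page number per chunk; a stable sort on it
    # restores page order while keeping insertion order within a page
    docs.sort(key=lambda d: d.metadata.get("page", 0))
    return "\n".join(d.page_content for d in docs)

ids = document_ids(db, "data2/" + file_item)
reconstructed = get_list_of_documents_from_faiss(db, ids)

Because the chunks overlap by 50 characters, joining them like this gives an approximation of the original text rather than an exact reconstruction.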