Semantic Chunking with Langchain on FAISS vectorstore


I have this Langchain code for my own dataset:

from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = FAISS.from_texts(
    docs, embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
)
retriever = vectorstore.as_retriever()

and I want to apply semantic chunking to the dataset (docs) before saving it to the vector store (or afterwards, if that is possible). Specifically, I have been trying to add the following snippet before the previous code:

from langchain_experimental.text_splitter import SemanticChunker

text_splitter = SemanticChunker(OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY))
docs = text_splitter.create_documents(docs)

to convert docs into chunked documents, but it doesn't work, possibly because the resulting structure is different.

Has anyone tried and succeeded in this before?

1 Answer

Answer from j3ffyang:

Try

docs = text_splitter.create_documents([docs])

create_documents expects a list of strings, so if docs is a single string it needs to be wrapped in a list. Reference: https://python.langchain.com/docs/modules/data_connection/document_transformers/semantic-chunker
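
For completeness, here is a minimal end-to-end sketch, assuming docs is a list of plain strings and OPENAI_API_KEY is already defined. The key detail is that create_documents returns Document objects rather than strings, so the vector store has to be built with FAISS.from_documents instead of FAISS.from_texts:

from langchain_community.vectorstores import FAISS
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Split the raw texts into semantically coherent chunks.
# create_documents takes a list of strings and returns Document objects.
text_splitter = SemanticChunker(embeddings)
chunked_docs = text_splitter.create_documents(docs)

# chunked_docs holds Document objects, not strings, so use
# from_documents rather than from_texts to build the index.
vectorstore = FAISS.from_documents(chunked_docs, embedding=embeddings)
retriever = vectorstore.as_retriever()

If docs is actually one long string rather than a list, wrap it first as shown in the answer above: text_splitter.create_documents([docs]).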