I’m currently working on a project where I’m using the SentenceTransformer model from the sentence-transformers library to generate embeddings for text data. I would like to store these pre-generated embeddings in Chroma for later use.
Here’s a simplified version of my code:
```python
from PyPDF2 import PdfReader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter
from sentence_transformers import SentenceTransformer


def generate_embeddings() -> Chroma:
    # Extract the raw text from every page of the PDF
    pdf = PdfReader("path_to_my_pdf")
    raw_text = ''
    for page in pdf.pages:
        content = page.extract_text()
        if content:
            raw_text += content

    # Split the text into ~750-character chunks with 50 characters of overlap
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=750,
        chunk_overlap=50,
        length_function=len,
    )
    texts = text_splitter.split_text(raw_text)

    # Pre-generate the embeddings (a NumPy array with one row per chunk)
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode(texts)

    # Passing the NumPy array as `embedding` is what raises the AttributeError
    vectordb = Chroma.from_texts(
        texts=texts,
        embedding=embeddings,
        persist_directory="path_to_persist_directory",
    )
    vectordb.persist()
    return vectordb
```
However, I’m encountering an issue with the `Chroma.from_texts` method. The error says: `AttributeError: 'numpy.ndarray' object has no attribute 'embed_documents'`
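The error message suggests that `Chroma.from_texts` expects its `embedding` argument to be an embedding *object* it can call, not the vectors themselves: when the texts are added, LangChain calls `embedding.embed_documents(texts)`, and searches later call `embed_query(...)`. A minimal sketch of one workaround, assuming the pre-generated vectors should be reused rather than recomputed, is a small adapter whose `embed_documents` looks up each chunk's stored vector and whose `embed_query` falls back to the model. `PrecomputedEmbeddings` is a hypothetical helper, not part of LangChain; it is duck-typed here, although subclassing `langchain_core.embeddings.Embeddings` would be the more idiomatic choice.

```python
from typing import List, Sequence

import numpy as np


class PrecomputedEmbeddings:
    """Hypothetical duck-typed stand-in for a LangChain Embeddings object.

    embed_documents() returns vectors that were already computed for known
    texts; any other text (e.g. a search query) is encoded with the model.
    """

    def __init__(self, texts: Sequence[str], vectors: np.ndarray, model) -> None:
        if len(texts) != len(vectors):
            raise ValueError("texts and vectors must have the same length")
        # Map each chunk to its pre-generated vector; identical chunks share one vector
        self._lookup = {
            text: np.asarray(vec, dtype=float).tolist()
            for text, vec in zip(texts, vectors)
        }
        # Anything with an .encode() method, e.g. a SentenceTransformer instance
        self._model = model

    def _encode(self, text: str) -> List[float]:
        return np.asarray(self._model.encode(text), dtype=float).tolist()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Reuse stored vectors; only unseen texts go through the model
        return [self._lookup[t] if t in self._lookup else self._encode(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # Queries are never pre-computed, so always encode them
        return self._encode(text)
```

Under these assumptions it would replace the array in the original call, e.g. `Chroma.from_texts(texts=texts, embedding=PrecomputedEmbeddings(texts, embeddings, model), persist_directory="path_to_persist_directory")`. The lookup relies on the chunk texts being passed back unchanged, which is what `from_texts` does with the list it receives.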
Could you please guide me on how to create a Chroma object from pre-generated embeddings? Is there a method or a workaround that I can use to achieve this?
Any help would be greatly appreciated. Thank you in advance!