I'm trying to create a Qdrant vectorstore and add my documents.
- My embeddings are based on OpenAIEmbeddings
- the QdrantClient is local for my case
- the collection that I'm creating has the VectorParams as such: VectorParams(size=2000, distance=Distance.EUCLID)
I'm getting the following error:
ValueError: could not broadcast input array from shape (1536,) into shape (2000,)
I understand that the error comes from how I configure the VectorParams, but I don't understand how these values need to be calculated.
Here's my complete code:
import os
from typing import List

from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Qdrant, VectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams


def load_documents(documents: List[Document]) -> VectorStore:
    """Create a vectorstore from documents."""
    collection_name = "my_collection"
    vectorstore_path = "data/vectorstore/qdrant"
    embeddings = OpenAIEmbeddings(
        model="text-embedding-ada-002",
        openai_api_key=os.getenv("OPENAI_API_KEY"),
    )
    qdrantClient = QdrantClient(path=vectorstore_path, prefer_grpc=True)
    qdrantClient.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=2000, distance=Distance.EUCLID),
    )
    vectorstore = Qdrant(
        client=qdrantClient,
        collection_name=collection_name,
        embeddings=embeddings,
    )
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
    )
    sub_docs = text_splitter.split_documents(documents)
    vectorstore.add_documents(sub_docs)
    return vectorstore
Any ideas on how I should configure the vector params properly?
So, as I see it, the value of 1536 is fixed by the vector size of the OpenAIEmbeddings. Quoting from this article: https://openai.com/blog/new-and-improved-embedding-model
Thus, changing the above code to VectorParams(size=1536, distance=Distance.EUCLID) did the trick.
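Rather than hardcoding 1536, the size could also be read from the embedding model itself by embedding a short probe string and taking the length of the result. The following is only a minimal sketch of that idea: `embedding_dimension`, `check_collection_size`, and `fake_embed_query` are hypothetical names introduced here, and `fake_embed_query` is a stand-in for `embeddings.embed_query` so the example runs without an API key.

```python
from typing import Callable, List


def embedding_dimension(embed_query: Callable[[str], List[float]]) -> int:
    """Return the length of the vector an embedding function produces."""
    # Embed a throwaway string once and measure the resulting vector.
    return len(embed_query("dimension probe"))


def check_collection_size(
    configured_size: int, embed_query: Callable[[str], List[float]]
) -> None:
    """Fail early if the configured collection size does not match the model."""
    actual = embedding_dimension(embed_query)
    if configured_size != actual:
        raise ValueError(
            f"collection expects vectors of size {configured_size}, "
            f"but the embedding model produces size {actual}"
        )


def fake_embed_query(text: str) -> List[float]:
    # Hypothetical stand-in for OpenAIEmbeddings(...).embed_query;
    # it returns a 1536-element vector like text-embedding-ada-002 does.
    return [0.0] * 1536


size = embedding_dimension(fake_embed_query)
print(size)  # prints 1536

check_collection_size(1536, fake_embed_query)  # passes silently
```

With the real objects from the question, this would presumably become `size = embedding_dimension(embeddings.embed_query)` followed by `VectorParams(size=size, distance=Distance.EUCLID)`. Note that the probe costs one embedding request against the OpenAI API.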