I passed a dense vector to Solr 9 for indexing, but Solr takes the values and puts them into a field whose data type is pdoubles. I changed managed-schema.xml to declare the field named vector as a knn_vector, but Solr dynamically created a new field named vectors of type pdoubles.
Lines that I added to managed-schema.xml:
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="euclidean"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>
Lines added dynamically by Solr itself:
<field name="vectors" type="pdoubles"/>
For reference, my code:
import numpy as np
import pysolr
from sentence_transformers import SentenceTransformer

# documents is a dict keyed by string index ('0', '1', ...); each value has a 'paragraph' entry
embedder = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
corpus = [documents[d]['paragraph'] for d in documents]
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=False)

# Attach each embedding to its document as a plain list of floats
for d, row in enumerate(corpus_embeddings):
    documents[str(d)]['vectors'] = np.array(row).tolist()

solr = pysolr.Solr('http://localhost:8983/solr/VectorPilotRun/', always_commit=True, timeout=10)

# KNN query against the vector field, using the embedding of document '500'
results = solr.search("{!knn f=vector topK=10}" + str(documents['500']['vectors']))
print("Saw {0} result(s).".format(len(results)))
for result in results:
    print("The details are : '{0} {1} {2}'\n.".format(result['id'], result['paragraph'], result['paragraph_num']))
The result of this search is null.
When I query the knn_vector field, that is, the vector field, it returns no results. I believe this is because all the data is stored in the vectors (pdoubles) field instead of the vector (knn_vector) field.
How do I add data so that it is stored in the correct field and type, and not dynamically changed to another type? I used pysolr to add the data, and each vector is a list of float values.
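One thing that may be worth checking: the indexing code above stores each embedding under the key vectors, while the schema declares the field vector. If the collection runs in schemaless mode, Solr will likely treat vectors as an unknown field and guess a type for it, which would be consistent with the pdoubles field that appeared. Below is a minimal sketch of building the documents so that the key matches the schema. The names to_solr_docs, index_documents, VECTOR_FIELD and VECTOR_DIM are hypothetical helpers introduced for illustration, and the sketch assumes the schema lines shown above are the ones in effect.

```python
VECTOR_FIELD = "vector"  # assumed: the field declared as knn_vector in the schema
VECTOR_DIM = 768         # assumed: vectorDimension of the knn_vector fieldType


def to_solr_docs(documents, embeddings):
    """Return a list of Solr documents with each embedding stored under VECTOR_FIELD."""
    docs = []
    for (doc_id, fields), row in zip(documents.items(), embeddings):
        # Convert to plain Python floats so the payload serializes cleanly
        vector = [float(x) for x in row]
        if len(vector) != VECTOR_DIM:
            raise ValueError(
                f"document {doc_id}: expected {VECTOR_DIM} values, got {len(vector)}"
            )
        # Drop any stale 'vectors' key so Solr never sees the unknown field name
        doc = {k: v for k, v in fields.items() if k != "vectors"}
        doc.setdefault("id", doc_id)
        doc[VECTOR_FIELD] = vector
        docs.append(doc)
    return docs


def index_documents(solr_url, docs):
    """Send the documents to Solr with pysolr (imported here so the helper above has no dependencies)."""
    import pysolr

    solr = pysolr.Solr(solr_url, always_commit=True, timeout=10)
    solr.add(docs)
```

With documents and corpus_embeddings from the code above, this would be called as index_documents('http://localhost:8983/solr/VectorPilotRun/', to_solr_docs(documents, corpus_embeddings)). The query string then also needs to read the embedding from the same key that was indexed.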
Besides managed-schema.xml, you should add a separate schema.xml file to the config directory before creating the Solr collection.
For example,