pySolr : Adding a multivalued field


I would like to populate a Solr index from a pandas DataFrame. The DataFrame looks like this:

position        value
 5.6,-2.3        65
 -35.6,-1.2      43.1

#...

etc.

I convert the DataFrame to a list of JSON records and then add them to Solr as follows:

import json
import pandas as pd 
import pysolr

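# df is the pandas DataFrame described above.
# Serialize to JSON and parse it back to get a list of plain dicts, one per row.
jsonObject = json.loads(df.to_json(orient='records'))

# Connect to the Solr index (placeholder URL).
solrServer = pysolr.Solr('pathToMySolrIndex', timeout=100)

# Send all records to Solr.
solrServer.add(jsonObject)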

I get the following error:

multiple values encountered for non multiValued field position
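
One possible cause of this error, assuming the position column holds tuples or lists rather than strings, is that the JSON round trip turns each position into a list, which Solr then treats as several values for one field. A minimal sketch of that effect and of one workaround follows. The sample records and the helper name flatten_position are hypothetical, and joining the values into an "x,y" string only makes sense if the position field in the schema accepts that format.

```python
import json

# Hypothetical records, assuming each position is stored as an (x, y) tuple.
records = [
    {"position": (5.6, -2.3), "value": 65},
    {"position": (-35.6, -1.2), "value": 43.1},
]

# A JSON round trip turns tuples into lists.
round_tripped = json.loads(json.dumps(records))


def flatten_position(doc, field="position"):
    """Return a copy of doc with a list/tuple field joined into one 'x,y' string."""
    flat = dict(doc)
    value = flat.get(field)
    if isinstance(value, (list, tuple)):
        flat[field] = ",".join(str(v) for v in value)
    return flat


flat_docs = [flatten_position(doc) for doc in round_tripped]
print(flat_docs[0])
```

The flattened documents could then be passed to solrServer.add(flat_docs) in place of jsonObject.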

If I rename the field position to _position, it more or less works. From pysolr's documentation page, I understand that this creates a parent/child dependency, which I don't really want. Indeed, when I read back from the index using:

results = solrServer.search(**{'q':'*'})
df2 = pd.DataFrame(list(results))
print(df2.head())

I get something like this:

_position        value
 [5.6,-2.3]        [65]
 [-35.6,-1.2]      [43.1]

#...

Even with this hack, the result is still not what I want: every element is a list. I would prefer tuples for position and plain floats for value. I suspect this comes from the orient keyword used when converting to JSON.

Questions and Expected output

First, I would like to avoid renaming position to _position. The Solr index should not need renamed fields just to accommodate pysolr.

Second, I would like to avoid getting lists back when reading from the Solr index. I know that Solr can store numeric values without wrapping them in lists. The problem seems to come from the conversion from DataFrame to JSON. How can I do this?
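
To illustrate the output I am after, here is a minimal sketch that post-processes documents shaped like the search results shown above. It is only a client-side workaround under stated assumptions: the sample documents and the helper name unwrap_singletons are hypothetical. The lists may also come from fields defined as multiValued in the Solr schema rather than from the JSON conversion, in which case changing the schema would be the cleaner fix.

```python
def unwrap_singletons(doc):
    """Return a copy of doc where one-element lists are replaced by their only item."""
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in doc.items()
    }


# Hypothetical documents shaped like the search results shown above.
results = [
    {"_position": [5.6, -2.3], "value": [65]},
    {"_position": [-35.6, -1.2], "value": [43.1]},
]

cleaned = []
for doc in results:
    doc = unwrap_singletons(doc)
    # Convert the remaining two-element list to a tuple.
    doc["_position"] = tuple(doc["_position"])
    cleaned.append(doc)

print(cleaned[0])
```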
