file.json:

```json
[{"id":1, "name":"Tim"}, {"id":2, "name":"Jim"}, {"id":3, "name":"Paul"}, {"id":4, "name":"Sam"}]
```
The file is encoded as UTF-8 with BOM.
When I use pandas, it works:

```python
df = pd.read_json('file.json', encoding='utf-8-sig', orient='records')
```
When I use dask, it fails:

```python
df = dd.read_json('file.json', encoding='utf-8-sig', orient='records')
```

```
ValueError: An error occurred while calling the read_json method registered to the pandas backend.
Original Message: Expected object or value
```
I am trying to read the data into a dask DataFrame. The original message leads me to believe it's a parse issue, but could this be a bug? Does dask not support the same encoding options as pandas?
By default, `dask.dataframe.read_json` expects the raw data to be line-delimited JSON; this can be changed by passing `lines=False` as a kwarg. Here's an MRE:
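(A sketch that recreates the file from the question; the filename and data are taken from the post.)

```python
import pandas as pd
import dask.dataframe as dd

# Recreate the file from the question; the utf-8-sig codec writes the BOM
data = '[{"id":1, "name":"Tim"}, {"id":2, "name":"Jim"}, {"id":3, "name":"Paul"}, {"id":4, "name":"Sam"}]'
with open('file.json', 'w', encoding='utf-8-sig') as f:
    f.write(data)

# pandas treats the file as a single JSON document, so this works
pdf = pd.read_json('file.json', encoding='utf-8-sig', orient='records')

# dask defaults to lines=True when orient='records', i.e. it expects
# line-delimited JSON, so a top-level array fails to parse.
# lines=False makes it read each file as one JSON document instead.
ddf = dd.read_json('file.json', encoding='utf-8-sig', orient='records', lines=False)
print(ddf.compute())
```

With `lines=False`, dask parses the file as a single JSON document, and the `encoding='utf-8-sig'` kwarg strips the BOM just as it does in pandas.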