What is the right way to read csv with dictionary in one column in Python?

71 Views Asked by At

I have a CSV file named test.csv with the following data:

,ts,snapshot
0,1686321823605,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"
1,1686321823705,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"

I am attempting to read this file using pandas and get the following result:

import pandas as pd
df = pd.read_csv(test.csv)

Unnamed: 0 ts           snapshot    
0           0.0  1686321823605  {'asks': [['26566.1', '1.451456']], 'bids': [[...
1           1.0  1686321823705  {'asks': [['26566.1', '1.451456']], 'bids': [[...

So the snapshot column contains a dictionary in each row, but I would like to split it into four columns titled asks0, asks1, bids0, and bids1, each of which contains the respective values from the rows. So the result I am looking for should be something like this:

ts,asks0, asks1, bids0, bids1
1686321823605 , 26566.1 , 1.451456 , 26566 , 4.17579913
1686321823705 , 26566.1 , 1.451456 , 26566 , 4.17579913

So far, I could not find a way of obtaining this result.

I attempted to use pd.json_normalize(), as from other discussions here it looked as if that should do the job.

df1 = pd.json_normalize(df.snapshot)
df1

but the output has n rows x zero columns.

What am I doing wrong here?

1

There are 1 best solutions below

0
justhecuke On

json_normalize() is likely failing to do what you want because it doesn't know how to handle df['snapshot'] which is a malformed JSON string. ' is not a legal JSON string delimiter. Also, I'm not sure if the function in general knows to parse a string as JSON. It may require that you already have a JSON-like object to work with.

Your data is also oddly formatted in snapshot where asks and bids are nested lists for no discernable reason. If you don't need the nesting I'd suggest removing it just to get rid of some confusion.

What you can try to do is make the new columns explicitly.

# Your string is malformed JSON. `'` is not a legal string delimiter.
df['snapshot'] = df['snapshot'].str.replace("'", '"')
# We need to translate your raw string into actual objects.
df['snapshot'] = df['snapshot'].apply(json.loads)

# From here you can do `new_df = pd.json_normalize(df['snapshot'])`,
# but I don't know how to use that to do what you want.

# Explicitly create the extra columns you want
df['asks0'] = df['snapshot'].str['asks'].str[0][0]
df['asks1'] = df['snapshot'].str['asks'].str[0][1]
df['bids0'] = df['snapshot'].str['bids'].str[0][0]
df['bids1'] = df['snapshot'].str['bids'].str[0][1]

# Get rid of the old data
df.drop('snapshot', axis=1, inplace=True)