I have a CSV file named test.csv with the following data:
,ts,snapshot
0,1686321823605,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"
1,1686321823705,"{'asks': [['26566.1', '1.451456']], 'bids': [['26566', '4.17579913']]}"
I am attempting to read this file using pandas and get the following result:
import pandas as pd
df = pd.read_csv(test.csv)
Unnamed: 0 ts snapshot
0 0.0 1686321823605 {'asks': [['26566.1', '1.451456']], 'bids': [[...
1 1.0 1686321823705 {'asks': [['26566.1', '1.451456']], 'bids': [[...
So the snapshot column contains a dictionary in each row, but I would like to split it into four columns titled asks0, asks1, bids0, and bids1, each of which contains the respective values from the rows. So the result I am looking for should be something like this:
ts,asks0, asks1, bids0, bids1
1686321823605 , 26566.1 , 1.451456 , 26566 , 4.17579913
1686321823705 , 26566.1 , 1.451456 , 26566 , 4.17579913
So far, I could not find a way of obtaining this result.
I attempted to use pd.json_normalize(), as from other discussions here it looked as if that should do the job.
df1 = pd.json_normalize(df.snapshot)
df1
but the output has n rows x zero columns.
What am I doing wrong here?
json_normalize()is likely failing to do what you want because it doesn't know how to handledf['snapshot']which is a malformed JSON string.'is not a legal JSON string delimiter. Also, I'm not sure if the function in general knows to parse a string as JSON. It may require that you already have a JSON-like object to work with.Your data is also oddly formatted in snapshot where
asksandbidsare nested lists for no discernable reason. If you don't need the nesting I'd suggest removing it just to get rid of some confusion.What you can try to do is make the new columns explicitly.