I'm trying to convert some Strava activity data into a df to do some analysis.
The typical top of json looks like this in all files:
{
"frame_type": "data_message",
"name": "record",
"header": {
"local_mesg_num": 3,
"time_offset": null,
"is_developer_data": false
},
"fields": [
{
"name": "xx",
"value": "xx",
"units": "xx",
"def_num": xx,
"raw_value": xx
}
]
}
But there is a load of information above it which I am not sure how to filter out.
I'm trying to do
df = pandas.json_normalize(data)
but it returns this:
Essentially I want the table output to be:
timestamp | position_lat | position_long | distance | time_from_course | etc |
---|---|---|---|---|---|
first | row | data | more | columns too |
etc.
I am relatively new to this all so apologies for the noob question...
I don't think you can accomplish what you need by directly calling a pandas function. You may have to reformat your data beforehand. The JSON module can help you manipulate your original data, treated as native Python lists and dicts, to define and populate a second "cleaned up" dictionary that you can then pass in
pandas.DataFrame
.Starting from the string representation of the JSON file
s
, you can convert it into a list of dictionaries.Your
data3
variable seems to be in this format already. It is a list of dictionaries.If I understand correctly, you want your dataframe columns to correspond to the 'name' fields, and the rows to be populated with the 'value' fields.
data3
contains many dictionaries, so each will correspond to a row. To develop working code for you, I created a silly example that I hope emulates what you have well.Instantiate an empty dict and start filling in the things you want. This dict will be passed to
pd.DataFrame()
.First, use the 1st dictionary to define the columns of the dataframe.
Then populate the rows
Finally, obtain the dataframe