How do I convert the json into a dataframe and ignore the top info?

965 Views Asked by At

I'm trying to convert some Strava activity data into a df to do some analysis.

The typical top of json looks like this in all files:

{
    "frame_type": "data_message",
    "name": "record",
    "header": {
        "local_mesg_num": 3,
        "time_offset": null,
        "is_developer_data": false
    },
    "fields": [
        {
            "name": "xx",
            "value": "xx",
            "units": "xx",
            "def_num": xx,
            "raw_value": xx
        }
    ]
}

But there is a load of information above it which I am not sure how to filter out.

enter image description here

I'm trying to do df = pandas.json_normalize(data) but it returns this:

enter image description here

Essentially I want the table output to be:

timestamp position_lat position_long distance time_from_course etc
first row data more columns too

etc.

I am relatively new to this all so apologies for the noob question...

1

There are 1 best solutions below

1
On

I don't think you can accomplish what you need by directly calling a pandas function. You may have to reformat your data beforehand. The JSON module can help you manipulate your original data, treated as native Python lists and dicts, to define and populate a second "cleaned up" dictionary that you can then pass in pandas.DataFrame.

Starting from the string representation of the JSON file s, you can convert it into a list of dictionaries.

contents = json.loads(s)

Your data3 variable seems to be in this format already. It is a list of dictionaries.

If I understand correctly, you want your dataframe columns to correspond to the 'name' fields, and the rows to be populated with the 'value' fields. data3 contains many dictionaries, so each will correspond to a row. To develop working code for you, I created a silly example that I hope emulates what you have well.

s = """
[
    {
    "frame_type": "data_message",
    "name": "record",
    "header": {
        "local_mesg_num": 3,
        "time_offset": null,
        "is_developer_data": false
    },
    "fields": [
        {
            "name": "x1",
            "value": "xx",
            "units": "xx",
            "def_num": 1,
            "raw_value": 1
        }, 
        {
            "name": "x2",
            "value": "xx",
            "units": "xx",
            "def_num": 2,
            "raw_value": 2
        },
        {
            "name": "x3",
            "value": "xx",
            "units": "xx",
            "def_num": 3,
            "raw_value": 3
        }
    ]
},
   {
    "frame_type": "data_message",
    "name": "record",
    "header": {
        "local_mesg_num": 3,
        "time_offset": null,
        "is_developer_data": false
    },
    "fields": [
        {
            "name": "x1",
            "value": "xx",
            "units": "xx",
            "def_num": 1,
            "raw_value": 1
        }, 
        {
            "name": "x2",
            "value": "xx",
            "units": "xx",
            "def_num": 2,
            "raw_value": 2
        },
        {
            "name": "x3",
            "value": "xx",
            "units": "xx",
            "def_num": 3,
            "raw_value": 3
        }
    ]
}
,   {
    "frame_type": "data_message",
    "name": "record",
    "header": {
        "local_mesg_num": 3,
        "time_offset": null,
        "is_developer_data": false
    },
    "fields": [
        {
            "name": "x1",
            "value": "xx",
            "units": "xx",
            "def_num": 1,
            "raw_value": 1
        }, 
        {
            "name": "x2",
            "value": "xx",
            "units": "xx",
            "def_num": 2,
            "raw_value": 2
        },
        {
            "name": "x3",
            "value": "xx",
            "units": "xx",
            "def_num": 3,
            "raw_value": 3
        }
    ]
}]
 """
data3 = json.loads(s)  # emulating your data3 variable

Instantiate an empty dict and start filling in the things you want. This dict will be passed to pd.DataFrame().

reformatted_data = {}  # instantiate

First, use the 1st dictionary to define the columns of the dataframe.

for thing in data3[0]:
    for field in thing["fields"]:
        reformatted_data[field["name"]] = []  # instantiate columns

Then populate the rows

for thing in data3:
    for field in thing["fields"]:
        reformatted_data[field["name"]].append(field["value"])

Finally, obtain the dataframe

df = pd.DataFrame(reformatted_data)
    x1  x2  x3
0   xx  xx  xx
1   xx  xx  xx
2   xx  xx  xx