Pandas json_normalize nullable parent keys

71 Views Asked by At

I am ingesting a JSON API. A simplified version as follows:

json_data = [
    {"id": 1, "owner": None}, 
    {"id": 2, "owner": {"email": "[email protected]"}}
]

df = pd.json_normalize(json_data, sep="_")
print(df)

   id  owner  owner_email
0   1    NaN          NaN
1   2    NaN  [email protected]

In API response json_data, the field "owner" can be an object or null. When I create a pandas dataframe with json_normalize, it creates 2 columns for owner where it should have been just one owner_email column.

The simpliest solutions I have is:

  1. To replace null owner records with dummy records before I create dataframe with json_normalize.

    "owner": null --> "owner": { "email": "" }

  2. Simply drop the useless "owner" column from the dataframe afterwards.

I was wondering if there is a more elegant solution, e.g. telling Pandas in json_normalize parameters that it shouldn't create another column in this kind of situation. As far as I read the Pandas docs, I couldn't find such option.

Any suggesions would be much appreciated.

Environment

  • Debian 11
  • Python 3.9.2
  • Pandas 1.5.3
0

There are 0 best solutions below