I am ingesting a JSON API. A simplified version as follows:
json_data = [
{"id": 1, "owner": None},
{"id": 2, "owner": {"email": "[email protected]"}}
]
df = pd.json_normalize(json_data, sep="_")
print(df)
id owner owner_email
0 1 NaN NaN
1 2 NaN [email protected]
In API response json_data, the field "owner" can be an object or null. When I create a pandas dataframe with json_normalize, it creates 2 columns for owner where it should have been just one owner_email column.
The simpliest solutions I have is:
To replace null owner records with dummy records before I create dataframe with json_normalize.
"owner": null --> "owner": { "email": "" }
Simply drop the useless "owner" column from the dataframe afterwards.
I was wondering if there is a more elegant solution, e.g. telling Pandas in json_normalize parameters that it shouldn't create another column in this kind of situation. As far as I read the Pandas docs, I couldn't find such option.
Any suggesions would be much appreciated.
Environment
- Debian 11
- Python 3.9.2
- Pandas 1.5.3