I have a DF with a bunch of fields (calls) that contains lists of JSON information.
NOTE: That for some of these pd.json_normalize() works, but for others I have to parse out the data manually (don't know why json_normalize can't work with these. Perhaps because they are lists of JSON, versus just JSON.)
CODE
# This works on the first field, which is just JSON, not a list of JSONs
expanded_df = pd.concat([method_df,
pd.json_normalize(method_df.reportableinformation_json)],
axis=1).drop('reportableinformation_json', axis = 1)
# But the other fields, I do this...
def parseSections(row, prefix):
columns = row.index
_df = pd.DataFrame(row).transpose()
for section in row[prefix]:
_df = pd.concat([_df, pd.json_normalize(section)])
for col in columns:
_df[col].fillna(method='pad', inplace=True)
_df = _df.rename(columns={ 'ID': f'{prefix}.ID',
'Name': f'{prefix},.Name',
'Tables': f'{prefix}.Tables',
'Sections': f'{prefix}.Sections',
'Parameters': f'{prefix}.Parameters',
})
return _df
# This calls parseSections with just one row.
# Ultimately, I'd like to call this with either _df.apply() or using `for idx, row in _df.iterrow()`
parseSections(_df.iloc[0], prefix = "MethodDescription.Sections")
Original
| id | resultset_id | resultsetrevision_id | revision | name | methodtype | identifier | created | createdby | lastmodified | lastmodifiedby | audittrail_json | approvalstate | flavor | instrumenttechnique | filename | path | lastsaved | Id | Ver | MethodDescription.ID | MethodDescription.Name | MethodDescription.Tables | MethodDescription.Sections | MethodDescription.Parameters | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 44547fcf-a26e-49a1-a | 62abc4ca-bf13-4f27-b | 30cc3932-ef93-4aec-b | 2022-0722-1502-14947 | NAME hydrogen quant | pmx Method | 00000000-0000-0000-0 | 2019-03-27 22:01:27. | SYSTEM | 2022-10-12 04:17:32. | Username | [] | Generic | GC/LC Quantitative | None | NAME hydrogen quant | /project/path/ | 2022-10-14 05:25:42. | a8581436-4e34-4a17-b | 0 | None | Method | [] | [{'ID': 'Method_Info | [] |
AFTER
| id | resultset_id | resultsetrevision_id | revision | name | methodtype | identifier | created | createdby | lastmodified | lastmodifiedby | audittrail_json | approvalstate | flavor | instrumenttechnique | filename | path | lastsaved | Id | Ver | MethodDescription.ID | MethodDescription.Name | MethodDescription.Tables | MethodDescription.Sections | MethodDescription.Parameters | MethodDescription.Sections.ID | MethodDescription.Sections,.Name | MethodDescription.Sections.Tables | MethodDescription.Sections.Sections | MethodDescription.Sections.Parameters | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 44547fcf-a26e-49a1-a | 62abc4ca-bf13-4f27-b | 30cc3932-ef93-4aec-b | 2022-0722-1502-14947 | NAME hydrogen quant | pmx Method | 00000000-0000-0000-0 | 2019-03-27 22:01:27. | SYSTEM | 2022-10-12 04:17:32. | Username | [] | Generic | GC/LC Quantitative | None | NAME hydrogen quant | /project/path/ | 2022-10-14 05:25:42. | a8581436-4e34-4a17-b | 0 | None | Method | [] | [{'ID': 'Method_Info | [] | nan | nan | nan | nan | nan |
| 0 | 44547fcf-a26e-49a1-a | 62abc4ca-bf13-4f27-b | 30cc3932-ef93-4aec-b | 2022-0722-1502-14947 | NAME hydrogen quant | pmx Method | 00000000-0000-0000-0 | 2019-03-27 22:01:27. | SYSTEM | 2022-10-12 04:17:32. | Username | [] | Generic | GC/LC Quantitative | None | NAME hydrogen quant | /project/path/ | 2022-10-14 05:25:42. | a8581436-4e34-4a17-b | 0 | None | Method | [] | [{'ID': 'Method_Info | [] | Method_Information | Method Information | [] | [] | [{'ID': 'ID_Method_O |
| 0 | 44547fcf-a26e-49a1-a | 62abc4ca-bf13-4f27-b | 30cc3932-ef93-4aec-b | 2022-0722-1502-14947 | NAME hydrogen quant | pmx Method | 00000000-0000-0000-0 | 2019-03-27 22:01:27. | SYSTEM | 2022-10-12 04:17:32. | Username | [] | Generic | GC/LC Quantitative | None | NAME hydrogen quant | /project/path/ | 2022-10-14 05:25:42. | a8581436-4e34-4a17-b | 0 | None | Method | [] | [{'ID': 'Method_Info | [] | DefaultChain | Method Parameters | [] | [{'ID': 'SignalAlign | [] |
| 0 | 44547fcf-a26e-49a1-a | 62abc4ca-bf13-4f27-b | 30cc3932-ef93-4aec-b | 2022-0722-1502-14947 | NAME hydrogen quant | pmx Method | 00000000-0000-0000-0 | 2019-03-27 22:01:27. | SYSTEM | 2022-10-12 04:17:32. | Username | [] | Generic | GC/LC Quantitative | None | NAME hydrogen quant | /project/path/ | 2022-10-14 05:25:42. | a8581436-4e34-4a17-b | 0 | None | Method | [] | [{'ID': 'Method_Info | [] | Method_CalibrationCu | Calibration curves | [{'ID': 'Calibration | [] | [] |
| 0 | 44547fcf-a26e-49a1-a | 62abc4ca-bf13-4f27-b | 30cc3932-ef93-4aec-b | 2022-0722-1502-14947 | NAME hydrogen quant | pmx Method | 00000000-0000-0000-0 | 2019-03-27 22:01:27. | SYSTEM | 2022-10-12 04:17:32. | Username | [] | Generic | GC/LC Quantitative | None | NAME hydrogen quant | /project/path/ | 2022-10-14 05:25:42. | a8581436-4e34-4a17-b | 0 | None | Method | [] | [{'ID': 'Method_Info | [] | Method_ReferenceChro | Reference chromatogr | [{'ID': 'ReferenceCh | [] | [] |
With the following toy dataframe:
Here is another way to do it:
Then: