Given the sample CSV data below, in a pandas DataFrame, how can I output to_json as follows
as_of
category
type
subtype
log: [
#sorted by timestamp
{timestamp: 1618879229, action: add, stale_timestamp: true},
{timestamp: 1608879229, action: remove, stale_timestamp: None},
]
20210415
apples
A
big
log: [
{timestamp: 1618879229, action: add, stale_timestamp: None},
]
small
log: [
{timestamp: 1618879229, action: add, stale_timestamp: None},
{timestamp: 1608879229, action: remove, stale_timestamp: None},
{timestamp: 1518879229, action: add, stale_timestamp: None},
]
B
big
log: [
{timestamp: 1618879229, action: add, stale_timestamp: None},
]
Bonus if you can also help me go back to a DataFrame from the nested json!
as_of | category | type | sub_type | action | timestamp | stale_timestamp |
---|---|---|---|---|---|---|
20210415 | apples | A | big | add | 1618879229.6703315 | |
20210415 | apples | A | small | add | 1618879229.6703315 | |
20210415 | apples | B | big | add | 1618879229.6703315 | |
20210415 | apples | B | small | add | 1618879229.6703315 | |
20210415 | apples | C | big | add | 1618879229.6703315 | |
20210415 | apples | C | small | add | 1618879229.6703315 | |
202103 | oranges | sweet | add | 1616892142.6703315 | ||
202103 | oranges | sweet | remove | 1616632942.6703315 | ||
202103 | oranges | sweet | add | 1616200942.6703315 | ||
202103 | grapes | sweet | add | 1616200942.6703315 | ||
202102 | oranges | sweet | add | 1616200942.6703315 | ||
202102 | grapes | sweet | add | 1616200942.6703315 | ||
20210115 | apples | A | big | add | 1611103342.6703315 | |
20210115 | apples | A | small | add | 1611103342.6703315 | |
20210115 | apples | B | big | add | 1611103342.6703315 | |
20210115 | apples | B | small | add | 1611103342.6703315 | |
20210115 | apples | C | big | add | 1611103342.6703315 | |
20210115 | apples | C | small | add | 1611103342.6703315 | |
202101 | oranges | sweet | add | 1608424942.6703315 | ||
202101 | grapes | sweet | add | 1608424942.6703315 | ||
202012 | oranges | sweet | add | 1608424942.6703315 | ||
202012 | grapes | sweet | add | 1608424942.6703315 | ||
202011 | oranges | sweet | add | 1608424942.6703315 | ||
202011 | grapes | sweet | add | 1608424942.6703315 | ||
20201015 | apples | A | big | add | 1608424942.6703315 | True |
20201015 | apples | A | small | add | 1608424942.6703315 | True |
20201015 | apples | B | big | add | 1608424942.6703315 | True |
20201015 | apples | B | small | add | 1608424942.6703315 | True |
20201015 | apples | C | big | add | 1608424942.6703315 | True |
20201015 | apples | C | small | add | 1608424942.6703315 | True |
202010 | oranges | sweet | add | 1608424942.6703315 | True | |
202010 | grapes | sweet | add | 1608424942.6703315 | True |
First I converted the table into a CSV:
The missing entries cause problems later in the JSON. That either needs to be fixed in the input file or in the Python converter. Also, some of the dates seem to be missing characters.
Because there isn't a pre-defined
orient
option in Pandas that suits this requirement, I wrote a custom dictionary and then converted the dictionary to JSON.The sample output provided by the asker isn't consistent with what actually gets produced by the Python implementation.