Convert Pandas DF to Nested JSON

199 Views Asked by Sledge At 01 July 2025 at 19:19

I have a dataframe in pandas of the following form:

# df name: cust_sim_data_product_agg:
yearmo  region  products    revenue
0   201711  CN  ['Auto', 'Flood', 'Home', 'Liability', 'Life',...   690
1   201711  CN  ['Auto', 'Flood', 'Home', 'Liability', 'Life']  610
2   201711  CN  ['Auto', 'Flood', 'Home', 'Liability']  560
3   201711  CN  ['Auto', 'Flood', 'Home', 'Life', 'Liability',...   690
4   201711  CN  ['Auto', 'Flood', 'Home', 'Life', 'Mortgage', ...   690

I would like to roll it up into a nested json of the form:

{
  yearmo: '201711'
  data: [
    {
      name: 'SE',
      value: 18090, # sum of all the values in the level below
      children: [
        {
          name: '['Auto', 'Flood', 'Home',...], # this is product from the dataframe
          value: 690 . # this is the revenue value
        },
        {
          name: '['Flood', 'Home', 'Life'...],
          value: 690
        },
        ...
      },
      {
      name: 'NE',
      value: 16500, # sum of all the values in the level below
      children: [
        {
          name: '['Auto', 'Home',...],
          value: 210
        },
        {
          name: '['Life'...],
          value: 450
        },
        ...
      }
    },
  yearmo: '201712'
  data: [
    {
      name: 'SE',
      value: 24050,
      children: [ ... ] # same format as above
    },
    {
      name: 'NE',
      value: 22400,
      children: [ ... ] # same format as above
    }
  ]
}

So each yearmo would have an element in the top level of the json. Within data, there would be an entry for each region where value is the sum of the values from the level directly underneath it. Children is an array of dicts where each dict a map of product -> name and revenue -> value from the row level data in the pandas DF.

My best attempt so far looks like this:

def roll_yearmo_rev(d):
    x1 = [{'name': n, 'value': v}  for n,v in zip(d.products, d.revenue)]
    x2 = {'children': x1, 'value': sum(d.revenue)}
    return x2

def roll_yearmo(d):
    x1 = [{'name': n, 'children': c} for n,c in zip(d.region, d.children)]
    x2 = {'children': x1, 'value': sum(d.value)}
    return x2

cust_sim_data_product_agg_dict = cust_sim_data_product_agg.groupby(['yearmo', 'region'])\
    .apply(roll_yearmo_rev)
cust_sim_data_product_agg_dict = cust_sim_data_product_agg_dict.reset_index()
cust_sim_data_product_agg_dict.columns = ['yearmo' , 'region', 'children']


cust_sim_data_product_agg_dict = cust_sim_data_product_agg_dict.groupby(['yearmo'])\
    .apply(roll_yearmo)
cust_sim_data_product_agg_dict = cust_sim_data_product_agg_dict.reset_index()

Which fails because the last rollup throws the following error:

AttributeError: 'DataFrame' object has no attribute 'value'

The whole thing looks messy to me. I read up on split-apply-combine which inspired the use of groupby() and apply() but I could really use a second opinion on the approach because I am pretty sure there is a better way. Any advice would be appreciated.

Original Q&A

Convert Pandas DF to Nested JSON

There are 0 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in SPLIT-APPLY-COMBINE

Trending Questions

Popular # Hahtags

Popular Questions