Merge similar columns or not to create the multiple columns with similar column names while flattening JSON

51 Views Asked by At

When flatten JSON, i am getting columns with values as:

{'0_extension_0_url': 'http://hl7.org/fhir/StructureDefinition/geolocation', '0_extension_0_extension_0_url': 'latitude', '0_extension_0_extension_0_valueDecimal': 42.06768934464684, '0_extension_0_extension_1_url': 'longitude', '0_extension_0_extension_1_valueDecimal': -71.17560251863814, '0_line_0': 'ratna', '0_city': 'Sharon', '0_state': 'MA', '0_postalCode': '02067', '0_country': 'US'}

desired output in CSV format:

extension_url, extension_extension_url, extension_extension_valueDecimal_latitude, extension_extension_valueDecimal_longitude, line, city, state, postalCode, country
http://hl7.org/fhir/StructureDefinition/geolocation,42.06768934464684, -71.17560251863814, ratna, Sharon, MA, 02067, US

Below JSON data is a part of JSON data:

      "address": [ {
        "extension": [ {
          "url": "http://hl7.org/fhir/StructureDefinition/geolocation",
          "extension": [ {
            "url": "latitude",
            "valueDecimal": 42.06768934464684
          }, {
            "url": "longitude",
            "valueDecimal": -71.17560251863814
          } ]
        } ],
        "line": [ "350 Frami Trafficway" ],
        "city": "Sharon",
        "state": "MA",
        "postalCode": "02067",
        "country": "US"
      } ],

python code:

def flatten_json(y):
    out = {}
    
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            
            out[name[:-1]] = x

    flatten(y)
    return out

I am expecting to create only single field name, without JSON location keys and group the values (if more than one) under the given field name, to create tabular form of data.

1

There are 1 best solutions below

0
leaf_yakitori On
  • you can remove the index code, add break the list logic code to it, depends on what you real need.
  • base on the desired output example code.
def flatten_json(y):
    out = {}
    
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            if len(x)>1:
                for a in x:
                    out[name + list(a.values())[0]] = list(a.values())[1]
            else:
                for a in x:
                    flatten(a, name)
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

import json
import csv

f = open('test.json')
data = json.load(f)
data = flatten_json(data)
print(data)
f.close()
with open('mycsvfile.csv', 'w') as f:
    w = csv.DictWriter(f, data.keys())
    w.writeheader()
    w.writerow(data)
  • result:
{'address_extension_url': 'http://hl7.org/fhir/StructureDefinition/geolocation', 'address_extension_extension_latitude': 42.06768934464684, 'address_extension_extension_longitude': -71.17560251863814, 'address_line': '350 Frami Trafficway', 'address_city': 'Sharon', 'address_state': 'MA', 'address_postalCode': '02067', 'address_country': 'US'}
  • csv result:
address_extension_url,address_extension_extension_latitude,address_extension_extension_longitude,address_line,address_city,address_state,address_postalCode,address_country
http://hl7.org/fhir/StructureDefinition/geolocation,42.06768934464684,-71.17560251863814,350 Frami Trafficway,Sharon,MA,02067,US