Python - convert from JSON to CSV

56 Views Asked by At

I am trying to achieve a conversion of some JSON data to CSV for a later analysis.

Issue is due to the n-nested levels, I think, and mainly for the last stage of the conversion process.

So far, I have been able to create a proper data rows mapping, but it fails once I try to include the respective data into each single rows for x'000s of times.

I downloaded the data from a website and use it from an offline file in my desktop, but the JSON data comes from: https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/HSA06/JSON-stat/1.0/en

It comes from Irish Data Department: https://data.gov.ie/dataset/hsa06-average-price-of-houses

Last code I have is:

import json
import csv
 
local_file_path = "/Users/piuno/OneDrive - Asociación Club Taller Cultural/full_data_housing.json"

# Load JSON data from file
with open(local_file_path, 'r') as json_file:
    data = json.load(json_file)
 
# Writing to csv
with open('output.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    # Write header
    header = ['Statistic', 'Quarter', 'Area', 'Value']
    writer.writerow(header)
    # Write data rows
    statistic_mapping = data['dataset']['dimension']['STATISTIC']['category']['label']
    quarter_mapping = data['dataset']['dimension']['TLIST(Q1)']['category']['label']
    area_mapping = data['dataset']['dimension']['C02343V02817']['category']['label']
    values = data['dataset']['value']

    print("Statistical Mapping:", statistic_mapping)
    print("Quarter Mapping:", quarter_mapping)
    print("Area Mapping:", area_mapping)

    for i in range(len(values)):
        if values[i] is not None:
            statistic_index = i // (len(data['dataset']['dimension']['TLIST(Q1)']['category']['index']) * len(data['dataset']['dimension']['C02343V02817']['category']['index']))
            quarter_index = (i % (len(data['dataset']['dimension']['TLIST(Q1)']['category']['index']) * len(data['dataset']['dimension']['C02343V02817']['category']['index']))) // len(data['dataset']['dimension']['C02343V02817']['category']['index'])
            area_index = i % len(data['dataset']['dimension']['C02343V02817']['category']['index'])
 
            row = [statistic_mapping[str(statistic_index)], quarter_mapping[str(quarter_index)], area_mapping[str(area_index)], values[i]]
            writer.writerow(row)

With that, it returns properly the prints as it shows in the image below:

Current result

The expected result would be something like:

Expected result

I don't know what's failing or missing in the loop to look up for the specific value and then return the specific data as shown in the second image.

1

There are 1 best solutions below

0
Ivannpy On

You are trying to unfold the JSON file. Try transforming the json to a tree and then transversing the tree in the correct way to obtain each row.