Convert Json with sub fields to CSV in python

311 Views Asked by At

I have a file with sample JSON output looks as below: jsonoutput.txt file:

[{"fruit": "orange", "id":1, "countries": ["Portugal"], "color": "Orange"}

{"fruit": "apple", "id":2, "countries": ["Portugal"], "color": "red"}]

I need the output csv as (excel file):

fruit id countries color
orange 1 Portugal Orange
apple 2  Spain     red

Now, I am getting as fruit id countries color orange 1 [u'Portugal'] Orange apple 2 [u'Spain'] red

How can I remove [] , u and '' from the column countries?

print (json.dumps(fruits)) --gives me in json output

This is what I have tried to convert json to xlsx:

data= tablib.Dataset(headers=('Fruit','id','Countries','Color'))
importfile = 'jsonoutput.txt'
data.json = open(importfile. 'r').read()
data_export = data.export('xlsx')
with open('output.xlsx','wb') as f:
    f.write(data_export)
    f.close()
1

There are 1 best solutions below

1
On BEST ANSWER

you could use pandas.io.json.json_normalize

import pandas as pd
from pandas.io.json import json_normalize

d = [
    {"fruit": "orange", "id":1, "countries": ["Portugal"], "color": "Orange"},
    {"fruit": "apple", "id":2, "countries": ["Portugal"], "color": "red"}
]

df = pd.concat([json_normalize(d[i]) for i in range(len(d))], ignore_index=True)
df['countries'] = df['countries'].str.join(' ')

    fruit   id  countries   color
0   orange  1   Portugal    Orange
1   apple   2   Portugal    red

to save it as .xlsx file, use:

df.to_excel('filename.xlsx', index=False)

EDIT:

json_normalize is a function that normalizes semi-structured JSON data into a flat table.

I actually realised now that my code could be simplified to:

df = json_normalize(d) # no need for `pd.concat`

### Output:
#   fruit   id  countries   color
# 0 orange  1   ['Portugal']    Orange
# 1 apple   2   ['Portugal']    red

To remove [] from countries column, I used pandas.Series.str.join, which is pandas' equivalent to Python's str.join.

It's needed since initially countries column is a list containing elements

df['countries'] = df['countries'].str.join(' ')

countries column is no longer a list once you join items:

    fruit   id  countries   color
0   orange  1   Portugal    Orange
1   apple   2   Portugal    red