Converting CSV to nested JSON using Python


I want to convert a CSV file to a JSON file. I have a large amount of data in the CSV file.

CSV Column Structure

This is the column structure of my CSV file. It has 200+ records.

id.oid  libId  personalinfo.Name personalinfo.Roll_NO personalinfo.addr personalinfo.marks.maths personalinfo.marks.physic clginfo.clgName clginfo.clgAddr clginfo.haveCert clginfo.certNo clginfo.certificates.cert_name_1 clginfo.certificates.cert_no_1 clginfo.certificates.cert_exp_1 clginfo.certificates.cert_name_2 clginfo.certificates.cert_no_2 clginfo.certificates.cert_exp_2 clginfo.isDept clginfo.NoofDept clginfo.DeptDetails.DeptName_1 clginfo.DeptDetails.location_1 clginfo.DeptDetails.establish_date_1  _v  updatedAt.date

Expected JSON

[{
"id":
{
    "$oid": "00001"
},
"libId":11111,
"personalinfo":
{
    "Name":"xyz",
    "Roll_NO":101,
    "addr":"aa bb cc ddd",
    "marks":
    [
        "maths":80,
        "physic":90
        .....
    
    ]
},
"clginfo":
{
    "clgName":"pqr",
    "clgAddr":"qwerty",
    "haveCert":true,  //this is boolean true or false
    "certNo":1,      //this could be 1-10
    "certificates":
    [
        {
            "cert_name_1":"xxx",
            "cert_no_1":12345,
            "cert_exp_1":"20/2/20202"
        
        },
        {
            "cert_name_2":"xxx",
            "cert_no_2":12345,
            "cert_exp_2":"20/2/20202"
        
        },
        ......//could be up to 10
    ],
    "isDept":true,  //this is boolean true or false
    "NoofDept":1 ,  //this could be 1-10
    "DeptDetails":
    [
        {
            "DeptName_1":"yyy",
            "location_1":"zzz",
            "establish_date_1":"1/1/1919"
        },
        ......//up to 10 records
    ]
},
"__v": 1,
"updatedAt": 
{
    "$date": "2022-02-02T13:35:59.843Z"
}
}]

I have tried using pandas, but I am getting flat output like this:

My output

[{
"id.$oid": "00001",
"libId":11111,
"personalinfo.Name":"xyz",
"personalinfo.Roll_NO":101,
"personalinfo.addr":"aa bb cc ddd",
"personalinfo.marks.maths":80,
"personalinfo.marks.physic":90,
"clginfo.clgName":"pqr",
"clginfo.clgAddr":"qwerty",
"clginfo.haveCert":true,
"clginfo.certNo":1,
"clginfo.certificates.cert_name_1":"xxx",
"clginfo.certificates.cert_no_1":12345,
"clginfo.certificates.cert_exp_1":"20/2/20202",
"clginfo.certificates.cert_name_2":"xxx",
"clginfo.certificates.cert_no_2":12345,
"clginfo.certificates.cert_exp_2":"20/2/20202",
"clginfo.isDept":true,
"clginfo.NoofDept":1,
"clginfo.DeptDetails.DeptName_1":"yyy",
"clginfo.DeptDetails.location_1":"zzz",
"clginfo.DeptDetails.establish_date_1":"1/1/1919",
"__v": 1,
"updatedAt.$date": "2022-02-02T13:35:59.843Z"
}]

I am new to Python and only know the basics. Please help me get this output.

Yaroslav Fyodorov On
  1. 200+ records is really tiny, so even a naive solution is good enough.
  2. It can't be totally generic, because the headers alone don't show that certificates is a list, unless we rely on all names under certificates ending in _N.
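To make the _N convention concrete, here is a minimal sketch of how a trailing numeric suffix could be detected. The helper name `split_index` is hypothetical (it is not from the question), and it assumes that any key ending in an underscore followed by digits belongs to a list item.

```python
import re

# Assumption: a key such as "cert_name_1" marks item 1 of a list.
_SUFFIX = re.compile(r"^(.*)_(\d+)$")


def split_index(key):
    """Return (base_name, index) if key ends with _N, else (key, None)."""
    match = _SUFFIX.match(key)
    if match is None:
        return key, None
    return match.group(1), int(match.group(2))
```

Keys like `Roll_NO` or `clgName` have no numeric suffix, so they would be treated as plain dictionary keys under this rule.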

Proposed solution using only basic Python:

  1. Read the header row and split every column name on the period.
  2. Iterate over the resulting lists of keys and build a template of nested dicts with the appropriate keys and dummy values. If you want to handle lists, create a list when the current key ends with _N and use N as the index.
  3. For each row, clone the template dictionary with its dummy values. For each column, use the split keys from step 1 to put the value into the corresponding dict, handling lists the same way as in step 2. Then append the dictionary to the list of rows.
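The approach described above could be sketched roughly as follows, using only the standard library. This is one possible reading, not a definitive implementation: instead of cloning a template of dummy values, it builds each nested dict directly per row. The names `nest_row` and `csv_to_nested` and the sample data are made up for illustration. It assumes the CSV is comma-separated, that keys ending in _N (with N starting at 1) are list items, and it leaves every value as a string, so converting numbers and booleans, or renaming `oid` to `$oid`, would be an extra step.

```python
import csv
import io
import json
import re

INDEXED = re.compile(r"_(\d+)$")  # assumption: keys ending in _N are list items


def nest_row(row):
    """Turn one flat row (dotted column name -> value) into nested dicts."""
    result = {}
    for column, value in row.items():
        *parents, leaf = column.split(".")
        node = result
        match = INDEXED.search(leaf)
        if match and parents:
            # The last parent (e.g. "certificates") becomes a list of dicts.
            for key in parents[:-1]:
                node = node.setdefault(key, {})
            items = node.setdefault(parents[-1], [])
            index = int(match.group(1)) - 1  # _1 -> position 0 (N assumed >= 1)
            while len(items) <= index:
                items.append({})
            items[index][leaf] = value
        else:
            # Plain nested key: walk or create the intermediate dicts.
            for key in parents:
                node = node.setdefault(key, {})
            node[leaf] = value
    return result


def csv_to_nested(text):
    """Parse CSV text and return a list of nested dicts, one per row."""
    return [nest_row(row) for row in csv.DictReader(io.StringIO(text))]


# Made-up sample using a few of the column names from the question.
sample = (
    "id.oid,libId,clginfo.clgName,clginfo.certificates.cert_name_1,"
    "clginfo.certificates.cert_no_1,clginfo.certificates.cert_name_2\n"
    "00001,11111,pqr,xxx,12345,yyy\n"
)
records = csv_to_nested(sample)
print(json.dumps(records, indent=2))
```

For a real file, the text could be read with `open(path, newline="")` and passed to `csv.DictReader` directly, and the result written with `json.dump`.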