.csv to .arff function in Python

I'm trying to write a conversion function from CSV to ARFF. Right now I have this:

def csv2arff(csv_path, arff_path=None):
    with open(csv_path, 'r') as fr:
        attributes = []
        
        if arff_path is None:
            arff_path = csv_path[:-4] + '_prueba.arff'  # *.csv -> *_prueba.arff
            
        write_sw = False
        with open(arff_path, 'w') as fw:
            fw.write('@relation base_datos_modelo_3_limpia \n')
            firstline = fr.readlines()[0].rstrip()
            fw.write(firstline)

and that gives me:

@relation base_datos_modelo_3_limpia

DVJ_Valgus_KneeMedialDisplacement_D_discr,BMI,AgeGroup,ROM-PADF-KE_D,DVJ_Valgus_FPPA_D_discr,TrainFrequency,DVJ_Valgus_FPPA_ND_discr,Asym_SLCMJLanding-pVGRF(10percent)_discr,Asym-ROM-PHIR(≥8)_discr,Asym_TJ_Valgus_FPPA(10percent)_discr,TJ_Valgus_FPPA_ND_discr,Asym-ROM-PHF-KE(≥8)_discr,TJ_Valgus_FPPA_D_discr,Asym_SLCMJ-Height(10percent)_discr,Asym_YBTpl(10percent)_discr,Position,Asym-ROM-PADF-KE(≥8º)_discr,DVJ_Valgus_KneeMedialDisplacement_ND_discr,DVJ_Valgus_Knee-to-ankle-ratio_discr,Asym-ROM-PKF(≥8)_discr,Asym-ROM-PHABD(≥8)_discr,Asym-ROM-PHF-KF(≥8)_discr,Asym-ROM-PHER(≥8)_discr,AsymYBTanterior10percentdiscr,Asym-ROM-PHABD-HF(≥8)_discr,Asym-ROM-PHE(≥8)_discr,Asym(>4cm)-DVJ_Valgus_Knee;edialDisplacement_discr,Asym_SLCMJTakeOff-pVGRF(10percent)_discr,Asym-ROM-PHADD(≥8)_discr,Asym-YBTcomposite(10percent)_discr,Asym_SingleHop(10percent)_discr,Asym_YBTpm(10percent)_discr,Asym_DVJ_Valgus_FPPA(10percent)_discr,Asym_SLCMJ-pLFT(10percent)_discr,DominantLeg,Asym-ROM-PADF-KF(≥8)_discr,ROM-PHER_ND,CPRDmentalskills,POMStension,STAI-R,ROM-PHER_D,ROM-PHIR_D,ROM-PADF-KF_ND,ROM-PADF-KF_D,Age_at_PHV,ROM-PHIR_ND,CPRDtcohesion,Eperience,ROM-PHABD-HF_D,MaturityOffset,Weight,ROM-PHADD_ND,Height,ROM-PHADD_D,Age,POMSdepressio,ROM-PADF-KE_ND,POMSanger,YBTanterior_Dnorm,YBTanterior_NDnorm,POMSvigour,Soft-Tissue_injury_≥4days

So I want to put "@attribute" before each attribute name and replace each "," with "\n", but I don't know how to do it. I tried writing a function to replace the "," but it didn't work. Any ideas?

Thank you guys.

Answer by fracpete:

Try the liac-arff library.

Here is an example for converting the UCI iris dataset from ARFF to CSV and then back to ARFF:

import csv
import arff

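# arff -> csv
with open('./iris.arff', 'r') as fp:
    content = arff.load(fp)
# newline='' stops the csv module from writing blank lines on Windows
with open('./out.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    # each attribute is a (name, type) tuple; only the name goes into the header
    header = [name for name, _ in content['attributes']]
    writer.writerow(header)
    writer.writerows(content['data'])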

# csv -> arff
with open('./out.csv', 'r', newline='') as fp:
    reader = csv.reader(fp)
    header = next(reader)  # first row holds the attribute names
    data = list(reader)    # remaining rows are the instances

content = {}
content['relation'] = "from my csv file"
content['attributes'] = []
for n in header:
    if n == "class":
        content['attributes'].append((n, ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']))
    else:
        content['attributes'].append((n, 'NUMERIC'))
content['data'] = data
with open('./out.arff', 'w') as fp:
    arff.dump(content, fp)

NB: For the last stage, we need to specify the nominal class values, which you could determine by scanning the data.
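The scanning step mentioned in the NB could look roughly like the sketch below. It uses only the standard library and returns (name, type) pairs in the form liac-arff expects for content['attributes']. The helper names infer_attributes and _is_number are made up for this illustration, and the rules (a column is NUMERIC if every non-missing cell parses as a float, nominal otherwise, STRING if the column is entirely empty) are assumptions you may need to adjust for your data.

```python
def _is_number(text):
    """Return True if the cell text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_attributes(header, rows):
    """Build (name, type) pairs by scanning every column of the CSV rows.

    Hypothetical helper: assumes all rows have as many cells as the header.
    """
    attributes = []
    for index, name in enumerate(header):
        # Skip empty cells and '?', which ARFF uses for missing values.
        values = [row[index] for row in rows if row[index] not in ('', '?')]
        if not values:
            # Nothing to infer from, so fall back to a free-text attribute.
            attributes.append((name, 'STRING'))
        elif all(_is_number(v) for v in values):
            attributes.append((name, 'NUMERIC'))
        else:
            # Nominal attribute: list the distinct values in sorted order.
            attributes.append((name, sorted(set(values))))
    return attributes


header = ['sepallength', 'class']
rows = [
    ['5.1', 'Iris-setosa'],
    ['7.0', 'Iris-versicolor'],
    ['6.3', 'Iris-virginica'],
]
attributes = infer_attributes(header, rows)
```

With that in place, content['attributes'] = infer_attributes(header, data) would replace the hard-coded "class" check in the loop above. Note that a column of discrete numeric codes (for example 0/1 flags) will come out as NUMERIC under these rules, so you may want to force such columns to nominal by hand.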