How to write a .csv that includes a column containing text (and special characters) with pandas


I am trying to clean a very large .csv file with pandas. The file has a column that contains free text, including characters like ',' or '/'. Because of this, I specify escapechar='\\' when I read the file, and it is read successfully with the correct shape. After cleaning, I write the file to another path. However, the cleaned file has a different shape than the original, which makes no sense to me. I assume the text column is the cause. I also tried specifying escapechar='\\' when writing, but the shape is still wrong and the columns get mixed up. What should I pass to the to_csv method so that the file is written in its original format? My code is below:

```python
import pandas as pd

# local_file_path, report, median_values and max_tresholds are assumed
# to be defined earlier in the script.
reader = pd.read_csv(local_file_path, nrows=None, chunksize=10000,
                     escapechar='\\')
output_path = f'/home/achilleslaststand/Desktop/clean_data/{report}_cleaned.csv'

for j, chunk in enumerate(reader):
    # Iterate over columns
    for column in chunk.columns:
        # Check if the column is in keys of the dictionaries
        if column in median_values and column in max_tresholds:
            # Check if values exceed max threshold
            mask = chunk[column] > max_tresholds[column]
            # Replace values exceeding max threshold with median value
            chunk.loc[mask, column] = median_values[column]

    # Write the header only with the first chunk, then append the rest
    chunk.to_csv(output_path, mode='a', header=(j == 0), index=False)
```
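For reference, here is a minimal sketch of a chunked round trip that keeps the backslash-escaped style when writing. It is not a confirmed fix. It assumes the original file escapes delimiters with a backslash and uses no quoting, and the sample data and in-memory buffers are hypothetical stand-ins for the real files. The only pandas arguments involved are `quoting` and `escapechar` on `to_csv`, and `escapechar` on `read_csv`.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data: a text column containing commas and slashes.
original = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "text": ["plain", "a, b / c", "x,y,z"],
        "value": [10.0, 20.0, 30.0],
    }
)

# Stand-in for the original file: delimiters inside fields are
# backslash-escaped and no quoting is used (an assumption about the format).
raw = io.StringIO()
original.to_csv(raw, index=False, quoting=csv.QUOTE_NONE, escapechar="\\")
raw.seek(0)

# Read in chunks with the same escapechar, then write each chunk back
# with the same quoting/escaping settings so the format is preserved.
cleaned = io.StringIO()
reader = pd.read_csv(raw, chunksize=2, escapechar="\\")
for i, chunk in enumerate(reader):
    chunk.to_csv(
        cleaned,
        header=(i == 0),  # header only for the first chunk
        index=False,
        quoting=csv.QUOTE_NONE,
        escapechar="\\",
    )

# Re-read the cleaned output with the same escapechar and compare shapes.
cleaned.seek(0)
round_trip = pd.read_csv(cleaned, escapechar="\\")
print(original.shape, round_trip.shape)
```

If the shapes match here but not on the real data, the difference may come from something the sample does not cover, such as quote characters, literal backslashes, or line breaks inside the text column. It may also come from reading the cleaned file back with different settings than it was written with.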
