I am trying to clean a very large .csv file with pandas. One column contains free text that includes characters such as `,` and `/`, so when I read the file I pass `escapechar='\\'`. The file is read successfully and has the correct shape.

After cleaning, I write the result to another path. However, the cleaned file has a different shape than the original, and I can't explain why. I assume the text column is responsible. I also tried passing `escapechar='\\'` when writing, but the shape is still wrong and the columns get mixed up. What should I pass to `DataFrame.to_csv` so that the file is written in the same format as the original? My code is below:
```python
import pandas as pd

# local_file_path, report, median_values and max_tresholds are defined elsewhere in my script
reader = pd.read_csv(local_file_path, nrows=None, chunksize=10000,
                     escapechar='\\')
output_path = f'/home/achilleslaststand/Desktop/clean_data/{report}_cleaned.csv'
j = 0
for chunk in reader:
    # Iterate over columns
    for column in chunk.columns:
        # Check if the column is in the keys of both dictionaries
        if column in median_values and column in max_tresholds:
            # Check if values exceed the max threshold
            mask = chunk[column] > max_tresholds[column]
            # Replace values exceeding the max threshold with the median value
            chunk.loc[mask, column] = median_values[column]
    # Write the header only with the first chunk
    if j == 0:
        chunk.to_csv(output_path, mode='a', header=True, index=False)
    else:
        chunk.to_csv(output_path, mode='a', header=False, index=False)
    j += 1
```
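For reference, here is a minimal, self-contained sketch of the read/write round trip on a hypothetical two-row sample (the `RAW` data and column names are made up for illustration, not taken from the real file). It compares the default `to_csv` output, which wraps any field containing the delimiter in double quotes, with `quoting=csv.QUOTE_NONE, escapechar='\\'`, which instead escapes the commas with a backslash the way the input does. A `/` needs no escaping at all, since it has no special meaning in CSV. In this sample, both outputs read back to the same frame when they are read with the same `escapechar='\\'`:

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the free-text column escapes embedded commas with a
# backslash instead of wrapping the field in double quotes.
RAW = "id,text,value\n1,hello\\, world,5\n2,a/b,100\n"

df = pd.read_csv(io.StringIO(RAW), escapechar="\\")

# Default writer (QUOTE_MINIMAL): fields containing ',' are wrapped in quotes.
quoted = df.to_csv(index=False)

# Writer that mirrors the input dialect: never quote, escape ',' with '\'.
escaped = df.to_csv(index=False, quoting=csv.QUOTE_NONE, escapechar="\\")

print(quoted)
print(escaped)

# Both outputs parse back to the same frame when read with escapechar='\\'.
for text in (quoted, escaped):
    back = pd.read_csv(io.StringIO(text), escapechar="\\")
    pd.testing.assert_frame_equal(back, df)
```

The escaped variant is the one that reproduces the original file's convention; the quoted variant is what `to_csv` produces by default and is also readable by standard CSV tools.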
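A self-contained sketch of the same chunked cleaning loop, under stated assumptions: the sample data, the threshold dictionaries (stand-ins for `median_values` and `max_tresholds`), the helper name `clean_csv` and the temporary paths are all hypothetical. It differs from the code above in two places. First, the first chunk is written with `mode='w'` instead of `mode='a'`, because appending to an output file left over from an earlier run silently adds extra rows; this is one possible cause of the shape mismatch worth ruling out, not a confirmed diagnosis. Second, the writer uses `quoting=csv.QUOTE_NONE, escapechar='\\'` so the output keeps the input's backslash escaping:

```python
import csv
import os
import tempfile

import pandas as pd

# Hypothetical sample data: commas inside the text column are backslash-escaped.
RAW = (
    "id,text,value\n"
    "1,hello\\, world,5\n"
    "2,a/b,100\n"
    "3,x\\, y\\, z,7\n"
)

# Hypothetical stand-ins for median_values / max_tresholds.
median_values = {"value": 6}
max_thresholds = {"value": 50}

# Write options that mirror the input: no quoting, backslash-escaped delimiters.
WRITE_KW = dict(index=False, quoting=csv.QUOTE_NONE, escapechar="\\")


def clean_csv(src, dst, chunksize=2):
    """Replace values above a threshold with the median, chunk by chunk."""
    with pd.read_csv(src, chunksize=chunksize, escapechar="\\") as reader:
        for i, chunk in enumerate(reader):
            for column in chunk.columns:
                if column in median_values and column in max_thresholds:
                    mask = chunk[column] > max_thresholds[column]
                    chunk.loc[mask, column] = median_values[column]
            # Overwrite on the first chunk (so stale rows from a previous run
            # are dropped), then append without repeating the header.
            chunk.to_csv(dst, mode="w" if i == 0 else "a",
                         header=(i == 0), **WRITE_KW)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "report.csv")
    dst = os.path.join(tmp, "report_cleaned.csv")
    with open(src, "w", newline="") as f:
        f.write(RAW)
    clean_csv(src, dst)
    clean_csv(src, dst)  # a second run overwrites dst instead of growing it
    original = pd.read_csv(src, escapechar="\\")
    cleaned = pd.read_csv(dst, escapechar="\\")
    print(original.shape, cleaned.shape)
```

With `mode='a'` on the first chunk, the second call would have left six data rows (plus a second header line) in `dst`, which is the kind of shape mismatch described in the question.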