decimal parameter using pandas.read_csv

65 Views Asked by At

I have a CSV with 19 columns. Many of the values are decimals formatted like '56,2' [german/eureopean format]. No big issue, I thought using the 'decimal=' parameter:

self.imported_csv = pd.read_csv(full_path, encoding='utf_16_le', sep=';', on_bad_lines='warn', decimal=',')

Works fine - except for exactly ONE column. So every other column is read correctly (floats), but the ONE columns keeps the ',' as decimal seperator. Any idea why this is possible?

Not knowing why, I tried following workaround:

self.imported_csv['Luftdruck (hPa)'] = self.imported_csv['Luftdruck (hPa)'].astype(float)

... unfortunately I already expected the result: could not convert string to float: '979,5'.

Any idea what problem I am facing here? Thanks in advance.

Here is one example row of the CSV:

one example row

2

There are 2 best solutions below

2
TheHungryCub On BEST ANSWER

Before reading the CSV, you might consider cleaning the problematic column to remove any non-numeric characters or inconsistencies.

self.imported_csv['Luftdruck (hPa)'] = self.imported_csv['Luftdruck (hPa)'].str.replace(',', '.').astype(float)

Try explicitly specifying the locale using the locale parameter in the read_csv function:

self.imported_csv = pd.read_csv(full_path, encoding='utf_16_le', sep=';', on_bad_lines='warn', decimal=',', thousands='.', locale='de_DE')

Also please ensure below points :

  1. Look for any non-numeric characters in the column. There might be invisible characters or non-breaking spaces that are causing the issue.
  2. Look for any non-numeric characters in the column. There might be invisible characters or non-breaking spaces that are causing the issue.
0
Dominic Reeves On

Quite possible there's some string values in that column, I'd check for things like "NAN" or something that wouldn't otherwise be parsed.

I would try the following and see if that works:

imported_csv["Luftdruck (hPa)"] = imported_csv["Luftdruck (hPa)"].str.replace(",",".").astype(float)

Otherwise check the column for non-parseable values like actual strings.