I have a .txt
file (23820 rows × 300 columns). It is '\t'
seperated and the decimal is ','
When reading it in with csv_read
, nearly every column in my file should be a float
but it totally messes it up. I don't get float
data (which has a dot as decimal) but string like '25,73234'
This leads to my problem when trying to convert it. See the error message:
ValueError Traceback (most recent call last)
<ipython-input-3-87b4fe6976d1> in <module>
----> 1 DF['T Ans v F_deg_C'] = [float(x) for x in DF['T Ans v F_deg_C']]
<ipython-input-3-87b4fe6976d1> in <listcomp>(.0)
----> 1 DF['T Ans v F_deg_C'] = [float(x) for x in DF['T Ans v F_deg_C']]
ValueError: could not convert string to float: '25,79243'
Here is a screenshot of my DF after import:
DF Example:
I just could skip the first 2 rows and the data is fine BUT I want to put 1rd and 2nd row elementwise together as header of my DF. I know I could seperately read the headers in and then the data. I also could change the data after importing with replacing ','
to '.'
before change string to float in each column but still I want to know what is going on there.
Now here is the question I can't answer myself. I figured out it is the 2nd line who causes my problem. When searching for answers I often read about misinterpreted '\n'
or Regex so I printed the 2nd line including every piece of format with:
with open(data_path) as f:
my_list = list(f)
my_list[1]
'yymmdd\thh:mm:ss\trpm\tNm\tkW\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\t°C\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tbar\tbar\tmbar\tmbar\tmbar\tkg/h\t%\tkg\tg/h\tkg/h\tl/min\tppm\tppm\tppm\tppm\tppm\tppm\t%\tppm\t%\tppm\tppm\tppm\t%\tppm\t%\tppm\tppm\tppm\t%\tppm\t%\tNm\tkW\tkW\tppm\tppm\tppm\tg/h\tg/h\tg/h\tg/kWh\tg/kWh\tg/kWh\t%\t%\t%\tg/h\tg/h\tg/h\tg/kWh\tg/kWh\tg/kWh\tg/h\tg/h\tg/h\tg/kWh\tg/kWh\tg/kWh\tg/h\tg/h\tg/h\tg/kWh\tg/kWh\tg/kWh\tg/kWh\tg/kWh\tg/kWh\tkg/h\tg/kg\t°C\t-\t-\t-\t-\t-\tg/kWh\tg/kWh\tbar\t-\t-\t-\t-\tkPa\tmbar\tmbar\tmg/l\tmg/l\t-\tkWh\t°C\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\tmbar\t?\t?\tcbm\t1\t\t?\t?\t\t\t\t\t\t?\t?\t?\t?\t?\t?\t?\t?\t?\t?\t?\t\t?\t?\t?\t?\t1\t1\t1\t1\t1\tppm\t\t1\tmg/Asp\t1\t\tbar\tbar\tbar\tbar\t1\t1\t1\t1\t1\t1\t1\t1\t1\t1\t1\t%\t°C\tbar\t°C\t1\t°C\t°C\t°C\t\tmbar\tmbar\tppm\t°C\tbar\t°C\tbar\tbar\tppm\t°C\t\t°C\t1/min\t1\t1\t1\t1\t1\t1\t1\t1\t1\t1\t1\t1\tkg/s\t1\t1\t1\t1\t1\t1\t1\t1\t1\t1\t1,0\t1,0\t1,0\t1,0\t1,0\t1,0\t1,0\tkW\t1,0\t1,0\t1,0\tppm\t1,0\t1,0\tg/s\t1,0\t1,0\t1,0\tppm\tppm\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\t0\n'
This 2nd row represents the units of the headers. Can somebody explain to me why this line causes my problem?
How about adding
header=[0,1]
to the function call? This specifies the first two lines in the file as the header.In your case:
pd.read_csv(data_path, delimiter='\t', decimal=',', header=[0,1])