How do I convert one column from an imported CSV from string to float using NumPy?

I have two CSV files which I have imported into Python using NumPy.
The data has 2 columns:

[['month' 'total_rainfall']        
 ['1982-01' '107.1']    
 ['1982-02' '27.8']    
 ['1982-03' '160.8']    
 ['1982-04' '157']    
 ['1982-05' '102.2']   

I need to create a 2D array and calculate statistics on the 'total_rainfall' column (mean, standard deviation, min and max).

So I have this:

import numpy as np    
datafile=np.genfromtxt("C:\rainfall-monthly-total.csv",delimiter=",",dtype=None,encoding=None)    
print(datafile)    
rainfall=np.asarray(datafile).astype(np.float32)    
print (np.mean(datafile,axis=1)) 

ValueError: could not convert string to float: '2019-04'

There are 3 best solutions below

0
On

Your error message reads could not convert string to float, but actually your problem is a bit different.

Your array contains string columns, which should be converted:

  • month - to Period (month),
  • total_rainfall - to float.

Unfortunately, NumPy was designed to process arrays in which all cells have the same type, so pandas, where each column can have its own type, is a much more convenient tool here.

First, convert your NumPy array (I assume it is called arr) to a pandas DataFrame:

import pandas as pd

df = pd.DataFrame(arr[1:], columns=arr[0])

I took column names from the initial row and data from following rows. Print df to see the result.

So far both columns are still of object type (actually string), so the only thing to do is to convert both columns, each to its desired type:

df.month = pd.PeriodIndex(df.month, freq='M')
df.total_rainfall = df.total_rainfall.astype(float)

Now, when you run df.info(), you will see that both columns are of proper types.
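
As a minimal end-to-end sketch of the steps above, the snippet below also computes the four statistics asked about in the question. The array arr is a stand-in built from the sample rows in the question (an assumption; in practice it would come from np.genfromtxt), and the name stats is purely illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for the string array returned by np.genfromtxt
# (sample rows copied from the question).
arr = np.array([
    ['month', 'total_rainfall'],
    ['1982-01', '107.1'],
    ['1982-02', '27.8'],
    ['1982-03', '160.8'],
    ['1982-04', '157'],
    ['1982-05', '102.2'],
])

# First row holds the column names, the rest is data.
df = pd.DataFrame(arr[1:], columns=arr[0])

# Convert each column to its own type.
df['month'] = pd.PeriodIndex(df['month'], freq='M')
df['total_rainfall'] = df['total_rainfall'].astype(float)

# Mean, standard deviation, min and max of the numeric column.
# Note: pandas computes the sample standard deviation (ddof=1),
# whereas np.std defaults to the population version (ddof=0).
stats = df['total_rainfall'].agg(['mean', 'std', 'min', 'max'])

print(df.dtypes)
print(stats)
```

If the population standard deviation is what you need, df['total_rainfall'].std(ddof=0) should match np.std.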

Use pandas to process your data as well; it is a more convenient tool for this.

E.g. to get quarterly sums, you can run:

df.set_index('month').resample('Q').sum()

getting (for your data sample):

        total_rainfall
month                 
1982Q1           295.7
1982Q2           259.2
0
On

Converting a str to a float works like this:

>>> a = "545.2222"
>>> float(a)
545.2222
>>> int(float(a))
545

But the error message says the problem is converting '2019-04' to float.

Converting '2019-04' to float doesn't work because a floating-point number can't have a - in the middle. That is why you got the error.
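
Since only the month strings fail to convert, one way around the error is to convert just the rainfall column. The sketch below assumes the imported array looks like the sample in the question; the names data, rainfall, csv_text and from_file are illustrative, and the in-memory CSV text stands in for the real file.

```python
import io
import numpy as np

# Stand-in for the string array produced by np.genfromtxt(..., dtype=None).
data = np.array([
    ['month', 'total_rainfall'],
    ['1982-01', '107.1'],
    ['1982-02', '27.8'],
    ['1982-03', '160.8'],
    ['1982-04', '157'],
    ['1982-05', '102.2'],
])

# Skip the header row, take only the second column, then convert it.
rainfall = data[1:, 1].astype(float)

print(rainfall.mean(), rainfall.std(), rainfall.min(), rainfall.max())

# Alternative: let genfromtxt skip the header and read only column 1,
# so the month strings are never parsed as numbers.
csv_text = "month,total_rainfall\n1982-01,107.1\n1982-02,27.8\n1982-03,160.8\n1982-04,157\n1982-05,102.2\n"
from_file = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                          skip_header=1, usecols=(1,))
```

Both approaches leave the month column as text, which is fine as long as the statistics only involve the rainfall values.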

0
On

You can convert the rainfall values to float or int, but the date can't be converted directly. To convert a date to an int, you have to split the string, rebuild it as a date, and then convert that to milliseconds:

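from datetime import datetime

month1 = '1982-01'
# datetime() requires integers, so convert the split strings first
year, month = (int(part) for part in month1.split('-'))
date = datetime(year, month, 1)  # first day of the month, in local time
milliseconds = int(round(date.timestamp() * 1000))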

This assumes the date is the first day of the month.
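
For a whole column of month strings, a similar result can be sketched with NumPy's datetime64 type instead of a Python loop. This is an alternative technique, not the approach above: datetime64 values carry no time zone, so the integers count milliseconds from 1970-01-01T00:00 rather than from local midnight. The names months and ms are illustrative.

```python
import numpy as np

# 'YYYY-MM' strings parse directly into month-precision datetimes.
months = np.array(['1982-01', '1982-02'], dtype='datetime64[M]')

# Widen to millisecond precision (first instant of each month),
# then view the values as integers: milliseconds since the epoch.
ms = months.astype('datetime64[ms]').astype(np.int64)

print(ms)
```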