I have two CSV files which I have imported into Python using NumPy.
The data has 2 columns:
[['month' 'total_rainfall']
['1982-01' '107.1']
['1982-02' '27.8']
['1982-03' '160.8']
['1982-04' '157']
['1982-05' '102.2']
I need to create a 2D array and calculate statistics (mean, standard deviation, min and max) for the 'total_rainfall' column.
So far I have this:
import numpy as np
datafile=np.genfromtxt(r"C:\rainfall-monthly-total.csv",delimiter=",",dtype=None,encoding=None)
print(datafile)
rainfall=np.asarray(datafile).astype(np.float32)
print (np.mean(datafile,axis=1))
Running it raises:
ValueError: could not convert string to float: '2019-04'
Your error message reads "could not convert string to float", but your actual problem is a bit different.
Your array contains only string columns (including the header row), and each column should be converted to its own type: month to a date type and total_rainfall to float.
Unfortunately, NumPy is designed for arrays in which all cells share the same type, so pandas, where each column can have its own type, is a much more convenient tool.
First, convert your NumPy array (I assume it is named arr) to a pandas DataFrame, taking the column names from the initial row and the data from the following rows. Print df to see the result.
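A minimal sketch of that conversion, assuming arr holds the sample rows shown in the question (here it is built by hand as a stand-in for the result of np.genfromtxt):

```python
import numpy as np
import pandas as pd

# Stand-in for the array returned by np.genfromtxt (sample rows from the question).
arr = np.array([
    ['month', 'total_rainfall'],
    ['1982-01', '107.1'],
    ['1982-02', '27.8'],
    ['1982-03', '160.8'],
    ['1982-04', '157'],
    ['1982-05', '102.2'],
])

# The first row supplies the column names, the remaining rows the data.
df = pd.DataFrame(arr[1:], columns=arr[0])
print(df)
```

At this point every cell is still a string, which is why the numeric functions cannot be applied yet.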
So far both columns are still of object type (actually string), so the only thing left to do is to convert each column to its desired type.
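One possible way to do the conversion is sketched below. The choice of a monthly Period for the month column is my assumption (a plain datetime would also work); total_rainfall simply becomes float. The small DataFrame is rebuilt here from the sample rows so the snippet runs on its own.

```python
import pandas as pd

# Sample rows from the question, still stored as strings.
df = pd.DataFrame({
    'month': ['1982-01', '1982-02', '1982-03', '1982-04', '1982-05'],
    'total_rainfall': ['107.1', '27.8', '160.8', '157', '102.2'],
})

# Parse 'YYYY-MM' strings, then keep only the month resolution (assumed target type).
df['month'] = pd.to_datetime(df['month'], format='%Y-%m').dt.to_period('M')

# Convert the rainfall strings to floating-point numbers.
df['total_rainfall'] = df['total_rainfall'].astype(float)

df.info()
```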
Now, when you run df.info(), you will see that both columns have the proper types.
To process your data, use pandas as well. It is a more convenient tool.
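For the statistics asked about in the question (mean, standard deviation, min and max), a short sketch using only the sample values could look like this. Note that pandas computes the sample standard deviation (ddof=1) by default, whereas np.std defaults to ddof=0, so the two can differ.

```python
import pandas as pd

# Sample rainfall values from the question, already converted to float.
df = pd.DataFrame({'total_rainfall': [107.1, 27.8, 160.8, 157.0, 102.2]})

# Compute all four statistics in one call; the result is a Series indexed by name.
stats = df['total_rainfall'].agg(['mean', 'std', 'min', 'max'])
print(stats)
```

Pass ddof=0 to df['total_rainfall'].std() if you want the population standard deviation instead.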
For example, you can get quarterly sums of total_rainfall by grouping the rows by the quarter that each month falls in.
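A sketch of one way to compute quarterly sums, assuming the month column already holds monthly periods as above. Grouping by year and quarter is my choice here; other approaches (such as resampling) would work too.

```python
import pandas as pd

# Sample data from the question with the columns already in their target types.
df = pd.DataFrame({
    'month': pd.period_range('1982-01', periods=5, freq='M'),
    'total_rainfall': [107.1, 27.8, 160.8, 157.0, 102.2],
})

# Derive the grouping keys from the period column.
year = df['month'].dt.year.rename('year')
quarter = df['month'].dt.quarter.rename('quarter')

# Sum the rainfall within each (year, quarter) pair.
quarterly = df.groupby([year, quarter])['total_rainfall'].sum()
print(quarterly)
```

For the five sample rows this gives roughly 295.7 for the first quarter of 1982 and 259.2 for the second, which covers only April and May.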