Performing pearsonr() correlation from scipy ignoring NaN

I want to correlate df1 and df2, which share the same column names, Length date(i). Both DataFrames contain NaN cells. For each column, I would like to skip every row where either DataFrame has a NaN and compute the correlation on the remaining rows.

from scipy.stats import pearsonr

measurements = 10
for i in range(1, measurements + 1):
    col = 'Length date(' + str(i) + ')'
    cor_len = pearsonr(df1[col], df2[col])

So far I always get:

ValueError: array must not contain infs or NaNs

Both DataFrames look like:

Length date(1)  Length date(2)  Length date(3)  Length date(4)  Length date(5)  Length date(6)
8.512326        5.758995       39.743087       52.811855       75.285461       65.122357
18.852476       56.067361       93.113099      177.415235      184.472485      161.042771
92.391779       53.909429       76.507877      149.716421      147.524874      110.94987
NaN             21.220149        NaN             NaN             NaN             NaN
NaN             27.559950        NaN             NaN             NaN             NaN

There is 1 solution below

This solution works: put the two columns side by side, drop every row where either value is NaN, and correlate what remains.

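import pandas as pd

for i in range(1, measurements + 1):
    col = 'Length date(' + str(i) + ')'
    # Align the two columns by index; keys= gives them distinct names
    dl = pd.concat([df1[col], df2[col]], axis=1, keys=['df1', 'df2'])
    # Keep only the rows where both values are present
    dl = dl.dropna()
    # corr() returns a 2x2 matrix; the off-diagonal entry is Pearson's r
    cor_len = dl.corr().loc['df1', 'df2']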
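
If you also want the p-value that pearsonr() reports, the same idea can be applied before calling scipy. The following is only a minimal sketch: the helper name pearsonr_skip_nan and the toy DataFrames are made up for illustration, and it assumes df1 and df2 share the same index so that a boolean mask lines up row by row.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def pearsonr_skip_nan(x, y):
    """Pearson r and p-value over the rows where both Series have a value.

    Hypothetical helper; assumes x and y share the same index.
    """
    both_present = x.notna() & y.notna()
    r, p = pearsonr(x[both_present].to_numpy(), y[both_present].to_numpy())
    return float(r), float(p)


# Toy data, invented for illustration only (not the asker's measurements)
df1 = pd.DataFrame({
    "Length date(1)": [1.0, 2.0, 3.0, np.nan, 5.0],
    "Length date(2)": [2.0, 4.0, np.nan, 8.0, 10.0],
})
df2 = pd.DataFrame({
    "Length date(1)": [2.0, 4.0, 6.0, 8.0, np.nan],
    "Length date(2)": [1.0, 3.0, 5.0, np.nan, 9.0],
})

measurements = 2
results = {}
for i in range(1, measurements + 1):
    col = f"Length date({i})"
    # Each entry is an (r, p-value) pair computed on the NaN-free rows
    results[col] = pearsonr_skip_nan(df1[col], df2[col])
```

If only the coefficient is needed, df1[col].corr(df2[col]) should give the same r, since pandas Series.corr excludes missing pairs on its own.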