How can I replace NaN values with row averages when the first column (the first value in each row) is a string?


I want to replace NaNs with the average of their row, but the first value in each row is a string, so pandas can't calculate the row averages (axis=1). How can I replace the NaNs with row averages and keep the first value as the name of that row?

Index    Country   1990  1995  2000  2005  2010  2015
0        US        5     6     NaN   9     19    11
1        Germany   5     NaN   3     7     19    9
.        ...       ..    ..    ..    ..    ..    ..
.        ...       ..    ..    ..    ..    ..    ..

I tried something like df.fillna(df.drop['country'], axis=1).apply(lambda x: x.mean(), axis=1), but the NaNs are not replaced, so fillna is not doing what I expect. I have no problem replacing NaNs with column averages (axis=0).

Grismar On BEST ANSWER

Suppose data.csv looks like this:

x,y,a,b,c,d
0,"A",1,2,3,4
1,"B",5,6,7,
2,"C",9,,11,

It looks like you were after something like this:

from pandas import read_csv

df = read_csv('data.csv')
print(df)

# For each row, fill its NaNs with the mean of everything after the first two columns.
df = df.apply(lambda x: x.fillna(x[2:].mean()), axis=1)
print(df)

Output:

   x  y  a    b   c    d
0  0  A  1  2.0   3  4.0
1  1  B  5  6.0   7  NaN
2  2  C  9  NaN  11  NaN
   x  y  a     b   c     d
0  0  A  1   2.0   3   4.0
1  1  B  5   6.0   7   6.0
2  2  C  9  10.0  11  10.0

It works because of the way the lambda ignores the first two columns:

x.fillna(x[2:].mean())

Because of axis=1, the lambda is applied row by row, so each NaN is filled with the mean of its own row.
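
For the frame in the question, where the label column is called Country, the same idea can be sketched without a row-wise lambda. This is only one possible approach, not part of the answer above: the sample values are copied from the two rows shown in the question, "Nan" is assumed to be a real NaN rather than the string, and the names num_cols and row_means are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; "Nan" is assumed to be a real NaN.
df = pd.DataFrame({
    "Country": ["US", "Germany"],
    "1990": [5, 5],
    "1995": [6, np.nan],
    "2000": [np.nan, 3],
    "2005": [9, 7],
    "2010": [19, 19],
    "2015": [11, 9],
})

# Work only on the numeric columns so the Country label is left alone.
num_cols = df.select_dtypes(include="number").columns
row_means = df[num_cols].mean(axis=1)  # NaNs are skipped by default

# fillna with a Series fills column by column, so transpose first to turn
# each original row into a column, fill it with its own mean, then transpose back.
df[num_cols] = df[num_cols].T.fillna(row_means).T
print(df)
```

Selecting the numeric columns by dtype avoids hard-coding how many leading label columns there are, at the cost of all year columns ending up as floats.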

Note: Pandas stores the columns that contained NaN as floating point, because NaN is itself a float, and keeps the other columns as integers. Of course there are ways to fix that, but I decided to keep the answer simple.
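
One way to get integer columns back, as a rough sketch: the frame below is typed in by hand to mirror the filled output shown earlier (it is not produced by the code above), and the cast is only safe because every filled value here happens to be a whole number. With a row mean such as 8.6, astype would silently truncate it.

```python
import pandas as pd

# Hand-written frame mirroring the filled output above (an assumption for illustration).
filled = pd.DataFrame({
    "x": [0, 1, 2],
    "y": ["A", "B", "C"],
    "a": [1, 5, 9],
    "b": [2.0, 6.0, 10.0],
    "c": [3, 7, 11],
    "d": [4.0, 6.0, 10.0],
})

# Cast only the float columns back to integers; the other columns are untouched.
filled = filled.astype({"b": "int64", "d": "int64"})
print(filled.dtypes)
```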