Python pandas: apply a function's result to multiple DataFrame columns only where values are NaN


I have a DataFrame with three columns and a function that calculates the values of columns y and z from the value of column x. I only need to calculate the values where they are missing (NaN).

import numpy as np
import pandas as pd

def calculate(x):
    return 1, 2

df = pd.DataFrame({'x':['a', 'b', 'c', 'd', 'e', 'f'], 'y':[np.nan, np.nan, np.nan, 'a1', 'b2', 'c3'], 'z':[np.nan, np.nan, np.nan, 'a2', 'b1', 'c4']})

   x    y    z
0  a  NaN  NaN
1  b  NaN  NaN
2  c  NaN  NaN
3  d   a1   a2
4  e   b2   b1
5  f   c3   c4

mask = (df.isnull().any(axis=1))

df[['y', 'z']] = df[mask].apply(calculate, axis=1, result_type='expand')

However, I get the following result, even though I only apply the function to the masked rows. I'm unsure what I'm doing wrong.

    x   y   z
0   a   1.0 2.0
1   b   1.0 2.0
2   c   1.0 2.0
3   d   NaN NaN
4   e   NaN NaN
5   f   NaN NaN

If the mask is inverted I get the following result:

df[['y', 'z']] = df[~mask].apply(calculate, axis=1, result_type='expand')

    x   y   z
0   a   NaN NaN
1   b   NaN NaN
2   c   NaN NaN
3   d   1.0 2.0
4   e   1.0 2.0
5   f   1.0 2.0

Expected result:

   x    y    z
0  a  1.0   2.0
1  b  1.0   2.0
2  c  1.0   2.0
3  d   a1   a2
4  e   b2   b1
5  f   c3   c4

There are 2 best solutions below

Best answer

You can calculate the function for the full DataFrame, rename the resulting columns with set_axis, and then use fillna so that only the missing values are filled:

out = (df.fillna(df.apply(calculate, axis=1, result_type='expand')
                       .set_axis(['y','z'], axis=1)))

print(out)

   x   y   z
0  a   1   2
1  b   1   2
2  c   1   2
3  d  a1  a2
4  e  b2  b1
5  f  c3  c4
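
As a minimal, self-contained sketch of why this works: when fillna is given a DataFrame, it aligns on both index and column labels, so only cells that are NaN in df are taken from the computed frame. The example assumes a recent pandas where set_axis no longer accepts inplace, and it passes dtype=object (an assumption added here, not in the question) so the mixed string/number columns behave the same across pandas versions. Note that calculate is called for every row, including rows that already have values.

```python
import numpy as np
import pandas as pd

def calculate(row):
    # Placeholder from the question: returns the (y, z) pair for a row.
    return 1, 2

# dtype=object is an assumption made for this sketch so that strings and
# numbers can share a column regardless of the installed pandas version.
df = pd.DataFrame({
    'x': ['a', 'b', 'c', 'd', 'e', 'f'],
    'y': [np.nan, np.nan, np.nan, 'a1', 'b2', 'c3'],
    'z': [np.nan, np.nan, np.nan, 'a2', 'b1', 'c4'],
}, dtype=object)

# Compute (y, z) for every row and label the two result columns.
computed = (df.apply(calculate, axis=1, result_type='expand')
              .set_axis(['y', 'z'], axis=1))

# fillna aligns on index and column labels; existing values are kept.
out = df.fillna(computed)
print(out)
```

If calculate is expensive, computing it only for the masked rows may be preferable to filling from a full-frame result.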

Try:

df.loc[mask, ["y", "z"]] = pd.DataFrame(
    df.loc[mask].apply(calculate, axis=1).to_list(),
    index=df[mask].index,
    columns=["y", "z"],
)

print(df)

   x   y   z
0  a   1   2
1  b   1   2
2  c   1   2
3  d  a1  a2
4  e  b2  b1
5  f  c3  c4