How do I keep values based on dataframe values?

29 Views Asked by daily update At 26 September 2022 at 06:47

I have the following dataframe.

ID   path1   path2   path3
1    12      NaN      NaN
1    1       5        NaN
1    2       NaN       ''
1    2       4        111
2    123     NaN      NaN
3    11      25       NaN
3    1       NaN      NaN 
3    21      34       NaN
3    NaN     NaN      NaN

I want to update column values based on ID if there are NaN values. first priority is path1, path2 and path3 have values then keep. check which column has more results keep that column.

What is the best way to get the result of the dataframe below?

ID   path1   path2   path3
1    2       4       111
2    123     NaN     NaN  
3    11      25      NaN
3    21      34      NaN

Original Q&A

There are 1 best solutions below

ThePyGuy On 26 September 2022 at 07:01 BEST ANSWER

Group the dataframe by ID then get all the records which have maximum number of non NaN values in the rows by applying a function to the group.

>>> (df.groupby('ID')
    .apply(lambda x: x.loc[x.notna().sum(axis=1).max() == x.notna().sum(axis=1)])
    .reset_index(drop=True))

   ID  path1  path2 path3
0   1    2.0    4.0   111
1   2  123.0    NaN   NaN
2   3   11.0   25.0   NaN
3   3   21.0   34.0   NaN

Here is slightly modified version as above code creates mask and calls sum twice, it can be avoided by use of normal function than a lambda function:

def get_rows(df):
    counts = df.notna().sum(axis=1)
    return df.loc[counts==counts.max()]


df.groupby('ID').apply(get_rows).reset_index(drop=True)
   ID  path1  path2 path3
0   1    2.0    4.0   111
1   2  123.0    NaN   NaN
2   3   11.0   25.0   NaN
3   3   21.0   34.0   NaN

How do I keep values based on dataframe values?

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in GROUP-BY

Related Questions in PANDAS-APPLY

Trending Questions

Popular # Hahtags

Popular Questions