There are multiple id's. Some of them duplicates. I want to show not nan values on other rows in a single rows based on id's. But there are other things to.. An example;
>>> df
id val_one one val_two two val_three three val_four four
11 0 NaN 1 in_progress 0 NaN 0 NaN
11 1 in_progress 0 NaN 0 NaN 0 NaN
22 0 NaN 0 NaN 0 NaN 1 completed
22 0 NaN 1 in_progress 0 canceled 0 NaN
22 1 completed 0 NaN 0 NaN 0 NaN
33 0 NaN 0 NaN 1 completed 0 NaN
33 0 canceled 0 NaN 0 NaN 0 NaN
This is the last step i can get.
First, for example 'val_one' and 'one' columns are related and other columns are also related to each other. Values 'in_progress' and 'completed' mean '1' for the "val_" columns. 'canceled' and NaN is 0.
Secondly, 'in_progress', 'completed' and 'canceled' is actually is a value as you can see.
If there is a '0' value in rows i want to change it with '1' if there is a '1' in related columns.
If there is 'canceled' in a row and if there is no '1' value in the associated rows, I want to write it instead of the NaN value in the actual output.
If there is 'in_progress' somewhere there won't be another 'completed' or 'canceled' for the same id. This goes for the other 2 values as well.
I need it to be like this;
id val_one one val_two two val_three three val_four four
11 1 in_progress 1 in_progress 0 NaN 0 NaN
22 1 completed 1 in_progress 0 NaN 1 completed
33 0 canceled 0 NaN 1 completed 0 NaN
I did new columns for '0' and '1' values; (I can rename later for correct ones)
df['val_onee']=df.groupby(['id'])['val_one'].transform(max) df['val_twoo']=df.groupby(['id'])['val_two'].transform(max) df['val_threee']=df.groupby(['id'])['val_three'].transform(max) df['val_fourr']=df.groupby(['id'])['val_four'].transform(max)
Then I did something like this but not worked;
df = df[~df.drop('one', axis=1).duplicated(keep=False) | pd.notna(df['one'])] df = df[~df.drop('two', axis=1).duplicated(keep=False) | pd.notna(df['two'])] df = df[~df.drop('three', axis=1).duplicated(keep=False) | pd.notna(df['three'])] df = df[~df.drop('four', axis=1).duplicated(keep=False) | pd.notna(df['four'])]
Thanks in advance for all the help guys
You can use
groupbyand a dict of aggregation functions:Output: