I have the following data:
Name | Age | Country | Occupation | Hobby | |
---|---|---|---|---|---|
0 | A | 23 | DE | Job holder | Fishing |
1 | A | 23 | DE | Job holder | Gardening |
2 | A | 23 | DE | Job holder | Fishing |
3 | A | 23 | DE | Job holder | Reading |
4 | B | 15 | SW | Job holder | Fishing |
5 | B | 15 | SW | Job holder | Playing |
6 | C | 23 | DD | Job holder | Coding |
7 | B | 23 | AA | Job holder | Fishing |
8 | D | 34 | GH | Job holder | Fishing |
9 | D | 33 | TR | Other | Fishing |
When there is a duplicate in the "Name" columns, I would like to compare them and I would like to know the reason for their uniqueness. I want to add one new column which column should contain the responsible column header names for this uniqueness. If a row is found one time then write 'Unique' in the reason column.
The output should look like this:
Name | Age | Country | Occupation | Hobby | Reason | |
---|---|---|---|---|---|---|
0 | A | 23 | DE | Job holder | Fishing | Occupation, Hobby |
1 | A | 23 | DE | Job holder | Gardening | Occupation, Hobby |
2 | A | 23 | DE | Student | Fishing | Occupation, Hobby |
3 | A | 23 | DE | Job holder | Reading | Occupation, Hobby |
4 | B | 15 | SW | Job holder | Fishing | Hobby |
5 | B | 15 | SW | Job holder | Playing | Hobby |
6 | C | 23 | DD | Job holder | Coding | Unique |
7 | B | 23 | AA | Job holder | Fishing | Age, Country |
8 | D | 34 | GH | Job holder | Fishing | Age, Country, Occupation |
9 | D | 33 | TR | Other | Fishing | Age, Occupation, Occupation |
I trying this way:
dif = [i for i, (x,y) in enumerate(zip(df.loc[0].values, df.loc[9,:].values)) if x!=y ] df.iloc[:, dif]
but it's not giving the way output should look like. I am new in python, any help would be appreciated!
You can identify the non-unique reasons, then merge as a string.
There are multiple ways to do this.
Using
groupby.nunique
and dot product:With
groupby.nunique
andmelt
:With
groupby.apply
:Output: