How to filter a pandas dataframe by unique column values


I have a pandas data frame with emails, and I want to keep only the unique emails within each row. I tried the code below, but it does not work: it returns the original data frame unchanged. Here is the original data frame: [image: original data frame] Here is the wanted data frame: [image: wanted data frame]

import pandas as pd

df = pd.DataFrame({'z': [1, 2, 3, 4], 'a': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'], 'b': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'], 'c': ['[email protected]', '[email protected]', '[email protected]', '[email protected]']})
df.to_csv('../output/try.csv', index=False)

df = pd.read_csv('../output/try.csv')
df2 = df.drop_duplicates(subset=['a', 'b', 'c'])
df2.to_csv('../output/try2.csv', index=False)

I've seen solutions that work with numbers in the columns, but my columns hold strings, and for some reason the same approach does not work with email strings. The call df2 = df.drop_duplicates(subset=['a', 'b', 'c']) leaves the data frame unchanged.


There is 1 best solution below

Shubham Sharma On BEST ANSWER

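DataFrame.drop_duplicates compares whole rows with one another, using the values in the subset columns, and drops a row only when another row has the same values. Here you need to check for duplicates within each row instead, so apply Series.drop_duplicates to each row along the column axis (axis=1). Values that repeat an earlier value in the same row are replaced with NaN.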

cols = ['a', 'b', 'c']
df[cols] = df[cols].apply(pd.Series.drop_duplicates, axis=1)

   z                 a                 b                 c
0  1  [email protected]  [email protected]  [email protected]
1  2     [email protected]               NaN               NaN
2  3      [email protected]               NaN               NaN
3  4  [email protected]  [email protected]               NaN
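The difference between the two calls may be easier to see side by side. What follows is a minimal sketch, not the asker's data: the addresses are made up for illustration, and the `reindex` step is an added safeguard that is not part of the answer above. It assumes that assigning a data frame to a list of columns pairs the columns up by position, so the row-wise result is put back into `a`, `b`, `c` order first.

```python
import pandas as pd

# Hypothetical addresses, made up for illustration; not the asker's data.
df = pd.DataFrame({
    'z': [1, 2, 3],
    'a': ['ann@example.com', 'bob@example.com', 'cat@example.com'],
    'b': ['amy@example.com', 'bob@example.com', 'cal@example.com'],
    'c': ['abe@example.com', 'bob@example.com', 'cat@example.com'],
})
cols = ['a', 'b', 'c']

# Row-versus-row comparison: no two rows share the same (a, b, c) values,
# so no row is dropped and the frame comes back unchanged.
unchanged = df.drop_duplicates(subset=cols)

# Within-row comparison: each row is passed to drop_duplicates as a Series,
# which keeps the first occurrence of every value. Missing positions
# become NaN when the rows are combined back into a frame.
deduped = df[cols].apply(pd.Series.drop_duplicates, axis=1)

# Safeguard (an addition, not in the answer): restore every column in
# cols, in order, even if one ended up holding only duplicates.
result = df.copy()
result[cols] = deduped.reindex(columns=cols)

print(unchanged)
print(result)
```

Note that the surviving values stay in their original columns; they are not shifted to the left to fill the gaps.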