How can I merge duplicate DataFrame columns and also keep all original column names?
e.g. If I have the DataFrame
df = pd.DataFrame({"col1" : [0, 0, 1, 2, 5, 3, 7],
"col2" : [0, 1, 2, 3, 3, 3, 4],
"col3" : [0, 1, 2, 3, 3, 3, 4]})
I can remove the duplicate columns (yes the transpose is slow for large DataFrames) with
df.T.drop_duplicates().T
but this only preserves one column name per unique column
col1 col2
0 0 0
1 0 1
2 1 2
3 2 3
4 5 3
5 3 3
6 7 4
How can I keep the information on which columns were merged? e.g. something like
[col1] [col2, col3]
0 0 0
1 0 1
2 1 2
3 2 3
4 5 3
5 3 3
6 7 4
Thanks!
I also used
T
andtuple
togroupby