Pandas merge duplicate DataFrame columns preserving column names

842 Views Asked by Matt At 03 June 2025 at 23:59

How can I merge duplicate DataFrame columns and also keep all original column names?

e.g. If I have the DataFrame

df = pd.DataFrame({"col1" : [0, 0, 1, 2, 5, 3, 7],
                   "col2" : [0, 1, 2, 3, 3, 3, 4],
                   "col3" : [0, 1, 2, 3, 3, 3, 4]})

I can remove the duplicate columns (yes the transpose is slow for large DataFrames) with

df.T.drop_duplicates().T

but this only preserves one column name per unique column

How can I keep the information on which columns were merged? e.g. something like

    [col1] [col2, col3]
 0     0         0
 1     0         1
 2     1         2
 3     2         3
 4     5         3
 5     3         3
 6     7         4

Thanks!

Original Q&A

There are 2 best solutions below

piRSquared On 26 December 2016 at 03:44

I also used T and tuple to groupby

def f(x):
    d = x.iloc[[0]]
    d.index = ['-'.join(x.index.tolist())]
    return d

df.T.groupby(df.apply(tuple), group_keys=False).apply(f).T

Psidom On 25 December 2016 at 20:29

# group columns by their values 
grouped_columns = df.groupby(list(df.values), axis=1).apply(lambda g: g.columns.tolist())  

# pick one column from each group of the columns
unique_df = df.loc[:, grouped_columns.str[0]]

# make a new column name for each group, don't think the list can work as a column name, you need to join them
unique_df.columns = grouped_columns.apply("-".join)

unique_df

Pandas merge duplicate DataFrame columns preserving column names

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in DUPLICATES

Related Questions in DATA-SCIENCE

Trending Questions

Popular # Hahtags

Popular Questions