I want to combine the permuted col1 and col2 values into one row. containing only the first combination & summing the count column of both. Is there an easy way to do this in pandas?

The example data frame and output: For example, in the below data frame I want to combine rows with values A, B, and B, A and sum their count column. Same for rows with values C, D, and D, C and sum their count values as well. I want to keep the rest of the rows in the data frame as is.

INPUT:

col1 col2 count
A B 3
C D 2
B A 5
E F 2
G H 8
D C 5
I J 4

OUTPUT:

col1 col2 count
A B 8
C D 7
E F 2
G H 8
I J 4
2

There are 2 best solutions below

0
On BEST ANSWER

You can .groupby according sorted col1/col2:

x = (
    df.groupby(df[["col1", "col2"]].apply(lambda x: tuple(sorted(x)), 1))
    .agg({"col1": "first", "col2": "first", "count": "sum"})
    .reset_index(drop=True)
)
print(x)

Prints:

  col1 col2  count
0    A    B      8
1    C    D      7
2    E    F      2
3    G    H      8
4    I    J      4
0
On

We can np.sort across rows to ensure the same values appear in the correct columns (for example A B and B A both become A B), then groupby sum on the now ordered columns:

# Sort Across Rows
df[['col1', 'col2']] = np.sort(df[['col1', 'col2']], axis=1)
# Accumulate counts by col1 and col2 (now in same columns)
df = df.groupby(['col1', 'col2'], as_index=False)['count'].sum()

df:

  col1 col2  count
0    A    B      8
1    C    D      7
2    E    F      2
3    G    H      8
4    I    J      4

Setup (DataFrame and imports):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'C', 'B', 'E', 'G', 'D', 'I'],
    'col2': ['B', 'D', 'A', 'F', 'H', 'C', 'J'],
    'count': [3, 2, 5, 2, 8, 5, 4]
})