Pandas group by one column and fill up another column

1.7k Views Asked by gblm At 14 July 2022 at 14:56

I have the following dataframe with two columns:

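import pandas as pd

# Empty cells are written as None; row order matches the table below
data = [['A', '3ykf'], ['A', None], ['A', '3ykf'], ['B', None], ['B', None], ['B', '6jbk'], ['B', None], ['C', None], ['C', None]]
df = pd.DataFrame(data, columns=['column1', 'column2'])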

column1 | column2
   A       "3ykf"
   A          
   A       "3ykf"
   B
   B
   B       "6jbk"
   B
   C
   C

I want to "fill up" the second column like this:

column1 | column2
   A       "3ykf"
   A       "3ykf"
   A       "3ykf"
   B       "6jbk"
   B       "6jbk"
   B       "6jbk"
   B       "6jbk"
   C
   C

Column1 is the column I want to group by, and within each group, column2 contains either a string that doesn't change within one group or it's empty.

I want to fill up the empty cells in column2 by adding the same string to each cell within one group.

The issue is that the string is not necessarily the most common value within one group, as the most common value might be an empty cell (like for group B).

Obviously, for group C, column2 can't be filled up with anything, so it should be kept empty in the resulting dataframe.


There are 2 best solutions below

Stef On 14 July 2022 at 15:01

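You can fill with the first available value. In a groupby, 'first' returns the first non-null value of each group, so the empty cells are skipped: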

df.column2 = df.groupby('column1').column2.transform('first')

Result:

  column1 column2
0       A    3ykf
1       A    3ykf
2       A    3ykf
3       B    6jbk
4       B    6jbk
5       B    6jbk
6       B    6jbk
7       C    None
8       C    None
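
One caveat worth spelling out: 'first' skips missing values (None/NaN), but it does not skip empty strings. If the empty cells in the real data are '' rather than None, they would have to be converted to missing values before grouping. A minimal sketch under that assumption (the sample frame and the names cleaned and filled are made up for illustration):

```python
import pandas as pd

# Hypothetical variant of the question's data where empty cells are ''
df = pd.DataFrame({
    "column1": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "column2": ["3ykf", "", "3ykf", "", "", "6jbk", "", "", ""],
})

# Turn empty strings into missing values so that 'first' can skip them
cleaned = df["column2"].mask(df["column2"].eq(""))

# Broadcast the first non-missing value of each group to every row of the group
filled = cleaned.groupby(df["column1"]).transform("first")

df["column2"] = filled
print(df)
```

Group C has no value to take, so its rows stay missing after the fill.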

ko3 On 14 July 2022 at 15:14

Although @Stef's solution is probably the best, I would like to provide an additional one that uses pd.merge instead of groupby + transform. It is always good to think of several approaches. It sorts df by column2 (which puts the missing values last), keeps one row per column1 value, and left-joins that row back onto the original data frame:

(
  df.assign(column2=pd.merge(
        df, df.sort_values(["column2"]).drop_duplicates(subset=["column1"]),
        on=["column1"],
        how="left")[["column2_x", "column2_y"]]
   .bfill(axis=1).iloc[:,0]
   )
)

Output:

  column1 column2
0       A    3ykf
1       A    3ykf
2       A    3ykf
3       B    6jbk
4       B    6jbk
5       B    6jbk
6       B    6jbk
7       C    None
8       C    None
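
To check that the merge approach and groupby + transform('first') agree on the sample data, the steps can be split up and compared. This is only a sketch: the names lookup, merged, via_merge, via_transform and same are introduced here for illustration, and empty cells are assumed to be None.

```python
import pandas as pd

data = [["A", "3ykf"], ["A", None], ["A", "3ykf"],
        ["B", None], ["B", None], ["B", "6jbk"], ["B", None],
        ["C", None], ["C", None]]
df = pd.DataFrame(data, columns=["column1", "column2"])

# One representative row per group: sorting puts missing values last,
# so drop_duplicates keeps a non-missing column2 wherever one exists
lookup = df.sort_values("column2").drop_duplicates(subset="column1")

# A left join keeps the row order of df; the two column2 columns
# get the default suffixes _x (original) and _y (representative)
merged = df.merge(lookup, on="column1", how="left")
via_merge = merged[["column2_x", "column2_y"]].bfill(axis=1).iloc[:, 0]

# The groupby + transform approach, for comparison
via_transform = df.groupby("column1")["column2"].transform("first")

# Compare after replacing missing values, since missing values never compare equal
same = via_merge.fillna("").tolist() == via_transform.fillna("").tolist()
print(same)
```

Note that the merge result is aligned by position, so this relies on df having the default 0..n-1 index.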
