How do I use .fillna to fill based on a list of mean values that are grouped by another column?


I have a DataFrame named df with null values in the Bandwidth_GB_Year column.

I am trying to fill those null values with the mean of Bandwidth_GB_Year for each group of another column, InternetService.

When I use fillna, it either doesn't fill the null values (when I don't include the inplace argument) or it removes all of the values from the column (when I use inplace=True).

How do I fill each null value with the mean for its matching InternetService type?

In:

mean_values = df.groupby("InternetService")["Bandwidth_GB_Year"].mean().round(3)
print(mean_values)

Out:

InternetService
DSL            3717.539
Fiber Optic    3235.343
None           3224.141
Name: Bandwidth_GB_Year, dtype: float64
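Note that mean_values is indexed by the InternetService labels (DSL, Fiber Optic, ...), not by df's row labels. When fillna receives a Series, it fills each row by looking up that row's index label in the Series, so no row matches and the NaNs stay unfilled. A minimal sketch, assuming small hypothetical sample rows (only the column names come from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "InternetService": ["DSL", "DSL", "Fiber Optic", "Fiber Optic"],
    "Bandwidth_GB_Year": [3000.0, np.nan, 2000.0, np.nan],
})

# One mean per group, indexed by the group label ("DSL", "Fiber Optic")
mean_values = df.groupby("InternetService")["Bandwidth_GB_Year"].mean()

# fillna aligns the Series on df's row labels (0, 1, 2, 3); none of them
# appear in mean_values' index, so nothing gets filled
filled = df["Bandwidth_GB_Year"].fillna(mean_values)
print(filled.tolist())  # [3000.0, nan, 2000.0, nan]
```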

In:

df["Bandwidth_GB_Year"] = df["Bandwidth_GB_Year"].fillna(mean_values, inplace = True)
print(df["Bandwidth_GB_Year"])

Out:

1        None
2        None
3        None
4        None
5        None
         ... 
9996     None
9997     None
9998     None
9999     None
10000    None
Name: Bandwidth_GB_Year, Length: 10000, dtype: object
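The all-None result comes from combining assignment with inplace=True: with inplace=True, fillna modifies the object it is called on and returns None, so assigning that return value to the column overwrites every row with None. A minimal sketch on a standalone Series (the sample values are hypothetical):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# With inplace=True, fillna changes s itself and returns None
returned = s.fillna(0.0, inplace=True)
print(returned)    # None
print(s.tolist())  # [1.0, 0.0, 3.0]

# Assigning that return value to a column sets every row to None,
# matching the dtype object, all-None output in the question
df = pd.DataFrame({"Bandwidth_GB_Year": [1.0, np.nan, 3.0]})
df["Bandwidth_GB_Year"] = returned
print(df["Bandwidth_GB_Year"].dtype)  # object
```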

Answer:


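Use groupby(...).transform('mean') rather than groupby(...).mean(): transform returns one value per original row, with the same index as df, which is what fillna needs to match each NaN to its group's mean. Also assign the result back instead of passing inplace=True, because fillna returns None when inplace=True. The simplified example below shows how to replace NaN values with group mean values: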

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1, 2, 1, 1, 2],
                   'col2': [5, 4, None, 3, None, 7, 9]
                   })

# fill each NaN in col2 with the mean of col2 within the same col1 group
df['col2'] = df['col2'].fillna(df.groupby('col1')['col2'].transform('mean'))

print(df)

gives

   col1  col2
0     1   5.0
1     2   4.0
2     1   6.0
3     2   3.0
4     1   6.0
5     1   7.0
6     2   9.0
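Applied to the question's columns, the same idea looks roughly like the sketch below. The rows are hypothetical; only the column names and the .round(3) come from the question. It also shows a second option that reuses a per-group mean_values Series like the one in the question, by mapping each row's InternetService label onto its group mean.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names come from the question
df = pd.DataFrame({
    "InternetService": ["DSL", "DSL", "Fiber Optic", "Fiber Optic", "DSL"],
    "Bandwidth_GB_Year": [3000.0, np.nan, 2000.0, np.nan, 4000.0],
})

# Option 1: transform("mean") gives each row its group's mean, indexed like df
group_means = df.groupby("InternetService")["Bandwidth_GB_Year"].transform("mean")
filled_transform = df["Bandwidth_GB_Year"].fillna(group_means.round(3))

# Option 2: keep the per-group means (indexed by InternetService) and
# map every row's InternetService label to its group mean
mean_values = df.groupby("InternetService")["Bandwidth_GB_Year"].mean().round(3)
filled_map = df["Bandwidth_GB_Year"].fillna(df["InternetService"].map(mean_values))

# Assign the result back; no inplace=True
df["Bandwidth_GB_Year"] = filled_transform
print(df)
```

Both options produce the same filled column; the map version is convenient when the group means have already been computed and inspected, as in the question.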