How to use fillna in a for loop (Python)

As the title says, I am trying to use fillna in a for loop, but it keeps coming up with the "A value is trying to be set on a copy of a slice from a DataFrame" warning.

I want each column that has NaN values to have them replaced by the mode of that column.

for title in object_null_columns:
    value = dataset.loc[:,title].mode()
    dataset.loc[:,title].fillna(value,inplace = True)

Above is my current attempt. I don't know if it makes sense, but I'm really stuck. It runs, but because the code changes the values on a copy, the original dataset keeps its NaN values and I can't use a seaborn boxplot to find outliers.
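Apart from the copy warning, the attempt has a second problem: `mode()` returns a Series, not a single value, and `fillna` matches a Series argument against the column by index label. The sketch below uses a hypothetical one-column DataFrame in place of the question's `dataset` to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column stand-in for the question's dataset
dataset = pd.DataFrame({"colour": ["red", np.nan, "blue", "red"]})

modes = dataset["colour"].mode()   # a Series: ['red'] with index [0]

# A Series passed to fillna is matched on index labels, so the NaN in
# row 1 looks for a value labelled 1 in `modes`, finds none, and stays NaN
wrong = dataset["colour"].fillna(modes)

# Taking the first mode as a scalar fills every NaN in the column
right = dataset["colour"].fillna(modes.iloc[0])

print(wrong.isna().sum())  # 1
print(right.tolist())      # ['red', 'red', 'blue', 'red']
```

Both calls return new Series and leave `dataset` unchanged; the result still has to be assigned back to the DataFrame.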

It appears as though you are using a pandas DataFrame.

Generally speaking, using for loops with pandas is poor practice (a red flag), because they are usually less efficient than the built-in methods.

In particular, a method like fillna might be used like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, 2, np.nan]})

# Each call returns a new DataFrame; assign the result to keep it

# Fill NaN with 0
df.fillna(0)

# Fill NaN with the mean of each column
df.fillna(df.mean())

# Forward fill (df.fillna(method='ffill') is deprecated in recent pandas)
df.ffill()

# Fill with different values per column
df.fillna({'A': 0, 'B': 5})

The key advantage is that fillna operates on the whole DataFrame (or a whole column) at once, without iterating through the rows the way a for loop would.
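Applied to the question's mode case, the same whole-DataFrame approach might look like the sketch below (hypothetical data). It relies on `DataFrame.mode()` returning a DataFrame whose first row holds one mode per column, and on `fillna` accepting a Series keyed by column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [11, 12, np.nan, 14, 12, 12],
                   'y': [1, np.nan, 3, 5, 5, 6],
                   'z': [21, 22, 23, 24, np.nan, 21]})

# df.mode() returns a DataFrame; row 0 holds the (first) mode of each column
column_modes = df.mode().iloc[0]

# A Series passed to DataFrame.fillna is matched on column names,
# so each column's NaN is filled with that column's mode
filled = df.fillna(column_modes)
print(filled)
```

If a column has several tied modes, `mode()` returns them sorted in extra rows, so `.iloc[0]` picks the smallest one.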

The docs here might also help, or be a good starting point for you:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html

There are quite a few helpful worked examples on this page too.

It is OK to use a loop if that does what you want; the Python loop here runs only over the columns, not through all the data:

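import pandas as pd

df = pd.DataFrame({'x': [11, 12, None, 14, 12, 12],
                   'y': [1, None, 3, 5, 5, 6],
                   'z': [21, 22, 23, 24, None, 21]
                   })

# Loop over the columns (not the rows): fill each column's NaN with its mode
for col in df.columns:
    val = df[col].mode()[0]        # mode() returns a Series; [0] takes the first mode
    df[col] = df[col].fillna(val)  # assign back so df itself is updated

print(df)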

gives:

      x    y     z
0  11.0  1.0  21.0
1  12.0  5.0  22.0
2  12.0  3.0  23.0
3  14.0  5.0  24.0
4  12.0  5.0  21.0
5  12.0  6.0  21.0
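The same loop can be restricted to a subset of columns, as in the question. In the sketch below, the data is hypothetical, and `object_null_columns` is assumed to mean "columns that contain at least one NaN", since the question does not show how it was built. Assigning the result back to `dataset[title]` (instead of calling `fillna(..., inplace=True)` on `dataset.loc[:, title]`) updates the original DataFrame, so later plotting sees the filled values.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's `dataset`
dataset = pd.DataFrame({
    "colour": ["red", np.nan, "blue", "red"],
    "size": ["S", "M", np.nan, "M"],
    "price": [1.0, 2.0, 3.0, 4.0],
})

# Assumption: object_null_columns lists the columns that contain NaN
object_null_columns = [col for col in dataset.columns
                       if dataset[col].isna().any()]

for title in object_null_columns:
    value = dataset[title].mode().iloc[0]          # a scalar, not a Series
    dataset[title] = dataset[title].fillna(value)  # write back to the original frame

print(dataset)
```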