I have this DataFrame in which I want to fill the NaN values:
quant = ['satisfaction_level', 'last_evaluation', 'average_monthly_hours']
cat = ['number_projects', 'time_spend_company', 'work_accident', 'promotion_last_5_years', 'position', 'salary']
X_train[quant] = X_train[quant].fillna(X_train[quant].median())
X_train[cat] = X_train[cat].fillna(X_train[cat].mode())
X_test[quant] = X_test[quant].fillna(X_test[quant].median())
X_test[cat] = X_test[cat].fillna(X_test[cat].mode())
This code does not fill in the categorical NaN values, even though the documentation states that the value parameter can be a DataFrame as well: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html
Using .loc instead does not fill the NaN values either, and mode()[0] returns an error when there are not two modes.
Other methods, such as filling column by column, work.
Any ideas?
The reason is that .mode() returns a DataFrame, and fillna aligns a DataFrame value on both index and columns, so only rows whose index labels match the rows of the mode result can be filled. You need to pass a single row instead, using .mode().iloc[0].
Using .mode().iloc[0] will not give an error if there are not multiple modes. The mode() function always returns a DataFrame, even if there is only one mode, and .iloc[0] simply selects the first row of that DataFrame. If there is only one mode, .mode().iloc[0] returns that mode; if there are multiple modes, it returns the first one. Using .mode()[0] on a pandas.DataFrame, by contrast, can lead to errors: because .mode() returns a DataFrame, [0] looks up a column labelled 0, not the first row, and raises a KeyError unless such a column exists.
I tested this snippet with the following code:
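As an illustration (a minimal sketch on made-up data, not the original test code), the difference between passing .mode() and .mode().iloc[0] to fillna might look like this. The column values and the index labels 10 to 14 are assumptions, chosen so that no row label matches the 0-based index of the mode result, which is typical after a shuffled train/test split:

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataset.
df = pd.DataFrame(
    {
        "position": ["sales", None, "sales", "hr", None],
        "salary": ["low", "low", None, "high", None],
    },
    index=[10, 11, 12, 13, 14],
)
cat = ["position", "salary"]

# mode() returns a DataFrame whose rows are indexed 0, 1, ...
modes = df[cat].mode()

# Passing the DataFrame: fillna aligns on index labels, and label 0
# does not occur in df, so nothing is filled.
unfilled = df[cat].fillna(modes)

# Passing the first row (a Series keyed by column name): each column
# is filled with its own mode.
filled = df[cat].fillna(modes.iloc[0])

# modes[0] looks up a column labelled 0, which does not exist here.
key_error_raised = False
try:
    modes[0]
except KeyError:
    key_error_raised = True

print(unfilled.isna().sum().sum())  # NaNs remaining with .mode()
print(filled.isna().sum().sum())    # NaNs remaining with .mode().iloc[0]
```

The same pattern should carry over to the question's code as X_train[cat].fillna(X_train[cat].mode().iloc[0]), assuming every column in cat has at least one non-missing value.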