How do I replace all NA with mean in R?


I have over 1500 columns in my dataset, and 100+ of them contain at least one NA. I know I can replace the NAs in a single column with

d$var[is.na(d$var)] <- mean(d$var, na.rm=TRUE)

but how do I do this to ALL the NAs in my dataset?

Thank you!

Best answer

We can use na.aggregate from the zoo package. Loop over the columns of the dataset with lapply (assuming all the columns are numeric), apply na.aggregate to each column so that its NAs are replaced by the column mean (the default), and assign the result back to the dataset.

library(zoo)
df[] <- lapply(df, na.aggregate)

By default, the FUN argument of na.aggregate is mean:

Default S3 method:

na.aggregate(object, by = 1, ..., FUN = mean, na.rm = FALSE, maxgap = Inf)

To do this nondestructively:

df2 <- df
df2[] <- lapply(df2, na.aggregate)

or in one line:

df2 <- replace(df, TRUE, lapply(df, na.aggregate))
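To check that the one-liner really leaves the original untouched, here is a minimal sketch on made-up data. So that it runs without zoo, it uses a hypothetical helper, fill_mean, as a stand-in for na.aggregate with its default FUN; that helper is an assumption for illustration, not part of zoo.

```r
# Hypothetical stand-in for na.aggregate's default behaviour:
# replace the NAs in one vector with the mean of its non-NA values.
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Made-up example data.
df <- data.frame(a = c(1, NA, 3), b = c(NA, 10, 20))

# replace(x, list, values) evaluates x[list] <- values on a copy and returns it,
# so df itself is not modified.
df2 <- replace(df, TRUE, lapply(df, fill_mean))

anyNA(df)   # the original still contains NAs
anyNA(df2)  # the copy does not
```

With zoo loaded, swapping fill_mean for na.aggregate gives the line shown above.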

If there are non-numeric columns, do this only for the numeric columns by creating a logical index first:

ok <- sapply(df, is.numeric)
df[ok] <- lapply(df[ok], na.aggregate)
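If installing zoo is not an option, the same idea can be sketched in base R by wrapping the single-column line from the question in a function. The helper name fill_mean and the example data below are made up for illustration.

```r
# Hypothetical helper: the question's one-column fix, wrapped in a function.
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Made-up example data with one non-numeric column.
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, 10, 20),
  g = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# Logical index of numeric columns; vapply guarantees a logical vector.
ok <- vapply(df, is.numeric, logical(1))

# Fill only the numeric columns; column g keeps its NA.
df[ok] <- lapply(df[ok], fill_mean)
df
```

One caveat with this sketch: a numeric column that is entirely NA has a mean of NaN, so its NAs become NaN rather than a usable value.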