I am trying to use Tukey's fences (the 1.5 × IQR rule) to calculate the average of each row of a data frame, excluding the outliers.
df <- data.frame(matrix(rnorm(1000000), ncol = 10))
averaging_wo_outliers <- function(x) {
  # First and third quartiles of the row
  q_result <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  lowerq <- q_result[1]
  upperq <- q_result[2]
  iqr <- upperq - lowerq
  # Tukey's fences: 1.5 * IQR beyond each quartile
  threshold_upper <- upperq + (iqr * 1.5)
  threshold_lower <- lowerq - (iqr * 1.5)
  # Mean of the values that fall inside the fences
  return(mean(x[(x <= threshold_upper) & (x >= threshold_lower)]))
}
result <- apply(df, 1, averaging_wo_outliers)
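To illustrate what the function is meant to do, here is a small worked example on a made-up row with one obvious outlier. The vector `toy` is purely illustrative, and the function is repeated in compact form only so that the snippet runs on its own.

```r
# Same logic as the function above, repeated so the snippet is self-contained.
averaging_wo_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  keep <- (x >= q[1] - 1.5 * iqr) & (x <= q[2] + 1.5 * iqr)
  mean(x[keep])
}

toy <- c(1, 2, 3, 4, 100)  # made-up row; 100 is the outlier
toy_mean <- averaging_wo_outliers(toy)
toy_mean  # 2.5: quartiles are 2 and 4, fences are -1 and 7, so 100 is dropped
```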
Now, the apply() call above is pretty slow. Taking an approach similar to this answer, I have been trying to make it faster by vectorizing. Is it even possible to make this task faster? Also, if it is not vectorizable (if that is a word!), do you think using dplyr or data.table might help, or should I not expect any improvement from those packages? Thanks.
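To make concrete what "vectorizing" could mean here, below is a rough base-R sketch that works on the whole matrix at once instead of row by row. It is only a sketch under assumptions: the data contain no NA or infinite values, the quartiles follow the default type-7 definition of quantile(), and `row_means_wo_outliers` is a hypothetical name chosen for illustration. It has not been benchmarked, so whether it is actually faster would still need to be measured.

```r
# Reference: the row-wise version from the question, in compact form.
averaging_wo_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  mean(x[(x <= q[2] + 1.5 * iqr) & (x >= q[1] - 1.5 * iqr)])
}

# Hypothetical whole-matrix version (assumes finite values, no NAs).
row_means_wo_outliers <- function(df) {
  m <- as.matrix(df)
  n <- ncol(m)
  # Sort every row in one call: order by row index first, then by value.
  s <- matrix(m[order(row(m), m)], nrow = nrow(m), byrow = TRUE)
  # Type-7 quantile of each row by linear interpolation between order statistics.
  row_quantile <- function(p) {
    idx <- 1 + (n - 1) * p
    lo <- floor(idx)
    hi <- ceiling(idx)
    h <- idx - lo
    (1 - h) * s[, lo] + h * s[, hi]
  }
  q1 <- row_quantile(0.25)
  q3 <- row_quantile(0.75)
  iqr <- q3 - q1
  # The per-row fences recycle down the columns, so row i is compared
  # against its own thresholds.
  keep <- (m >= q1 - 1.5 * iqr) & (m <= q3 + 1.5 * iqr)
  # Mean of the kept values in each row.
  rowSums(m * keep) / rowSums(keep)
}

set.seed(1)
small <- data.frame(matrix(rnorm(10000), ncol = 10))  # smaller than the real data
slow <- apply(small, 1, averaging_wo_outliers)
fast <- row_means_wo_outliers(small)
all.equal(unname(slow), unname(fast))
```

The idea is that sorting, the quartile interpolation, and the final averaging each happen in a single call over all rows, rather than once per row inside apply().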