More efficient ways of applying a (possibly non-vectorizable) function to the rows of a data frame


I am trying to use Tukey's fences (the 1.5 × IQR rule) to calculate the average of each row of a data frame, excluding the outliers.

df <- data.frame(matrix(rnorm(1000000), ncol = 10))
averaging_wo_outliers <- function(x) {
    # First and third quartiles of the row
    q_result <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
    lowerq <- q_result[1]
    upperq <- q_result[2]
    iqr <- upperq - lowerq
    # Tukey fences: 1.5 * IQR beyond each quartile
    threshold_upper <- upperq + 1.5 * iqr
    threshold_lower <- lowerq - 1.5 * iqr
    # Mean of the values that lie within the fences
    mean(x[x >= threshold_lower & x <= threshold_upper])
}
result <- apply(df, 1, averaging_wo_outliers)

This is pretty slow on 100,000 rows. Taking a similar approach to this answer, I have been trying to speed it up by vectorizing. Is it even possible to make this task faster? And if it is not vectorizable (if that is a word!), would dplyr or data.table help, or should I not expect any improvement from those packages? Thanks.
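For what it is worth, here is one way the row-wise computation could be vectorized in base R. It is a sketch under stated assumptions, not a benchmarked answer. The idea is to sort every row with a single order() call, read the quartiles off the sorted columns using the same interpolation as quantile()'s default (type 7), then mask the outliers and take row sums. The name row_means_wo_outliers is made up for illustration, and the code assumes an all-numeric data frame with no NA or infinite values.

```r
# Hypothetical helper: row means after dropping values outside Tukey's fences.
# Assumes every column is numeric and contains no NA/Inf values.
row_means_wo_outliers <- function(df) {
    m <- as.matrix(df)
    n <- ncol(m)

    # Sort all rows at once: order by row index, then by value,
    # and refill the matrix row by row.
    ms <- matrix(m[order(row(m), m)], nrow = nrow(m), byrow = TRUE)

    # Type 7 quantile positions (the default of quantile())
    idx <- 1 + (n - 1) * c(0.25, 0.75)
    lo <- floor(idx)
    hi <- ceiling(idx)
    g <- idx - lo

    # Linear interpolation between neighbouring order statistics
    lowerq <- (1 - g[1]) * ms[, lo[1]] + g[1] * ms[, hi[1]]
    upperq <- (1 - g[2]) * ms[, lo[2]] + g[2] * ms[, hi[2]]
    iqr <- upperq - lowerq

    # The fence vectors have one entry per row, so they recycle
    # down the columns and each element is compared with its own row's fence.
    keep <- (m >= lowerq - 1.5 * iqr) & (m <= upperq + 1.5 * iqr)

    # Mean of the kept values in each row
    rowSums(m * keep) / rowSums(keep)
}

df <- data.frame(matrix(rnorm(1000), ncol = 10))
result_vec <- row_means_wo_outliers(df)
```

This avoids calling quantile() once per row, which is likely where most of the time goes in the apply() version. Comparing the two with all.equal() and system.time() on your own data would show whether the results match and how much it helps.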
