I'm pretty confused. I want to speed up my algorithm by using mclapply from the parallel package, but when I compare timings, apply still wins.
I'm smoothing log2ratio data with rq.fit.fnb from the quantreg package, which is called by my function quantsm, and I'm wrapping my data into a matrix/list for use with apply/lapply (mclapply).
I arrange my data like this:
q = matrix(data, ncol=N) # wrapping into matrix (using N = 2, 4, 6 or 8)
ql = as.list(as.data.frame(q)) # making list
And I compare the timings:
apply=system.time(apply(q, 1, FUN=quantsm, 0.50, 2))
lapply=system.time(lapply(ql, FUN=quantsm, 0.50, 2))
mc2lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=2))
mc4lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=4))
mc6lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=6))
mc8lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=8))
timing=rbind(apply,lapply,mc2lapply,mc4lapply,mc6lapply,mc8lapply)
Function quantsm:
quantsm <- function (y, p = 0.5, lambda) {
    # Quantile smoothing
    # Input: response y, quantile level p (0<p<1), smoothing parameter lambda
    # Result: quantile curve
    # Augment the data for the difference penalty
    m <- length(y)
    E <- diag(m)
    Dmat <- diff(E)
    X <- rbind(E, lambda * Dmat)
    u <- c(y, rep(0, m - 1))
    # Call quantile regression
    q <- rq.fit.fnb(X, u, tau = p)
    q
}
Function rq.fit.fnb (quantreg library):
rq.fit.fnb <- function (x, y, tau = 0.5, beta = 0.99995, eps = 1e-06)
{
    n <- length(y)
    p <- ncol(x)
    if (n != nrow(x))
        stop("x and y don't match n")
    if (tau < eps || tau > 1 - eps)
        stop("No parametric Frisch-Newton method. Set tau in (0,1)")
    rhs <- (1 - tau) * apply(x, 2, sum)
    d <- rep(1, n)
    u <- rep(1, n)
    wn <- rep(0, 10 * n)
    wn[1:n] <- (1 - tau)
    z <- .Fortran("rqfnb", as.integer(n), as.integer(p), a = as.double(t(as.matrix(x))),
        c = as.double(-y), rhs = as.double(rhs), d = as.double(d),
        as.double(u), beta = as.double(beta), eps = as.double(eps),
        wn = as.double(wn), wp = double((p + 3) * p), it.count = integer(3),
        info = integer(1), PACKAGE = "quantreg")
    coefficients <- -z$wp[1:p]
    names(coefficients) <- dimnames(x)[[2]]
    residuals <- y - x %*% coefficients
    list(coefficients = coefficients, tau = tau, residuals = residuals)
}
For a data vector of length 2000 I get:
(value = elapsed time in sec; columns = different number of columns of smoothed matrix/list)
             2cols   4cols   6cols   8cols
apply        0.178   0.096   0.069   0.056
lapply      16.555   4.299   1.785   0.972
mc2lapply   11.192   2.089   0.927   0.545
mc4lapply   10.649   1.326   0.694   0.396
mc6lapply   11.271   1.384   0.528   0.320
mc8lapply   10.133   1.390   0.560   0.260
For a data vector of length 4000 I get:
              2cols    4cols    6cols    8cols
apply         0.351    0.187    0.137    0.110
lapply      189.339   32.654   14.544    8.674
mc2lapply   186.047   20.791    7.261    4.231
mc4lapply   185.382   30.286    5.767    2.397
mc6lapply   184.048   30.170    8.059    2.865
mc8lapply   182.611   37.617    7.408    2.842
Why is apply so much more efficient than mclapply? Maybe I'm just making a common beginner mistake.
Thank you for your replies.
It looks like mclapply compares pretty well against lapply, but lapply does not compare well against apply. The reason may be that you're iterating over the rows of q with apply, and you're iterating over the columns of q with lapply and mclapply. That may account for the performance difference.

If you really do want to iterate over the rows of q, you could create ql as a list of the rows of q rather than a list of its columns.

If you want to iterate over the columns of q, then you should set MARGIN=2 in apply, as suggested by @flodel.

Both lapply and mclapply will iterate over the columns of a data frame, so you can create ql directly from a data frame, without the extra as.list call. This makes sense since a data frame actually is a list.
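
To make the row/column mismatch concrete, here is a minimal sketch in base R. It assumes random toy data in place of the real log2ratio values, and `probe` is a hypothetical stand-in for quantsm that only reports the length of the vector it receives. The row-list construction is one possible way to build such a list, not necessarily the only one.

```r
# Toy stand-in for the question's data: 2000 values wrapped into N = 4 columns.
set.seed(1)
N <- 4
data <- rnorm(2000)
q <- matrix(data, ncol = N)

# Hypothetical probe: reports the length of each vector it is handed.
probe <- function(y) length(y)

# apply(q, 1, ...) walks the 500 rows, each a vector of length N = 4.
row_lengths <- apply(q, 1, probe)

# lapply over the question's ql walks the 4 columns, each of length 500.
ql_cols <- as.list(as.data.frame(q))
col_lengths <- unlist(lapply(ql_cols, probe), use.names = FALSE)

# Option 1: iterate over columns everywhere by using MARGIN = 2 in apply.
col_lengths_apply <- apply(q, 2, probe)

# Option 2: iterate over rows everywhere by building a list of rows.
ql_rows <- lapply(seq_len(nrow(q)), function(i) q[i, ])
row_lengths_lapply <- unlist(lapply(ql_rows, probe))

# A data frame is already a list of columns, so as.list() is optional.
df_lengths <- unlist(lapply(as.data.frame(q), probe), use.names = FALSE)
```

Since quantsm builds diag(m) with m = length(y), the row-wise apply call only ever solves problems of size N, while the column-wise lapply and mclapply calls solve problems of size length(data)/N. That would be consistent with the timings above, where lapply gets faster as the number of columns grows. For a like-for-like comparison, both sides should use the same orientation (Option 1 or Option 2).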