The computation time for the following function is very high. Is there any room for improvement? Should I be accessing the elements of matrix X
differently? I appreciate any comments or suggestions.
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat myfunc(const int& n,
const int& p,
arma::mat& X,
arma::rowvec& y,
const arma::rowvec& types,
const arma::mat& rat,
const arma::rowvec& betas){
arma::mat final(p+p,n);
final.zeros();
int i,j;
for(i=0; i < n; ++i){
arma::colvec finalfirst(p+p); finalfirst.zeros();
for(j=0; j < n; ++j){
arma::mat Xt = X * log(y(j));
arma::mat finalX = join_rows(X,Xt);
arma::rowvec Xi = finalX.row(i);
if(types(i)==1 && y(j)==y(i)){
finalfirst += (Xi.t() - rat.col(j));
}
if(types(i)>1 && y(j) > y(i)){
finalfirst -= (Xi.t() - rat.col(j)) * exp(arma::as_scalar(betas*Xi.t()));
}
else if(y(j) <= y(i)){
finalfirst -= Xi.t() * exp(arma::as_scalar(betas*Xi.t()));
}
}
final.col(i) = finalfirst;
}
return(final);
}
/*** R
m=4000
types = runif(m,0,5)
types[types<=1] = 0 ; types[types > 1 & types < 3] = 1; types[types>=3]=2
microbenchmark(out = myfunc(n=m,p=2,X=matrix(rnorm(m*2),nrow=m,ncol=2),y=runif(m,0,3),types=types,rat=matrix(rnorm(m*4),nrow=4,ncol=m),betas=c(1,2,3,4)))
*/
It might be helpful if you provide at least some explanation of what it is you are doing.
First, I recommend breaking up the code into very small chunks, microbenchmarking each chunk, and finding bottleneck operations. This is better than throwing the whole function at the wall all at once.
row vs. column access
join_rows(X,Xt)
This is a very slow operation, especially because you are adding a
rowvec
to a column-orientedmat
. Armadillo matrices are stored as a vector in column-major order, so behind the scenes Armadillo is callingpush_back
in n non-contiguous locations, where n is the number of columns.It seems you may be able to avoid this altogether, as
finalX.row(i)
is the only call that depends onfinalX
in this loop. Just figure out what.row(i)
is.implicit operations
You do a lot of transposing, maybe you should be working with a transposed matrix from the get-go?
Xi.t()
is called twice in the same line:finalfirst -= (Xi.t() - rat.col(j)) * exp(arma::as_scalar(betas*Xi.t()));
Transposing a row vector is pretty pointless, just initialize it as a column vector and iterate through it with a good old-fashioned loop. A little more code isn't always a bad thing, it often makes your intent more explicit too.
Copying vs. in-place operations
This is a copy:
final.col(i) = finalfirst;
Why not operate in-memory and update
final(_, i)
in-place rather than using a temporaryfinalfirst
followed by a deep copy into your target matrix? This is a column, so memory is contiguous and the compiler will be able to optimize the access pattern for you as wonderfully as if you were working on any simple vector.All said, I haven't fully wrapped my head around what it is exactly that you are doing, but @largest_prime_is_463035818 may be right about switching the
for(i...
) andfor(j...)
loops around. It appears you will be able to pull these two rows out of the nested loop:since they do not depend on
i
at all.