I'm trying to port a very fast R function for calculating cosine similarity into Rcpp with Armadillo and sparse matrix operations.
Here's the R function:
#' Compute cosine similarities between columns in x and y
#'
#' @description adapted from qlcMatrix::cosSparse
#'
#' @param x dgCMatrix with samples as columns
#' @param y dgCMatrix with samples as columns
#' @return dgCMatrix of cosine similarities pairs of columns in "x" and "y"
sparse.cos <- function(x, y) {
s <- rep(1, nrow(x))
nx <- Matrix::Diagonal(x = drop(Matrix::crossprod(x ^ 2, s)) ^ -0.5)
x <- x %*% nx
ny <- Matrix::Diagonal(x = drop(Matrix::crossprod(y ^ 2, s)) ^ -0.5)
y <- y %*% ny
return(Matrix::crossprod(x, y))
}
Here's an example usage of the R function:
library(Matrix)
m1 <- rsparsematrix(1000, 10000, density = 0.1)
m2 <- rsparsematrix(1000, 100, density = 0.2)
res <- c_sparse_cos_mat_mat(m1, m2)
And here's my best stab so far at an Rcpp function (not working):
//[[Rcpp::export]]
arma::SpMat<double> sparse_cos(const arma::SpMat<double> &x, const arma::SpMat<double> &y){
arma::vec s(x.n_rows);
s = s.fill(1);
arma::vec nx = arma::vec(1 / sqrt(square(x) * s));
arma::vec ny = arma::vec(1 / sqrt(square(y) * s));
// apply column-wise Euclidean norm to x and y
for(sp_mat::const_iterator it_x = x.begin(); it_x != x.end(); it_x++)
x.at(it_x.row(), it_x.col()) = *it_x * nx(it_x.col());
for(sp_mat::const_iterator it_y = y.begin(); it_y != y.end(); it_y++)
y.at(it_y.row(), it_y.col()) = *it_y * ny(it_y.col());
// return cross-product of x and y as cosine distance
return(x * y);
}
Questions:
What is the fastest way to multiply all non-zero values in each column of
SpMat x
by corresponding values in a vector of lengthncol(x)
?How do I fix the issues in the Rcpp function? Specifically:
lvalue required as left operand of assignment
in linex.at(it_x.row(), it_x.col()) = *it_x * nx(it_x.col());
.The result is inherently dense, and ideally would be returned as a dense matrix. Is there a fast method for taking the cross-product of two sparse matrices while explicitly filling in a dense matrix with the result?