Fast cosine similarity of two sparse matrices in Rcpp with Armadillo

168 Views Asked by At

I'm trying to port a very fast R function for calculating cosine similarity into Rcpp with Armadillo and sparse matrix operations.

Here's the R function:

#' Compute cosine similarities between columns in x and y
#' 
#' @description adapted from qlcMatrix::cosSparse
#'
#' @param x dgCMatrix with samples as columns
#' @param y dgCMatrix with samples as columns
#' @return dgCMatrix of cosine similarities pairs of columns in "x" and "y"
sparse.cos <- function(x, y) {
  s <- rep(1, nrow(x))
  nx <- Matrix::Diagonal(x = drop(Matrix::crossprod(x ^ 2, s)) ^ -0.5)
  x <- x %*% nx
  ny <- Matrix::Diagonal(x = drop(Matrix::crossprod(y ^ 2, s)) ^ -0.5)
  y <- y %*% ny
  return(Matrix::crossprod(x, y))
}

Here's an example usage of the R function:

library(Matrix)
m1 <- rsparsematrix(1000, 10000, density = 0.1)
m2 <- rsparsematrix(1000, 100, density = 0.2)
res <- c_sparse_cos_mat_mat(m1, m2)

And here's my best stab so far at an Rcpp function (not working):

//[[Rcpp::export]]
arma::SpMat<double> sparse_cos(const arma::SpMat<double> &x, const arma::SpMat<double> &y){

    arma::vec s(x.n_rows);
    s = s.fill(1);
    arma::vec nx = arma::vec(1 / sqrt(square(x) * s));
    arma::vec ny = arma::vec(1 / sqrt(square(y) * s));
    
    // apply column-wise Euclidean norm to x and y
    for(sp_mat::const_iterator it_x = x.begin(); it_x != x.end(); it_x++)
      x.at(it_x.row(), it_x.col()) = *it_x * nx(it_x.col());

    for(sp_mat::const_iterator it_y = y.begin(); it_y != y.end(); it_y++)
      y.at(it_y.row(), it_y.col()) = *it_y * ny(it_y.col());
    
    // return cross-product of x and y as cosine distance    
    return(x * y);
}

Questions:

  1. What is the fastest way to multiply all non-zero values in each column of SpMat x by corresponding values in a vector of length ncol(x)?

  2. How do I fix the issues in the Rcpp function? Specifically: lvalue required as left operand of assignment in line x.at(it_x.row(), it_x.col()) = *it_x * nx(it_x.col());.

  3. The result is inherently dense, and ideally would be returned as a dense matrix. Is there a fast method for taking the cross-product of two sparse matrices while explicitly filling in a dense matrix with the result?

0

There are 0 best solutions below