Extending Rcpp::as for custom classes depending on Rcpp.h

199 Views Asked by At

I'm working on an Rcpp sparse matrix class that uses both Rcpp::IntegerVector (row/column pointers) and a templated std::vector<T>. The rationale is that overhead in deep-copying the integer pointer vectors (@i, @p) in extremely large sparse-matrices can be avoided by simply leaving them as pointers to R objects, and consistently, microbenchmarks show that this approach takes almost exactly half the time as conversion to Eigen::SparseMatrix<T> and arma::SpMat<T> while using less memory.

Bare-bones Rcpp sparse matrix class

namespace SpRcpp {

    template<typename T>
    class SpMatrix {

    public:
        Rcpp::IntegerVector i, p;
        std::vector<T> x;
        unsigned int n_row, n_col;

        // constructor for the class from an Rcpp::S4 object
        SpMatrix(Rcpp::S4& mat) {
            Rcpp::IntegerVector dims = mat.slot("Dim");
            n_row = (unsigned int)dims[0];
            i = mat.slot("i");
            p = mat.slot("p");
            x = Rcpp::as<std::vector<T>>(mat.slot("x"));
        };
        // other constructors, class methods, iterators, etc.
    };
}

Example usage:

//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix(Rcpp::S4& mat) {
    SpRcpp::SpMatrix<float> A(mat);
    return A.x;
}

This works!

However, I'd like to have Rcpp implicitly convert an S4 dgCMatrix object, for example, to an SpRcpp::SpMatrix object to enable functions like the following:

Implicit Rcpp wrapping

//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix2(SpRcpp::SpMatrix<float>& mat) {
    return mat.x;
}

Such is a case of using Rcpp::as.

I've tried the following:

namespace Rcpp {
    namespace traits {
        template <typename T>
        class Exporter< SpRcpp::SpMatrix<T> > {
        public:
            Exporter(SEXP x) { Rcpp::S4 mat = x; }
            SpRcpp::SpMatrix<T> get() {
                return SpRcpp::SpMatrix<T>(mat);
            }
        private: Rcpp::S4 mat;
        };
    }
}

This compiles, and I know that the SpRcpp::SpMatrix<T>(Rcpp::S4& x) constructor works, but when I attempt to feed a dgCMatrix into SpRcpp_SpMatrix() I get the error:

Error in SpRcpp_SpMatrix2(A) : Not an S4 object.

I presume this is because I've declared the following before all my class declaration:

#include <RcppCommon.h>
#include <Rcpp.h>

Per the docs here, the RcppGallery example, and the RcppArmadillo implementation, #include <Rcpp.h> must not precede the Rcpp::as and Rcpp::wrap functions, but in my case I can't do that because my class definition requires Rcpp.h.

QUESTION: How do I create an Rcpp::as for SpRcpp::SpMatrix from an Rcpp::S4 dgCMatrix when the SpMatrix class depends on Rcpp.h?

1

There are 1 best solutions below

0
On BEST ANSWER

It's actually quite simple to create an Rcpp SparseMatrix class! I was overthinking it.

#include <rcpp.h>

// Rcpp for sparse matrices (spRcpp)
namespace Rcpp {
    class SparseMatrix {
    public:
        Rcpp::IntegerVector i, p;
        Rcpp::NumericVector x;
        int n_rows, n_cols;

        // constructor
        SparseMatrix(Rcpp::S4 mat) {
            Rcpp::IntegerVector dim = mat.slot("Dim");
            i = mat.slot("i");
            p = mat.slot("p");
            x = mat.slot("x");
            n_rows = (int)dim[0];
            n_cols = (int)dim[1];
        };
    };
}

namespace Rcpp {
    template <> Rcpp::SparseMatrix as(SEXP mat) {
        return Rcpp::SparseMatrix(mat);
    }
}

//[[Rcpp::export]]
Rcpp::NumericVector toRcppSparseMatrix(Rcpp::SparseMatrix& A) {
    return A.x;
}

Given a Matrix::dgCMatrix, mat, calling toRcppSparseMatrix(mat) returns non-zero values in 1-2 microseconds for 25 million values. This is in contrast to RcppArmadillo or RcppEigen sparse matrix conversions which take about 250 milliseconds for this same matrix and runs a deep copy in-memory.

As Dirk suggested, using RcppArmadillo ivec and dvec were very efficient, but still created shallow copies which resulted in ~100 milliseconds runtime and consumed some memory.

Obviously, the above approach is restricted to double types, so float operations are not possible without a deep copy.