Writing a C++ iterator for a sparse matrix class

234 Views Asked by At

I'm attempting to get a basic constant forward-iterator to work in C++.

namespace Rcpp {
    class SparseMatrix {
    public:
        IntegerVector i, p;
        NumericVector x;
   
        int begin_col(int j) { return p[j]; };
        int end_col(int j) { return p[j + 1]; };
        
        class iterator {
        public:
            int index;
            iterator(SparseMatrix& g) : parent(g) {}
            iterator(int ind) { index = ind; };                       // ERROR!
            bool operator!=(int x) const { return index != x; };
            iterator operator++(int) { ++index; return (*this); };
            int row() { return parent.i[index]; };
            double value() { return parent.x[index]; };
        private:
            SparseMatrix& parent;
        };
    };    
}

My intention is to use the iterator in contexts similar to the following:

// sum of values in column 7
Rcpp::SparseMatrix A(nrow, ncol, fill::random);
double sum = 0;
for(Rcpp::SparseMatrix::iterator it = A.begin_col(7); it != A.end_col(7); it++)
    sum += it.value();

Two questions:

  1. The compiler throws an error on the line indicated above: uninitialized reference member in 'class Rcpp::SparseMatrix&' [-fpermissive]. How can this be fixed?
  2. How might double value() { return parent.x[index]; }; be re-worked to return a pointer to the value rather than a copy of the value?

A little context on the SparseMatrix class: like a dgCMatrix in R, this object of class SparseMatrix consists of three vectors:

  • i holds row pointers for every element in x
  • p gives indices in i which correspond to the start of each column
  • x contains non-zero values
1

There are 1 best solutions below

1
On BEST ANSWER

Thanks to @Evg, here's the solution:

namespace Rcpp {
    class SparseMatrix {
    public:
        IntegerVector i, p;
        NumericVector x;
   
        class iterator {
        public:
            int index;
            iterator(SparseMatrix& g, int ind) : parent(g) { index = ind; }
            bool operator!=(iterator x) const { return index != x.index; };
            iterator& operator++() { ++index; return (*this); };
            int row() { return parent.i[index]; };
            double& value() { return parent.x[index]; };
        private:
            SparseMatrix& parent;
        };

        iterator begin_col(int j) { return iterator(*this, p[j]); };
        iterator end_col(int j) { return iterator(*this, p[j + 1]); };
    };    
}

And it can be used as follows, for instance, to calculate colSums:

//[[Rcpp::export]]
Rcpp::NumericVector Rcpp_colSums(Rcpp::SparseMatrix& A) {
    Rcpp::NumericVector sums(A.cols());
    for (int i = 0; i < A.cols(); ++i)
        for (Rcpp::SparseMatrix::iterator it = A.begin_col(i); it != A.end_col(i); it++)
            sums(i) += it.value();
    return sums;
}

And, the above function is faster than RcppArmadillo, RcppEigen, and R::Matrix equivalents when microbenchmarked from R!

Edit:

The above syntax is inspired by Armadillo. I've come to realize that a slightly different syntax (which involves fewer constructions) gives an iterator similar to Eigen:

class col_iterator {
    public:
      col_iterator(SparseMatrix& ptr, int col) : ptr(ptr) { indx = ptr.p[col]; max_index = ptr.p[col + 1]; }
      operator bool() const { return (indx != max_index); }
      col_iterator& operator++() { ++indx; return *this; }
      const double& value() const { return ptr.x[indx]; }
      int row() const { return ptr.i[indx]; }
    private:
      SparseMatrix& ptr;
      int indx, max_index;
    };

Which can then be used like this:

int col = 0;
for (Rcpp::SparseMatrix::col_iterator it(A, col); it; ++it)
     Rprintf("row: %3d, value: %10.2e", it.row(), it.value());