RCPP and the %*% operator, revisited

115 Views Asked by At

I'm trying to decide if it makes sense to implement R's %*% operator in RCpp if my dataset is huge. BUT, I am really having trouble getting a RCpp implementation.

Here is my example R code

# remove everything in the global environment
rm(list = ls())                

n_states = 4                   # number of states
v_n <- c("H", "S1", "S2", "D") # the 4 states of the model:
n_t = 100                      # number of transitions

# create transition matrix with random numbers. This transition matrix is constant.
m_P = matrix(runif(n_states*n_t), # insert n_states * n_t random numbers (can change this later)
         nrow = n_states, 
         ncol = n_states,
         dimnames = list(v_n, v_n))

# create markov trace, what proportion of population in each state at each period (won't make sense due to random numbers but that is fine)
m_TR <- matrix(NA, 
               nrow = n_t + 1 , 
               ncol = n_states, 
               dimnames = list(0:n_t, v_n))     # create Markov trace (n_t + 1 because R doesn't understand  Cycle 0)

# initialize Markov trace 
m_TR[1, ] <- c(1, 0, 0, 0)                      


# run the loop

microbenchmark::microbenchmark(   # function from microbenchmark library used to calculate how long this takes to run

for (t in 1:n_t){                              # throughout the number of cycles
  m_TR[t + 1, ] <- m_TR[t, ] %*% m_P           # estimate the Markov trace for cycle the next cycle (t + 1)
}


) # end of micro-benchmark function

print(m_TR)   # print the result.

And, here is the replacement for the %*% operator: (WHich doesn't seem to work correctly at all, although I can't fgure out why.

    library(Rcpp)


cppFunction(
    'void estimate_markov(int n_t, NumericMatrix m_P, NumericMatrix m_TR )
     {
            // We want to reproduce this
            //   matrix_A[X+1,]  <- matrix_A[X,] %*% matrix_B
            // The %*% operation behaves as follows for a vector_V %*% matrix_M

            // Here the Matrix M is populated with letters so that you can
            // more easily see how the operation is performed
            // So, a multiplication like this:
            //
            //  V          M
            // {1}  %*%  {A D}
            // {2}       {B E}
            // {3}       {C F}
            //
            // Results in a vector:
            //   V_result
            // {1*A + 1*D}
            // {2*B + 2*E}
            // {3*C + 3*F}
            //
            // Now use values instead of letters for M (ex: A=1, B=2, .., F=6)
            //  V_result
            // {1*1 + 1*4}    {1 + 4}     {5}
            // {2*2 + 2*5} => {4 + 10} => {14}
            // {3*3 + 3*6}    {9 + 18}    {27}
            //
            // Note that the RHS matrix may contain any number of columns,
            // but *MUST* must contain the same number of rows as LHS vector

        // Get dimensions of matricies , and sanity check
        // number of elements in a vector from the LHS matrix must equal == number of row on the RHS
        if( m_TR.cols() != m_P.rows())
            throw std::range_error("Matrix mismatch, m_P.rows != m_TR.rows");

        // we want to know these dimensions, and there is no reason to call these functons in a loop
        // store the values once
        int cnt_P_cols = m_P.cols();
        int cnt_TR_cols = m_TR.cols();

        //
        for(int Index = 1; Index <= n_t; ++Index)
        {
            // iterate over the columns in m_TR
            for(int col_iter = 0; col_iter < cnt_TR_cols; ++col_iter)
            {
                // an accumulator for the vector multiplication
                double sum = 0;

                // The new value comes from the previous row (Index-1)
                double orig_TR = m_TR(col_iter, Index-1);

                // iterate over the columns in m_P corresponding to this Index
                for(int p_iter = 0; p_iter < cnt_P_cols; ++p_iter)
                {
                    // accumulate the value of this TR scalar * the m_P vector
                    sum += orig_TR * m_P(p_iter, Index);
                }
                m_TR(col_iter, Index) = sum;
            }
          }
        }'
    )

Can someone point me to where my logic is going wrong.

0

There are 0 best solutions below