I'm trying to decide if it makes sense to implement R's %*% operator in RCpp if my dataset is huge. BUT, I am really having trouble getting a RCpp implementation.
Here is my example R code
# remove everything in the global environment
rm(list = ls())
n_states = 4 # number of states
v_n <- c("H", "S1", "S2", "D") # the 4 states of the model:
n_t = 100 # number of transitions
# create transition matrix with random numbers. This transition matrix is constant.
m_P = matrix(runif(n_states*n_t), # insert n_states * n_t random numbers (can change this later)
nrow = n_states,
ncol = n_states,
dimnames = list(v_n, v_n))
# create markov trace, what proportion of population in each state at each period (won't make sense due to random numbers but that is fine)
m_TR <- matrix(NA,
nrow = n_t + 1 ,
ncol = n_states,
dimnames = list(0:n_t, v_n)) # create Markov trace (n_t + 1 because R doesn't understand Cycle 0)
# initialize Markov trace
m_TR[1, ] <- c(1, 0, 0, 0)
# run the loop
microbenchmark::microbenchmark( # function from microbenchmark library used to calculate how long this takes to run
for (t in 1:n_t){ # throughout the number of cycles
m_TR[t + 1, ] <- m_TR[t, ] %*% m_P # estimate the Markov trace for cycle the next cycle (t + 1)
}
) # end of micro-benchmark function
print(m_TR) # print the result.
And, here is the replacement for the %*% operator: (WHich doesn't seem to work correctly at all, although I can't fgure out why.
library(Rcpp)
cppFunction(
'void estimate_markov(int n_t, NumericMatrix m_P, NumericMatrix m_TR )
{
// We want to reproduce this
// matrix_A[X+1,] <- matrix_A[X,] %*% matrix_B
// The %*% operation behaves as follows for a vector_V %*% matrix_M
// Here the Matrix M is populated with letters so that you can
// more easily see how the operation is performed
// So, a multiplication like this:
//
// V M
// {1} %*% {A D}
// {2} {B E}
// {3} {C F}
//
// Results in a vector:
// V_result
// {1*A + 1*D}
// {2*B + 2*E}
// {3*C + 3*F}
//
// Now use values instead of letters for M (ex: A=1, B=2, .., F=6)
// V_result
// {1*1 + 1*4} {1 + 4} {5}
// {2*2 + 2*5} => {4 + 10} => {14}
// {3*3 + 3*6} {9 + 18} {27}
//
// Note that the RHS matrix may contain any number of columns,
// but *MUST* must contain the same number of rows as LHS vector
// Get dimensions of matricies , and sanity check
// number of elements in a vector from the LHS matrix must equal == number of row on the RHS
if( m_TR.cols() != m_P.rows())
throw std::range_error("Matrix mismatch, m_P.rows != m_TR.rows");
// we want to know these dimensions, and there is no reason to call these functons in a loop
// store the values once
int cnt_P_cols = m_P.cols();
int cnt_TR_cols = m_TR.cols();
//
for(int Index = 1; Index <= n_t; ++Index)
{
// iterate over the columns in m_TR
for(int col_iter = 0; col_iter < cnt_TR_cols; ++col_iter)
{
// an accumulator for the vector multiplication
double sum = 0;
// The new value comes from the previous row (Index-1)
double orig_TR = m_TR(col_iter, Index-1);
// iterate over the columns in m_P corresponding to this Index
for(int p_iter = 0; p_iter < cnt_P_cols; ++p_iter)
{
// accumulate the value of this TR scalar * the m_P vector
sum += orig_TR * m_P(p_iter, Index);
}
m_TR(col_iter, Index) = sum;
}
}
}'
)
Can someone point me to where my logic is going wrong.