I was looking for an HMM implementation in R to analyze states in a string of characters. The HMM library seemed to run slowly, so I am using the RHmm library instead.
My data is a string of 1953138 symbols (U, D, N).
This is a sample of my data:
string <- sample(c("D","U","N"),1953138,replace=TRUE)
Fitting the HMM:
HMM <- HMMFit(string,dis="DISCRETE",nStates=3)$HMM
Running viterbi, which is where I get the error:
viterbi_results <- viterbi(HMM,string)
#Error: protect(): protection stack overflow
However, if I use only a subset of the string, viterbi() works just fine:
viterbi_results <- viterbi(HMM,string[1:49963])
But if I add just one more element and run:
viterbi_results <- viterbi(HMM,string[1:49964])
#Error: protect(): protection stack overflow
I get the same stack overflow error, so the vector hits the limit at 49964 elements.
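A threshold like this can be located by bisection instead of by hand. The following is only a minimal sketch: it assumes the failure is monotonic in the input length, and it uses a hypothetical stand-in, fake_viterbi(), in place of the real viterbi(HMM, x) call so that it runs without RHmm. The names fake_viterbi and largest_ok_length and the length 200000 are illustrative.

```r
# Hypothetical stand-in for viterbi(HMM, x): errors once the input is too long.
fake_viterbi <- function(x, limit = 49963L) {
  if (length(x) > limit) stop("protect(): protection stack overflow")
  invisible(TRUE)
}

# Largest n for which f(x[1:n]) succeeds, assuming that once f fails
# for some length it also fails for every longer input.
largest_ok_length <- function(f, x) {
  ok <- function(n) {
    !inherits(try(f(x[seq_len(n)]), silent = TRUE), "try-error")
  }
  lo <- 0L          # assumed to succeed (empty input)
  hi <- length(x)
  if (ok(hi)) return(hi)
  # Invariant: ok(lo) is TRUE and ok(hi) is FALSE.
  while (hi - lo > 1L) {
    mid <- lo + (hi - lo) %/% 2L
    if (ok(mid)) lo <- mid else hi <- mid
  }
  lo
}

x <- sample(c("D", "U", "N"), 200000, replace = TRUE)
print(largest_ok_length(fake_viterbi, x))
```

With the real function, f would be something like function(v) viterbi(HMM, v); each probe then costs one full viterbi() run, so the search takes roughly log2(length(x)) calls.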
I think the problem may be related to the fact that R's default --max-ppsize is 50000, but raising this parameter to its maximum (--max-ppsize=500000) does not fix the problem. It does, however, raise the limit in viterbi(): the largest vector that works goes from just under 49964 elements to somewhere around 499960 elements.
I tried to run the Viterbi algorithm in chunks. I split the string into chunks of 49960 elements and applied viterbi() to each of them, but I got the same error:
list_string <- split(string, ceiling(seq_along(string)/49960))
viterbi_results <- lapply(list_string,function(x) viterbi(HMM,x)$states)
#Error: protect(): protection stack overflow
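To rule out the chunking itself, the chunk sizes can be checked before calling viterbi(). This is a minimal sketch on a smaller simulated vector; the length 200000 and the names x, chunk_size and chunk_lengths are illustrative assumptions, not part of my real data.

```r
set.seed(1)
chunk_size <- 49960
x <- sample(c("D", "U", "N"), 200000, replace = TRUE)

# Same splitting expression as above, applied to the smaller vector.
chunks <- split(x, ceiling(seq_along(x) / chunk_size))

# Every chunk must stay below the length that works in a single call.
chunk_lengths <- lengths(chunks)
print(chunk_lengths)
stopifnot(all(chunk_lengths <= chunk_size))

# Concatenating the chunks in order must reproduce the original vector.
stopifnot(identical(unlist(chunks, use.names = FALSE), x))
```

If every chunk is shorter than an input that succeeds on its own and the error still appears, the overflow presumably builds up across successive calls rather than depending on the length of a single input.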
Here on Stack Overflow I found a problem similar to mine: LINK. Apparently the source of that problem was an unnecessary PROTECT inside a loop. I looked through the C++ source code of the viterbi function, but there is not a single PROTECT in it.
I also tried ulimit -s unlimited, but I still get the same error.
I am working on Unix with 1009 GB of RAM.
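Whether these settings actually reached the R session can be checked from inside R. A minimal sketch: Cstack_info() is base R and reports the C stack that ulimit -s controls, which is a separate structure from the pointer protection stack sized by --max-ppsize. Looking for the flag in commandArgs() rests on the assumption that it was passed on the command line and shows up there.

```r
# C stack as seen by this R session (affected by `ulimit -s`).
info <- Cstack_info()
print(info[c("size", "current")])

# Assumption: a --max-ppsize flag given at startup appears among the
# command-line arguments; an empty result means it was not found there.
ppsize_args <- grep("max-ppsize", commandArgs(), value = TRUE)
print(ppsize_args)
```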
link to the RHmm package
Thanks a lot for the help!
Increase the pointer protection stack size when R starts: