r - viterbi RHmm Error protection stack overflow


I was looking for an HMM implementation in R to analyze states in a string of characters. The HMM library seemed to run slowly, so I am using the RHmm library instead.

My data is a sequence of 1953138 symbols (U, D, N).

This is a sample of my data:

string <- sample(c("D","U","N"), 1953138, replace=TRUE)

Fitting the HMM:

HMM <- HMMFit(string,dis="DISCRETE",nStates=3)$HMM

Running viterbi(), which is where I get the error:

viterbi_results <- viterbi(HMM,string)
#Error: protect(): protection stack overflow

However, if I use only a subset of the string, viterbi() works just fine:

viterbi_results <- viterbi(HMM,string[1:49963])

But if I add one more element and run:

viterbi_results <- viterbi(HMM,string[1:49964])
#Error: protect(): protection stack overflow

I get the same stack overflow error, so viterbi() fails once the vector reaches 49964 elements.

I think the problem may be related to the fact that R's default --max-ppsize is 50000, but raising this parameter to its limit, --max-ppsize 500000, does not fix the problem. It does raise the vector limit in viterbi(), though: the limit goes from 49964 elements to somewhere around 499960 elements.

I tried to run the Viterbi algorithm in chunks. First I split the string into chunks of 49960 elements and applied viterbi() to each of them, but I got the same error:

list_string <- split(string, ceiling(seq_along(string)/49960))

viterbi_results <- lapply(list_string,function(x) viterbi(HMM,x)$states)
#Error: protect(): protection stack overflow
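
To make the chunking step concrete, here is a minimal sketch that checks how the grouping expression above divides a vector of this length. It uses base R only and does not call RHmm; the variable names (n, chunk_size, chunk_id, chunk_lengths) are introduced here purely for illustration.

```r
# Base R only; RHmm is not needed for this check.
n <- 1953138          # length of the observation vector in the question
chunk_size <- 49960   # chunk length used in the question

# Same grouping expression as in the split() call, applied to positions 1..n.
chunk_id <- ceiling(seq_len(n) / chunk_size)

# Number of elements that fall into each chunk.
chunk_lengths <- tabulate(chunk_id)

n_chunks <- length(chunk_lengths)
last_chunk <- chunk_lengths[n_chunks]

print(n_chunks)            # number of chunks produced by split()
print(last_chunk)          # size of the final, shorter chunk
print(max(chunk_lengths))  # size of the largest chunk
```

This gives 40 chunks: 39 of 49960 elements and a final one of 4698. Every chunk is shorter than the 49964 elements at which a single viterbi() call fails, which suggests that chunk length alone does not explain the error inside lapply().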

Here on Stack Overflow I found a similar problem to the one I am having: LINK. Apparently the source of that problem was an unneeded PROTECT inside a loop. I dug into the C++ source code of the viterbi function, but there is not a single PROTECT.

I also tried ulimit -s unlimited, but I am getting the same error.

I am working on Unix with 1009 GB of RAM.

link to the RHmm package

Thanks a lot for the help!


There are 1 best solutions below


Increase the pointer protection stack size when R starts:

R --max-ppsize=100000