I'm conducting a multichannel sequence analysis with 3 channels, for which I have defined three separate substitution cost matrices as the basis for Optimal Matching.
I get the following error message, when using seqMD:
> MDcost <- seqMD(channels, method = OM, sm=smatrix, what="cost", norm = "auto",
+ indel = "auto", with.missing = TRUE)
[!!] 3 domains with 288 sequences
[>] building MD sequences of combined states... OK
[>] including missing value as an additional state
[>] checking 'sm' (size and triangle inequality)
Error: [!] size of substitution cost matrix must be 8x8
The substitution cost matrix for the first channel is 7x7. I assume the error is related to this message in the output:
including missing value as an additional state
But there are no missings in my channels. I used seqdef with left = DEL for sequences in each channel as my sequences are of varying length across my units.
I have no issues when using seqdist for the monochannels:
> ch1.dist.OM.th <- seqdist(ch1.seq, method = "OM", indel = "auto",
+ sm = ch1.sub.th, norm = "auto")
[>] 288 sequences with 7 distinct states
[>] checking 'sm' (size and triangle inequality)
[>] 265 distinct sequences
[>] min/max sequence lengths: 2/48
[>] computing distances using the OM maxlength normalized metric
[>] elapsed time: 0.052 secs
Here, there is no indication of missing values/added states and the substitution matrices work fine. I'm grateful for any hints to solve this problem.
23-03-02 UPDATE Here is the output for three UoO for the 3 channels
> MDseq <- seqMD(channels)
[!!] 3 domains with 3 sequences
[>] building MD sequences of combined states... OK
> seqlength(MDseq)
Length
1 31
2 18
3 17
> MDseq
Sequence
1 P09+R00+G00-P09+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-
P07+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-
P07+R05+G00-P07+R05+G00-P07+R05+G00-P07+R05+G00-P07+R05+G00-P07+R05+G00-
P07+R04+G00-P07+R04+G00-P07+R04+G00-P07+R04+G00-P07+R04+G00-P07+R03+G00-
P07+R03+G00-P07+R03+G00-P07+R03+G00-P07+R05+G02-P04+R05+G02-P04+R05+G02-
P04+R05+G02
2 P09+R00+G00-P09+R00+G00-P09+R00+G00-P05+R00+G00-P05+R14+G00-P05+R10+G00-
P05+R10+G00-P05+R10+G00-P05+R10+G00-P05+R10+G00-P05+R09+G00-P05+R09+G00-
P05+R09+G00-P05+R09+G00-P05+R04+G00-P05+R05+G00-P05+R05+G00-P04+R04+G00
3 P09+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-P07+R00+G00-
P07+R11+G00-P07+R11+G05-P07+R11+G05-P07+R11+G05-P05+R11+G05-P05+R08+G00-
P05+R08+G00-P04+R08+G00-P04+R08+G00-P04+R08+G00-P04+R08+G00
> seqstatl(MDseq)
[1] "%" "P04+R04+G00" "P04+R05+G02" "P04+R08+G00" "P05+R00+G00"
[6] "P05+R04+G00" "P05+R05+G00" "P05+R08+G00" "P05+R09+G00" "P05+R10+G00"
[11] "P05+R11+G05" "P05+R14+G00" "P07+R00+G00" "P07+R03+G00" "P07+R04+G00"
[16] "P07+R05+G00" "P07+R05+G02" "P07+R11+G00" "P07+R11+G05" "P09+R00+G00"
> MDcost <- seqMD(channels, method = OM, sm=smatrix, what="cost", norm = "auto",
+ indel = "auto")
[!!] 3 domains with 3 sequences
[>] building MD sequences of combined states... OK
[>] including missing value as an additional state
[>] checking 'sm' (size and triangle inequality)
Error: [!] size of substitution cost matrix must be 8x8
> smatrix[[1]]
P04 P05 P06 P07 P08 P09 P00
P04 0 1 2 3 4 5 6
P05 1 0 1 2 3 4 5
P06 2 1 0 1 2 3 4
P07 3 2 1 0 1 2 3
P08 4 3 2 1 0 1 2
P09 5 4 3 2 1 0 1
P00 6 5 4 3 2 1 0
This is a bug in
seqMD. I cannot run your example because you do not provide the data. However, I think I have identified the issue. The error occurs because of a badly specified test inseqMD, which setswith.missingasTRUEwhen sequences do not all have the same length as in your data.This will be fixed within two days in the development version available on R-Forge (now released as v 2.2-7 on the CRAN).
In the meantime, the only workaround I see is to make all sequences of the same length by setting right missings as true missings, i.e., by specifying
right=NAin theseqdefcommands. I illustrate below with a small example:Now we run
seqMD, first to show the multidomain (MD) sequences, and then to compute CAT costs: