I'm trying to search sequences to find the most common substrings (i.e., subsequences where all events are adjacent). The TraMineR user guide says the following about its subsequence searching tools:
"The idea of subsequence is an extension of the notion of substring and is described in detail for instance in Elzinga (2008). While a substring of a sequence is necessarily constituted of adjacent symbols, this requirement is relaxed with the notion of subsequence. Thus if x = abac, λ (the empty string), u = b, v = bac and w = bc belong to the set of subsequences of x, while only λ, u = b and v = bac are substrings of x"
Is there a way to turn off that relaxation, and only look at substrings? This is specifically using the seqefsub command. I can't find anything about this in the TraMineR manual, so any help on this is appreciated! Thanks so much, Andrew
Although TraMineR has no specific function for substrings, you can get substring-like results by playing with time constraints. For instance, by setting maxGap=1 in the constraint argument of seqefsub, you get the frequent subsequences formed with events occurring at two successive time points. This can be tried, for example, on the actcal data shipping with TraMineR.
In that example, you get subsequences with events occurring at successive positions. To get subsequences of successive events independently of the time elapsed between them, define your event sequences with timestamps given as successive numbers (1, 2, 3, ...) within each sequence.
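The original code for this answer is not preserved here, so the following is only a minimal sketch of both steps. It assumes that the event-sequence version of the data, actcal.tse, is the one meant, that a recent TraMineR release is installed (where the arguments are spelled max.gap and pmin.support; older releases used maxGap and pMinSupport), and that a 1% support threshold is acceptable. The toy data frame tse and its values are hypothetical.

```r
library(TraMineR)

## Step 1: gap constraint on the actcal event sequences.
## actcal.tse is the time-stamped-event version of actcal
## (columns id, time, event); using it here is an assumption.
data(actcal.tse)
actcal.eseq <- seqecreate(id = actcal.tse$id,
                          timestamp = actcal.tse$time,
                          event = actcal.tse$event)

## At most one time unit between two consecutive events of a subsequence
## (older TraMineR releases spell this argument maxGap)
cstr <- seqeconstraint(max.gap = 1)

## Frequent subsequences supported by at least 1% of the sequences
## (the 1% threshold is an arbitrary choice for illustration)
fsub <- seqefsub(actcal.eseq, pmin.support = 0.01, constraint = cstr)
print(fsub)

## Step 2: make the constraint ignore elapsed time by renumbering
## timestamps as 1, 2, 3, ... within each sequence.
## Hypothetical toy data with irregular timestamps:
tse <- data.frame(id    = c(1, 1, 1, 2, 2),
                  time  = c(2, 5, 9, 1, 7),
                  event = factor(c("a", "b", "c", "a", "c")))

## Sort by id and time, then use the rank within each id as timestamp
tse <- tse[order(tse$id, tse$time), ]
tse$pos <- ave(tse$time, tse$id, FUN = seq_along)

toy.eseq <- seqecreate(id = tse$id, timestamp = tse$pos, event = tse$event)

## With successive positions as timestamps, max.gap = 1 keeps only
## events that directly follow each other in a sequence
toy.fsub <- seqefsub(toy.eseq, min.support = 1, constraint = cstr)
print(toy.fsub)
```

Note that the renumbering gives distinct positions to events sharing the same original timestamp, so data with simultaneous events would need separate handling.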
Hope this helps