While going through the TraMineR documentation, I found that the easiest way to plot the 10 most frequent sequences is with seqfplot. The code below uses the mvad dataset.
library(TraMineR)
data("mvad")
mvad.labels <- c("employment", "further education", "higher education",
"joblessness", "school", "training")
mvad.scode <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, states = mvad.scode, labels = mvad.labels)
seqfplot(mvad.seq, withlegend = FALSE, border = NA, title = "Sequence frequency plot")
I want to plot the 20 least frequently occurring sequences in a similar manner. Is there a convenient way to achieve this?
Compute the number of different sequences and then use tlim to plot the last 20.
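As a minimal sketch of that approach, assuming the mvad.seq setup from the question: the distinct sequences are counted here with base R by collapsing each row of the state columns into a string, and the last 20 ranks are then passed to tlim. The names seq.strings, n.distinct and last20 are made up for illustration. The argument names follow the question's code; recent TraMineR releases appear to have renamed them (tlim to idxs, withlegend to with.legend, title to main), so adjust if you get deprecation warnings.

```r
library(TraMineR)
data("mvad")

mvad.labels <- c("employment", "further education", "higher education",
                 "joblessness", "school", "training")
mvad.scode <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, states = mvad.scode, labels = mvad.labels)

# Count the distinct sequences with base R: collapse each row of the
# state columns (17:86) into one string, then count the unique strings.
seq.strings <- do.call(paste, c(mvad[, 17:86], sep = "-"))
n.distinct <- length(unique(seq.strings))

# Ranks of the 20 least frequent sequences (rank 1 = most frequent).
last20 <- (n.distinct - 19):n.distinct

# Same call as in the question, restricted to those ranks via tlim.
seqfplot(mvad.seq, tlim = last20, withlegend = FALSE, border = NA,
         title = "20 least frequent sequences")
```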
However, this does not really make sense: with most datasets, a large number of sequences will be unique and will have a frequency of 1/n. Examine the result of seqtab(mvad.seq, tlim = 1:1000). Which means that: