Plot the 20 least frequent sequences using TraMineR in R

While going through the TraMineR documentation, I found that the easiest way to plot the 10 most frequent sequences is with seqfplot. The code below uses the mvad dataset.

  library(TraMineR)
  data("mvad")
  mvad.labels <- c("employment", "further education", "higher education", 
                   "joblessness", "school", "training")
  mvad.scode <- c("EM", "FE", "HE", "JL", "SC", "TR")

  mvad.seq <- seqdef(mvad, 17:86, states = mvad.scode, labels = mvad.labels)

  seqfplot(mvad.seq, withlegend = FALSE, border = NA,
           title = "Sequence frequency plot")

I want to plot the 20 least frequently occurring sequences in a similar manner. Is there a convenient way to achieve this?

There is 1 best solution below

Compute the number of different sequences and then use tlim to plot the last 20.

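  # Number of distinct sequences: seqtab() returns one row per distinct sequence
  l <- nrow(seqtab(mvad.seq, tlim = 1:1000))

  # Plot the 20 least frequent sequences
  seqfplot(mvad.seq, withlegend = FALSE, border = NA,
           title = "Sequence frequency plot", tlim = (l - 19):l)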

However, this does not really make sense: with most datasets, a large number of sequences are unique and therefore have a frequency of 1/n. You can check this by examining the result of seqtab(mvad.seq, tlim = 1:1000). This means that:

  1. The plot will show sequences that all have the same frequency, so there is no point in plotting them.
  2. The 20 "last" sequences are chosen arbitrarily among the many sequences of frequency 1/n: they are not the last ones, but merely a subset of them.
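
To see how many sequences share that lowest frequency, you can inspect the frequency table directly. The following is only a sketch: it assumes that the "freq" attribute of the object returned by seqtab is a data frame with a Freq column, and that the sequences are unweighted (as in the question), so that Freq holds plain counts. The names freq.tab, n.distinct and n.unique are introduced here for illustration.

```r
library(TraMineR)
data("mvad")
mvad.scode <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, states = mvad.scode)

# Frequency table of the distinct sequences, sorted by decreasing frequency.
# Assumption: the "freq" attribute is a data frame with columns Freq and Percent.
freq.tab <- attr(seqtab(mvad.seq, tlim = 1:1000), "freq")

n.distinct <- nrow(freq.tab)           # number of distinct sequences
n.unique <- sum(freq.tab$Freq == 1)    # sequences observed exactly once

cat(n.unique, "of", n.distinct, "distinct sequences occur only once\n")
```

If n.unique is larger than 20, the 20 "last" sequences in the plot are just an arbitrary subset of the sequences observed once.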