I am having a hard time understanding the difference between max.gap and window.size and how they work.
Let's say I have the following sequence: 947-(SP6)-992-(CP2)-2-(SP6)-4-(SP10), where the numbers between events indicate the elapsed minutes (for example, 4 minutes between the second SP6 and SP10).
With the max.gap=2 constraint, I get the following results (although I expected to have only (CP2)-(SP6) in the results, because those two events have -2- between them):
> seqefsub(peer_data.seqe[30], min.support = 1, constraint = seqeconstraint(max.gap = 2))
  Subsequence          Support Count
1 (CP2)                      1     1
2 (CP2)-(SP6)                1     1
3 (CP2)-(SP6)-(SP10)         1     1
4 (SP10)                     1     1
5 (SP6)                      1     1
6 (SP6)-(SP10)               1     1
I do not understand why (SP6)-(SP10) is in the results. How would window.size change things here? I would appreciate it if someone could explain this clearly. I am using this for my research and I do not want to use it incorrectly.
The max.gap=k condition means that we search for subsequences with at most k units of time between two successive events in the subsequence. The window.size=w condition means that we search for subsequences whose duration between the first and last events does not exceed w. Thus max.gap refers to the time between successive events in the subsequence and window.size to the total duration of the subsequence.

I illustrate with your example sequence.
As you can see, with max.gap=2 we get the subsequences with a single event and the subsequence (cp2)-(sp6), because sp6 occurs 2 minutes after cp2. Any other subsequence would have at least one gap greater than 2 between two successive events. (This outcome does not correspond to yours, which leads me to think that peer_data.seqe[30] is not the example sequence shown.)

Now, using window.size=6, we get three more subsequences.

seqefsub(eseq, min.support = 1, constraint = seqeconstraint(window.size = 6))
In particular, (cp2)-(sp6)-(sp10) has a total duration of 6, and the total time between the two events of (cp2)-(sp10) is also 6. Reducing window.size would eliminate these two subsequences. Likewise, (sp6)-(sp10) would be eliminated with a window size smaller than 4.

As a last example, I combine window.size=6 with max.gap=4. Here we get one subsequence fewer than in the previous example, namely (cp2)-(sp10), which is dropped because there is a gap of 6 minutes between its two events.
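
To make the two rules concrete, here is a small base-R check that does not use TraMineR. The function name satisfies_constraints and its arguments are made up for illustration. It only restates the definitions above for the occurrence times of one candidate subsequence, and it is not how seqefsub is implemented.

```r
# times: timestamps (in minutes) of the events of one candidate
# subsequence, in order of occurrence
satisfies_constraints <- function(times, max.gap = Inf, window.size = Inf) {
  gaps_ok   <- all(diff(times) <= max.gap)                  # successive events
  window_ok <- (max(times) - min(times)) <= window.size     # first to last event
  gaps_ok && window_ok
}

# Occurrence times taken from the example sequence
cp2 <- 1939; sp6 <- 1941; sp10 <- 1945

satisfies_constraints(c(cp2, sp6), max.gap = 2)                         # TRUE
satisfies_constraints(c(sp6, sp10), max.gap = 2)                        # FALSE: gap of 4
satisfies_constraints(c(cp2, sp10), window.size = 6)                    # TRUE: duration 6
satisfies_constraints(c(cp2, sp10), window.size = 6, max.gap = 4)       # FALSE: gap of 6
satisfies_constraints(c(cp2, sp6, sp10), window.size = 6, max.gap = 4)  # TRUE: gaps 2 and 4
```

The last two calls show why combining the constraints drops only (cp2)-(sp10): inserting sp6 between the two events splits the 6-minute gap into gaps of 2 and 4.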