Creating clickstream sequence from data frame

43 Views Asked by At

I have the following table:

User ID Session ID Time Stamp Page
123 123.4 HH:MM:01 1
123 123.4 HH:MM:02 2
123 123.4 HH:MM:05 3
123 123.4 HH:MM:10 4
123 123.4 HH:MM:11 5
122 1299.1 HH:MM:01 2
122 1299.1 HH:MM:02 3
128 124.4 HH:MM:01 1
128 124.4 HH:MM:02 2

And, I am trying to get a table like this:

User ID Session ID Time Stamp Page Next Page Duration (secs)
123 123.4 HH:MM:01 1 2 1
123 123.4 HH:MM:02 2 3 3
123 123.4 HH:MM:05 3 4 5
123 123.4 HH:MM:10 4 5 1
122 1299.1 HH:MM:01 2 3 1
128 124.4 HH:MM:01 1 2 1

I know I have to use lead(), but I cannot figure out how to use the mutate and lead function together.

Any help is much appreciated. Thank you.

1

There are 1 best solutions below

0
guasi On

Here is a possibility using dplyr. It's not very elegant, perhaps someone else can come up with something more succinct.

library(dplyr)

df <- tribble(
~ ID, ~ Session, ~ Timestamp, ~ Pages,
123, 123.4,  01, 1,
123, 123.4,  02, 2,
123, 123.4,  05, 3,
123, 123.4,  10, 4,
123, 123.4,  11, 5,
122, 1299.1, 01, 2,
122, 1299.1, 02, 3,
128, 124.4,  01, 1,
128, 124.4,  02, 2)


df %>% 
  group_by(ID, Session) %>% 
  mutate(Duration = lead(Timestamp) - Timestamp) %>% 
  na.omit()

-output

# A tibble: 6 × 5
     ID Session Timestamp Pages Duration
  <dbl>   <dbl>     <dbl> <dbl>    <dbl>
1   123    123.         1     1        1
2   123    123.         2     2        3
3   123    123.         5     3        5
4   123    123.        10     4        1
5   122   1299.         1     2        1
6   128    124.         1     1        1