Creating sessions on clickstream data if the duration on a page is bigger than a certain value


I have click-stream data. Below, I have provided sample data for one user:

user_id  page   time   duration
1        A      12:15  5
1        B      12:21  3
1        C      12:25  22
1        D      12:48  5
1        B      12:54  2
1        A      12:57  5

For each user, if the duration on a page is more than 22, the pages that follow should be identified as a different session, and the session number should be displayed in a separate column. For example, for user #1:

user_id  page   time   duration   session
1        A      12:15  5          1
1        B      12:21  3          1
1        C      12:25  22         1
1        D      12:48  5          2
1        B      12:54  2          2
1        A      12:57  5          2

The same should be done for all users: a new session is created if the duration on a page is more than 20, and sessions are numbered incrementally starting from 1. I could not find any example to start from, so I would appreciate any guidance.

There is 1 solution below

We can calculate the cumulative sum of duration for each user, divide it by 22, and use the integer part plus 1 as the rank:

library(dplyr)

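# click_stream is the data frame with the sample data (click-stream is not a valid R name)
output <- click_stream %>%
  group_by(user_id) %>%
  mutate(csum = cumsum(duration)) %>%       # running total of duration per user
  mutate(rank = as.integer(csum / 22) + 1)  # integer part of csum / 22, plus 1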

and the output would be

# Groups:   user_id [1]
  user_id page  time  duration  csum  rank
    <int> <fct> <fct>    <int> <int> <dbl>
1       1 A     12:15        5     5     1
2       1 B     12:21        3     8     1
3       1 C     12:25       22    30     2
4       1 D     12:48        5    35     2
5       1 B     12:54        2    37     2
6       1 A     12:57        5    42     2
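
Note that the cumulative-sum approach puts page C in rank 2, while the expected table in the question keeps C in session 1 and starts session 2 at page D. If the intent is that a session ends after a page whose duration exceeds the threshold, a minimal sketch could look like the one below. It assumes the rows are already ordered by time within each user and uses a threshold of 20 (the question mentions both 20 and 22, so adjust as needed). The names click_stream, threshold, new_session and sessions are only illustrative.

```r
library(dplyr)

# Sample data from the question (one user); the data frame name is illustrative
click_stream <- data.frame(
  user_id  = c(1, 1, 1, 1, 1, 1),
  page     = c("A", "B", "C", "D", "B", "A"),
  time     = c("12:15", "12:21", "12:25", "12:48", "12:54", "12:57"),
  duration = c(5, 3, 22, 5, 2, 5)
)

# Assumed cutoff: the question mentions both 20 and 22
threshold <- 20

sessions <- click_stream %>%
  group_by(user_id) %>%
  mutate(
    # TRUE on the row right after a long page view, i.e. where a new session starts
    new_session = lag(duration > threshold, default = FALSE),
    # Running count of session starts within each user, numbered from 1
    session = cumsum(new_session) + 1L
  ) %>%
  ungroup() %>%
  select(-new_session)

print(sessions)
```

For the sample user this gives session values 1, 1, 1, 2, 2, 2, which matches the expected table in the question. With a threshold of 22 and a strict comparison, page C would not trigger a new session, so a cutoff of 22 would need `duration >= threshold` instead.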