Events in last 21 days for every row by Name

151 Views Asked by At

This is what my dataframe looks like. The two rightmost columns are my desired columns.These two columns check the condition whether in the last 21 days there is an "Email" ActivityType and whether in the last 21 days there is a "Webinar" ActivityType.

 Name      ActivityType     ActivityDate  Email(last21days) Webinar(last21day)**             
John       Email            1/1/2014        TRUE                  NA   
John       Webinar          1/5/2014        TRUE                 TRUE
John       Sale             1/20/2014       TRUE                 TRUE
John       Webinar          3/25/2014       NA                   TRUE
John       Sale             4/1/2014        NA                   TRUE
John       Sale             7/1/2014        NA                   NA
Tom        Email            1/1/2015        TRUE                   NA   
Tom        Webinar          1/5/2015        TRUE                 TRUE
Tom        Sale             1/20/2015      TRUE                 TRUE
Tom        Webinar          3/25/2015       NA                   TRUE
Tom        Sale              4/1/2015        NA                   TRUE
Tom        Sale              7/1/2015       NA                   NA

Based on the help here: Extracting event types from last 21 day window I tried:

df$ActivityDate <- as.Date(df$ActivityDate)
library(data.table)
setDT(df)
setkey(df, Name,ActivityDate)
Elsetemp <- df[, .(Name, ActivityDate, ActivityType)]
df[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")), 
                        Webinar21 = as.logical(which(i.ActivityType == "Webinar"))), 
         roll = -21, by = .EACHI]

to no avail as I only get TRUEs for rows with "Sale". For example, second row where ActivityType = Webinar, both Email21 & Webinar21 should say TRUE. When I am defining last 21 days, I am trying to include that very day when the event happened too.

2

There are 2 best solutions below

5
On BEST ANSWER

How about this?

Using rolling joins from data.table:

require(data.table)
dt[, ActivityDate := as.Date(ActivityDate, format="%m/%d/%Y")]
setkey(dt, Name, ActivityDate)

roll_index <- function(x, types, roll=21) {
    lapply(types, function(type) {
         idx = x[ActivityType == type][x, roll=roll, which=TRUE]
         as.logical(idx)
    })
}
dt[, c("Email_21", "Webinar_21") := roll_index(dt, c("Email", "Webinar"))]

#     Name ActivityType ActivityDate Email_21 Webinar_21
#  1: John        Email   2014-01-01     TRUE         NA
#  2: John      Webinar   2014-01-05     TRUE       TRUE
#  3: John         Sale   2014-01-20     TRUE       TRUE
#  4: John      Webinar   2014-03-25       NA       TRUE
#  5: John         Sale   2014-04-01       NA       TRUE
#  6: John         Sale   2014-07-01       NA         NA
#  7:  Tom        Email   2015-01-01     TRUE         NA
#  8:  Tom      Webinar   2015-01-05     TRUE       TRUE
#  9:  Tom         Sale   2015-01-20     TRUE       TRUE
# 10:  Tom      Webinar   2015-03-25       NA       TRUE
# 11:  Tom         Sale   2015-04-01       NA       TRUE
# 12:  Tom         Sale   2015-07-01       NA         NA
2
On

A base R solution:

#New type of sequence function that can accept vectors
seq2 <- function(v1) {
  res <- list()
  for(i in seq_along(v1)) {
    res[[i]] <- seq(v1[i], v1[i]+21, by='day')
  }
  as.Date(unlist(res), origin='1970-01-01')
}

df <- df[ ,1:3]
df$ActivityDate <- as.Date(df$ActivityDate, format='%m/%d/%Y')

#Email column
emailed <- df[df$ActivityType == 'Email', 'ActivityDate']
df$Email <- df$ActivityDate %in% seq2(emailed)

#Webinar column
webbed <- df[df$ActivityType == 'Webinar', 'ActivityDate']
df$Webinar <- df$ActivityDate %in% seq2(webbed)

First we subset the first three columns without the example output. Then convert the date factors with as.Date. The vector emailed looks for ActivityType with the Email string. The function seq2 was created to look for the date and 21 days after. It creates a sequence that can be checked against.

df
#    Name ActivityType ActivityDate Email Webinar
# 1  John        Email   2014-01-01  TRUE   FALSE
# 2  John      Webinar   2014-01-05  TRUE    TRUE
# 3  John         Sale   2014-01-20  TRUE    TRUE
# 4  John      Webinar   2014-03-25 FALSE    TRUE
# 5  John         Sale   2014-04-01 FALSE    TRUE
# 6  John         Sale   2014-07-01 FALSE   FALSE
# 7   Tom        Email   2015-01-01  TRUE   FALSE
# 8   Tom      Webinar   2015-01-05  TRUE    TRUE
# 9   Tom         Sale   2015-01-20  TRUE    TRUE
# 10  Tom      Webinar   2015-03-25 FALSE    TRUE
# 11  Tom         Sale   2015-04-01 FALSE    TRUE
# 12  Tom         Sale   2015-07-01 FALSE   FALSE

Data

df <- read.table(text=' Name      ActivityType     ActivityDate  Email(last21days) Webinar(last21day)**             
John       Email            1/1/2014        TRUE                  NA   
John       Webinar          1/5/2014        TRUE                 TRUE
John       Sale             1/20/2014       TRUE                 TRUE
John       Webinar          3/25/2014       NA                   TRUE
John       Sale             4/1/2014        NA                   TRUE
John       Sale             7/1/2014        NA                   NA
Tom        Email            1/1/2015        TRUE                   NA   
Tom        Webinar          1/5/2015        TRUE                 TRUE
Tom        Sale             1/20/2015      TRUE                 TRUE
Tom        Webinar          3/25/2015       NA                   TRUE
Tom        Sale              4/1/2015        NA                   TRUE
Tom        Sale              7/1/2015       NA                   NA', header=T)