How to calculate time difference of two columns with a lag

I am working with a dataset of taxi trips by drivers in NYC. For every trip I have the driver ID (hack_license) as well as the pickup and dropoff date and time. I want to calculate the waiting time between the dropoff time of the previous trip and the pickup time of the next trip. So I need the time difference between two columns with a lag of one row (the dropoff time comes from the previous row, the pickup time from the current row), grouped by driver ID so that I never compute a difference between trips of two different drivers.

A possible data set looks like this:

hack_license = c("303F79923DA5DA7A10DF15E2D91CDCF7","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","ABE23CA71E2DE84972281BA1C70B6EBB","ABE23CA71E2DE84972281BA1C70B6EBB","BA83D7C383EAA4F9D78A1A8B83CB3E92","BA83D7C383EAA4F9D78A1A8B83CB3E92","D476A1872F1F6594BD638C274483ED06","D476A1872F1F6594BD638C274483ED06")

pickup_datetime = c("2013-12-31 23:01:07","2013-12-31 23:04:00","2013-12-31 23:31:00","2013-12-31 23:40:00","2013-12-31 23:16:39","2013-12-31 23:24:05","2013-12-31 23:09:10","2013-12-31 23:26:26","2013-12-31 23:13:00","2013-12-31 23:22:00")

dropoff_datetime = c("2013-12-31 23:20:33","2013-12-31 23:28:00","2013-12-31 23:33:00","2013-12-31 23:48:00","2013-12-31 23:22:29","2013-12-31 23:28:37","2013-12-31 23:21:24","2013-12-31 23:36:54","2013-12-31 23:20:00","2013-12-31 23:27:00")

data <- data.frame(hack_license,pickup_datetime,dropoff_datetime)

I tried to use dplyr and lubridate like this, but it doesn't work.

data %>%
  group_by(data$hack_license) %>%
  group_by(hack_license) %>%
  mutate(waiting_time_in_secs = difftime(pickup_datetime,
                                         lag(dropoff_datetime), units = 'secs'))

Maybe some of you can help me out here. Would be great!

Best answer

Convert both the pickup and dropoff columns to datetimes, then, for each hack_license, calculate the difference between the current pickup time and the previous dropoff time. The first trip of each driver has no previous dropoff, so its waiting time is NA.

library(dplyr)
library(lubridate)

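data <- data %>%
  # Parse the character columns into date-times
  mutate(pickup_datetime = ymd_hms(pickup_datetime),
         dropoff_datetime = ymd_hms(dropoff_datetime)) %>%
  # Group by driver so lag() never crosses from one driver to another
  group_by(hack_license) %>%
  # Current pickup minus previous dropoff, in seconds
  mutate(waiting_time_in_secs = as.numeric(difftime(pickup_datetime,
                                                    lag(dropoff_datetime), units = 'secs')))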
data
#   hack_license                     pickup_datetime     dropoff_datetime    waiting_time_in_secs
#   <chr>                            <dttm>              <dttm>                             <dbl>
# 1 303F79923DA5DA7A10DF15E2D91CDCF7 2013-12-31 23:01:07 2013-12-31 23:20:33                   NA
# 2 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:04:00 2013-12-31 23:28:00                   NA
# 3 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:31:00 2013-12-31 23:33:00                  180
# 4 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:40:00 2013-12-31 23:48:00                  420
# 5 ABE23CA71E2DE84972281BA1C70B6EBB 2013-12-31 23:16:39 2013-12-31 23:22:29                   NA
# 6 ABE23CA71E2DE84972281BA1C70B6EBB 2013-12-31 23:24:05 2013-12-31 23:28:37                   96
# 7 BA83D7C383EAA4F9D78A1A8B83CB3E92 2013-12-31 23:09:10 2013-12-31 23:21:24                   NA
# 8 BA83D7C383EAA4F9D78A1A8B83CB3E92 2013-12-31 23:26:26 2013-12-31 23:36:54                  302
# 9 D476A1872F1F6594BD638C274483ED06 2013-12-31 23:13:00 2013-12-31 23:20:00                   NA
#10 D476A1872F1F6594BD638C274483ED06 2013-12-31 23:22:00 2013-12-31 23:27:00                  120
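
One thing worth noting: lag() works on row order, so the result above is only correct because each driver's trips are already sorted by time. If that is not guaranteed in the full dataset, a minimal sketch like the following sorts the trips first. The `trips` data frame and the driver IDs "A" and "B" are made up for illustration and deliberately out of order; they are not part of the original data.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample with made-up driver IDs, rows intentionally unsorted
trips <- data.frame(
  hack_license     = c("A", "A", "A", "B", "B"),
  pickup_datetime  = c("2013-12-31 23:40:00", "2013-12-31 23:04:00",
                       "2013-12-31 23:31:00", "2013-12-31 23:24:05",
                       "2013-12-31 23:16:39"),
  dropoff_datetime = c("2013-12-31 23:48:00", "2013-12-31 23:28:00",
                       "2013-12-31 23:33:00", "2013-12-31 23:28:37",
                       "2013-12-31 23:22:29"),
  stringsAsFactors = FALSE
)

waits <- trips %>%
  # Parse the character columns into date-times
  mutate(pickup_datetime  = ymd_hms(pickup_datetime),
         dropoff_datetime = ymd_hms(dropoff_datetime)) %>%
  # Sort each driver's trips chronologically so lag() sees the previous trip
  arrange(hack_license, pickup_datetime) %>%
  group_by(hack_license) %>%
  # Current pickup minus previous dropoff, in seconds
  mutate(waiting_time_in_secs = as.numeric(difftime(pickup_datetime,
                                                    lag(dropoff_datetime),
                                                    units = "secs"))) %>%
  # Drop the grouping so later steps work on the whole table again
  ungroup()

waits
```

For this sample, waits$waiting_time_in_secs comes out as NA, 180, 420 for driver "A" and NA, 96 for driver "B". The ungroup() at the end is optional, but it avoids surprises if you go on to summarise across all drivers.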