I am working with a dataset of taxi trips by drivers in NYC. For every trip I have the driver ID as well as the pickup and dropoff date and time. I want to calculate the waiting time between the dropoff time of the previous trip and the pickup time of the next trip. So I need the time difference between two columns with a lag of one row (because the dropoff time refers to the previous trip and the pickup time to the next trip, i.e. the next row), grouped by driver ID (to make sure I am not calculating the time difference between trips of two different drivers).
A possible data set looks like this:
hack_license = c("303F79923DA5DA7A10DF15E2D91CDCF7","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","ABE23CA71E2DE84972281BA1C70B6EBB","ABE23CA71E2DE84972281BA1C70B6EBB","BA83D7C383EAA4F9D78A1A8B83CB3E92","BA83D7C383EAA4F9D78A1A8B83CB3E92","D476A1872F1F6594BD638C274483ED06","D476A1872F1F6594BD638C274483ED06")
pickup_datetime = c("2013-12-31 23:01:07","2013-12-31 23:04:00","2013-12-31 23:31:00","2013-12-31 23:40:00","2013-12-31 23:16:39","2013-12-31 23:24:05","2013-12-31 23:09:10","2013-12-31 23:26:26","2013-12-31 23:13:00","2013-12-31 23:22:00")
dropoff_datetime = c("2013-12-31 23:20:33","2013-12-31 23:28:00","2013-12-31 23:33:00","2013-12-31 23:48:00","2013-12-31 23:22:29","2013-12-31 23:28:37","2013-12-31 23:21:24","2013-12-31 23:36:54","2013-12-31 23:20:00","2013-12-31 23:27:00")
data <- data.frame(hack_license,pickup_datetime,dropoff_datetime)
I tried to use dplyr and lubridate like this, but it doesn't work.
data %>%
  group_by(data$hack_license) %>%
  group_by(hack_license) %>%
  mutate(waiting_time_in_secs = difftime(pickup_datetime,
                                         lag(dropoff_datetime), units = 'secs'))
Maybe some of you can help me out here. Would be great!
You can create a datetime column for both pickup and dropoff and, for each hack_license, calculate the difference in time between the current pickup time and the previous dropoff time.
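Here is a minimal sketch of that approach with dplyr and lubridate. It uses a reduced sample with values taken from the question so that it runs on its own. The arrange() step is my own assumption: it only matters if the trips are not already sorted by pickup time within each driver. Wrapping the result in as.numeric() is optional and just gives plain seconds instead of a difftime object.

```r
library(dplyr)
library(lubridate)

# Reduced sample: two drivers from the question's data
data <- data.frame(
  hack_license = c("697ABFCDF7E7C77A01183C857132F2A4",
                   "697ABFCDF7E7C77A01183C857132F2A4",
                   "697ABFCDF7E7C77A01183C857132F2A4",
                   "ABE23CA71E2DE84972281BA1C70B6EBB",
                   "ABE23CA71E2DE84972281BA1C70B6EBB"),
  pickup_datetime = c("2013-12-31 23:04:00", "2013-12-31 23:31:00",
                      "2013-12-31 23:40:00", "2013-12-31 23:16:39",
                      "2013-12-31 23:24:05"),
  dropoff_datetime = c("2013-12-31 23:28:00", "2013-12-31 23:33:00",
                       "2013-12-31 23:48:00", "2013-12-31 23:22:29",
                       "2013-12-31 23:28:37")
)

result <- data %>%
  # Parse the character columns into POSIXct datetimes
  mutate(pickup_datetime  = ymd_hms(pickup_datetime),
         dropoff_datetime = ymd_hms(dropoff_datetime)) %>%
  # Assumption: sort trips chronologically within each driver
  arrange(hack_license, pickup_datetime) %>%
  group_by(hack_license) %>%
  # Current pickup minus the previous dropoff of the same driver
  mutate(waiting_time_in_secs = as.numeric(
    difftime(pickup_datetime, lag(dropoff_datetime), units = "secs")
  )) %>%
  ungroup()

print(result)
```

The first trip of every driver gets NA, because there is no previous dropoff within that group to compare against.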