My question concerns three columns in my dataset.
The first column is called buffer and shows whether a GPS point of an animal is inside the buffer zone (yes/no values). The second is datetime of the GPS point of the animal, the third is the time spent inside the buffer zone (dt1).
If a "yes" row in the buffer column sits between two "no" rows, I want to calculate the time difference between that GPS point and the previous one and write it to dt1. I have managed that part.
The problem arises with multiple consecutive "yes" rows, which mean that the animal stayed in the buffer zone long enough for several consecutive GPS points to fall inside it. In that case I want the time difference between the first and the last "yes" rows.
Here you can see my code. The problem is that it returns "NA" for "yes" rows that follow other "yes" rows, basically any "yes" row that is not isolated. I am trying to get the overall time difference in the final "yes" row in a "yes-row" series.
trips_with_buffer_2016_df <- trips_with_buffer_2016_df %>%
  group_by(tripID) %>%
  mutate(
    dt1 = ifelse(buffer == 'yes',
                 ifelse(lag(buffer, default = 'no') == 'no',
                        difftime(DateTime, lag(DateTime), units = "mins"),
                        cumsum(as.numeric(difftime(DateTime, lag(DateTime), units = "mins")))
                 ),
                 NA_real_)
  )
The "tripID" column groups the GPS points by previously identified trips.
I know that the problem is in the cumsum line, but I cannot get it to work.
[Image: the two rows showing the problem]
Thanks a lot in advance!
You could simplify your indexing of in or out of buffer using rle. Here 0 = out and 1 = in, but the values could just as well be 'yes' and 'no'.
Another way to think about it, and possibly easier to follow in the future when the details are all but forgotten and an error pops up.
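The rle idea can be sketched in base R roughly as follows. This is only a minimal illustration, not the answerer's original code: the toy data frame and the helper name run_time are made up for the example, and it assumes that the time for a run of "yes" rows should be measured from the GPS fix just before the run to the last "yes" fix of the run, which matches what the question already does for an isolated "yes" row.

```r
# Toy data (hypothetical): one trip, fixes 10 minutes apart
df <- data.frame(
  tripID   = 1,
  buffer   = c("no", "yes", "no", "yes", "yes", "yes", "no"),
  DateTime = as.POSIXct("2016-01-01 00:00:00", tz = "UTC") + 600 * (0:6)
)

# Hypothetical helper: minutes spent per run of consecutive "yes" rows,
# written on the last row of each run; every other row stays NA
run_time <- function(buffer, datetime) {
  r      <- rle(buffer == "yes")        # runs of TRUE (in) / FALSE (out)
  ends   <- cumsum(r$lengths)           # last row index of each run
  starts <- ends - r$lengths + 1        # first row index of each run
  dt     <- rep(NA_real_, length(buffer))
  for (i in which(r$values)) {          # loop over the "yes" runs only
    if (starts[i] > 1) {                # need a previous fix to measure from
      dt[ends[i]] <- as.numeric(difftime(datetime[ends[i]],
                                         datetime[starts[i] - 1],
                                         units = "mins"))
    }
  }
  dt
}

# Apply the helper separately within each trip
df$dt1 <- NA_real_
for (id in unique(df$tripID)) {
  rows <- df$tripID == id
  df$dt1[rows] <- run_time(df$buffer[rows], df$DateTime[rows])
}
df
```

In this toy trip the isolated "yes" row gets 10 minutes and the last row of the three-row run gets 30 minutes. If the duration should instead run strictly from the first to the last "yes" fix, replace datetime[starts[i] - 1] with datetime[starts[i]] and drop the starts[i] > 1 check.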