Trying to loop through a dataframe


I am trying to calculate a driver's total activity time using GPS data. I've written a loop that is intended to calculate the difference in time between each pair of consecutive points in a dataframe, summing the differences as it goes.

However, the final output is much smaller than expected, on the order of seconds instead of hundreds of hours, which leads me to believe that the loop is only running a few times or not summing the values correctly. My programming knowledge is mostly from Python. Am I implementing this idea correctly in R, or could I write it better? My data looks something like this:

  DriveNo       Date.and.Time Latitude Longitude
1     264 2014-02-01 12:12:05 41.91605  12.37186
2     264 2014-02-01 12:12:05 41.91605  12.37186
3     264 2014-02-01 12:12:12 41.91607  12.37221
4     264 2014-02-01 12:12:27 41.91619  12.37365
5     264 2014-02-01 12:12:42 41.91627  12.37490
6     264 2014-02-01 12:12:57 41.91669  12.37610

Is there a way I can save the result of each iteration to a list so that I could analyse where in the range of values a problem might be occurring?

datelist = taxi_264$Date.and.Time
dlstandard = as.POSIXlt(datelist)
diffsum = 0
for (i in range(1:83193))
{
  diff = difftime(dlstandard[i], dlstandard[(i+1)], units = "secs")
  diffsum = diffsum + diff
}

There are 2 best solutions below

Best answer

You can try:

diffsum <- as.numeric(sum(difftime(tail(dlstandard, -1), 
                                   head(dlstandard, -1), units = 'secs')))

This gives diffsum as the sum of the differences between consecutive timestamps, in seconds.
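
To sanity-check this on a small scale, here is a minimal sketch using the six timestamps from the question's excerpt. The names `times`, `steps`, and `total` are illustrative, and `tz = "UTC"` is an assumption, since the question does not give a time zone. Keeping the per-step gaps in a vector before summing also makes it easy to see where an unexpected value occurs.

```r
# Sample timestamps copied from the question's data excerpt.
# tz = "UTC" is an assumption; the question does not state a time zone.
times <- as.POSIXct(c("2014-02-01 12:12:05", "2014-02-01 12:12:05",
                      "2014-02-01 12:12:12", "2014-02-01 12:12:27",
                      "2014-02-01 12:12:42", "2014-02-01 12:12:57"),
                    tz = "UTC")

# Gap between each point and the one before it, in seconds.
steps <- as.numeric(difftime(tail(times, -1), head(times, -1), units = "secs"))
steps
# 0 7 15 15 15

# Total elapsed time across all consecutive pairs.
total <- sum(steps)
total
# 52
```

On the full data, something like `which(steps < 0 | steps > 3600)` would point at rows that are out of order or that follow a long gap; the one-hour threshold there is arbitrary.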

Second answer

You could avoid the loop by using the lead() function from dplyr:

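library(dplyr)

# lead() takes `default`, not `defaultValue`; the last element has no successor, so it stays NA.
# Converting to POSIXct first gives lead() one number per timestamp to shift.
times <- as.POSIXct(dlstandard)
diff <- difftime(lead(times), times, units = "secs")  # later minus earlier, so gaps are positive
diffsum <- sum(diff, na.rm = TRUE)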

Note that the above is a vectorized way of solving your problem, and is usually the way to go when using R.
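
As for why the original loop gives such a small total, a likely cause is `range()`. Unlike Python's `range`, R's `range()` returns only the minimum and maximum of its argument, so `for (i in range(1:83193))` visits just `i = 1` and `i = 83193`. The sketch below shows this, followed by a loop over `seq_len()` that stores each iteration's result so it can be inspected afterwards. The `times` vector is a made-up stand-in for `dlstandard`, and `visited`, `steps`, and `n` are illustrative names.

```r
# range() returns c(min, max), so this loop body runs only twice.
visited <- c()
for (i in range(1:10)) visited <- c(visited, i)
visited
# 1 10

# Hypothetical stand-in for the question's dlstandard vector.
times <- as.POSIXct("2014-02-01 12:12:05", tz = "UTC") + c(0, 0, 7, 22, 37, 52)

# Pre-allocate one slot per consecutive pair and fill it in the loop,
# so every per-step value is kept for later inspection.
n <- length(times)
steps <- numeric(n - 1)
for (i in seq_len(n - 1)) {
  steps[i] <- as.numeric(difftime(times[i + 1], times[i], units = "secs"))
}
steps
# 0 7 15 15 15
sum(steps)
# 52
```

Stopping at `n - 1` avoids indexing past the end of the vector, which would otherwise give an `NA` on the last iteration.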