I want to analyze distance traveled based on GPS tracks, but when I calculate the distance it always comes out too large.
I use Python to make a CSV file with the latitude and longitude for all points in a track, which I then analyze with R. The data frame looks like this:
| lat| lon| lat.p1| lon.p1| dist_to_prev|
|--------:|--------:|--------:|--------:|------------:|
| 60.62061| 15.66640| 60.62045| 15.66660| 28.103099|
| 60.62045| 15.66660| 60.62037| 15.66662| 8.859034|
| 60.62037| 15.66662| 60.62026| 15.66636| 31.252373|
| 60.62026| 15.66636| 60.62018| 15.66636| 8.574722|
| 60.62018| 15.66636| 60.62010| 15.66650| 17.787905|
| 60.62001| 15.66672| 60.61996| 15.66684| 14.393267|
| 60.61996| 15.66684| 60.61989| 15.66685| 7.584996|
...
I could post the whole data frame here for reproducibility, since it's only 59 rows, but I'm not sure of the etiquette for posting big chunks of data here. Let me know how I can best share it.
lat.p1 and lon.p1 are just the lat and lon from the row below. dist_to_prev is calculated with distm() from geosphere:
library(geosphere)
library(dplyr)
df$dist_to_prev <- apply(df, 1 , FUN = function (row) {
distm(c(as.numeric(row["lat"]), as.numeric(row["lon"])),
c(as.numeric(row["lat.p1"]), as.numeric(row["lon.p1"])),
fun = distHaversine)})
df %>% filter(dist_to_prev != "NA") %>% summarise(sum(dist_to_prev))
# A tibble: 1 x 1
`sum(dist_to_prev)`
<dbl>
1 1266.
I took this track as an example from Trailforks, and according to their track description it should be 787 m, not the 1266 m I got. This is not unique to this track: every track I've looked at comes out 30-50% too long when I do the calculation.
One possible cause is that the lats/lons have only 5 decimal places. There are 6 decimal places in the CSV, but I can only see 5 when I open it in RStudio. I assumed that was just display formatting and that the full number was still there, but maybe not? The lat/lons are of type double.
Why are my distances much larger than the ones displayed on the website I got the GPX file from?
There are a couple of problems in the code above. The function distHaversine is vectorized, so you can avoid the loop / apply statement, which will significantly improve performance. Most importantly, in the geosphere package the first coordinate is longitude, not latitude.
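Here is a minimal sketch of both fixes. It uses only the first five rows shown in the question, since the full 59-row track is not posted, so the sum will not match the 787 m figure. Each point is passed as (lon, lat), and distHaversine is called once on two-column matrices instead of through apply:

```r
library(geosphere)

# First five rows of the table in the question (sample data only)
df <- data.frame(
  lat    = c(60.62061, 60.62045, 60.62037, 60.62026, 60.62018),
  lon    = c(15.66640, 15.66660, 15.66662, 15.66636, 15.66636),
  lat.p1 = c(60.62045, 60.62037, 60.62026, 60.62018, 60.62010),
  lon.p1 = c(15.66660, 15.66662, 15.66636, 15.66636, 15.66650)
)

# geosphere expects longitude first, then latitude.
# distHaversine is vectorized: row i of p1 is paired with row i of p2.
p1 <- cbind(df$lon, df$lat)
p2 <- cbind(df$lon.p1, df$lat.p1)
df$dist_to_prev <- distHaversine(p1, p2)  # metres

# Total length of the sampled segment, ignoring any NA rows
total <- sum(df$dist_to_prev, na.rm = TRUE)
total
```

With the coordinates in the right order the individual distances should come out noticeably shorter at this latitude. The first row, for example, should drop from about 28 m to roughly 21 m.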