Distance between points in a GPX file comes out too large


I want to analyze the distance traveled along GPS tracks, but when I calculate the distance it always comes out too large.

I use Python to write a CSV file with the latitude and longitude of every point in a track, which I then analyze in R. The data frame looks like this:

|      lat|      lon|   lat.p1|   lon.p1| dist_to_prev|
|--------:|--------:|--------:|--------:|------------:|
| 60.62061| 15.66640| 60.62045| 15.66660|    28.103099|
| 60.62045| 15.66660| 60.62037| 15.66662|     8.859034|
| 60.62037| 15.66662| 60.62026| 15.66636|    31.252373|
| 60.62026| 15.66636| 60.62018| 15.66636|     8.574722|
| 60.62018| 15.66636| 60.62010| 15.66650|    17.787905|
| 60.62001| 15.66672| 60.61996| 15.66684|    14.393267|
| 60.61996| 15.66684| 60.61989| 15.66685|     7.584996|
...

I could post the whole data frame for reproducibility (it's only 59 rows), but I'm not sure of the etiquette for posting large chunks of data here. Let me know how best to share it.
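For a data frame this small, the usual way to share it in an R question is `dput()`, which prints R code that rebuilds the object; that output can be pasted straight into the question. A minimal sketch, using a two-row stand-in (values from the table above) for the real 59-row data frame:

```r
# Two-row stand-in for the real track data (values from the table above)
df <- data.frame(
  lat = c(60.62061, 60.62045),
  lon = c(15.66640, 15.66660)
)

# Prints R code that recreates df; paste this output into the question
dput(df)

# Check that evaluating the printed code gives back an equal data frame
txt <- capture.output(dput(df))
df_copy <- eval(parse(text = txt))
stopifnot(isTRUE(all.equal(df, df_copy)))
```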

lat.p1 and lon.p1 are just the lat and lon from the row below. dist_to_prev is calculated with distm() from geosphere:

```r
library(geosphere)
library(dplyr)

# Distance from each point to the next one, computed row by row
df$dist_to_prev <- apply(df, 1, FUN = function(row) {
  distm(c(as.numeric(row["lat"]), as.numeric(row["lon"])),
        c(as.numeric(row["lat.p1"]), as.numeric(row["lon.p1"])),
        fun = distHaversine)
})

df %>% filter(!is.na(dist_to_prev)) %>% summarise(sum(dist_to_prev))
```

```
# A tibble: 1 x 1
  `sum(dist_to_prev)`
                <dbl>
1               1266.
```

I took this track as an example from Trailforks; according to its track description it should be 787 m, not the 1266 m I got. This isn't unique to this track: every track I've looked at comes out 30-50% too long.

One possible cause is that the lats/lons only have 5 decimal places. The CSV has 6 decimal places, but I can only see 5 when I open it in RStudio. I assumed that was just display formatting and that the full value is still stored, but maybe not? The lat/lon columns are of type double.
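This is most likely only display rounding: R prints doubles with 7 significant digits by default (`getOption("digits")`), which leaves 5 decimals for values around 60, while the stored value keeps all its digits. A quick check, using a hypothetical 6-decimal coordinate, that also estimates how much a sixth decimal is worth on the ground (assuming a spherical Earth with radius 6378137 m):

```r
x <- 60.620613  # hypothetical coordinate with 6 decimals, as in the CSV

getOption("digits")    # default number of significant digits used for printing
print(x)               # rounded to 7 significant digits for display
sprintf("%.6f", x)     # the 6th decimal is still stored
print(x, digits = 10)  # ask for more digits explicitly

# Metres per degree of latitude on a sphere with r = 6378137 m,
# and the distance corresponding to one unit in the 6th decimal
deg_to_m <- 6378137 * pi / 180
deg_to_m * 1e-6  # about 0.11 m, far too small to explain a 30-50% error
```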

Why are my distances so much larger than the ones shown on the website I got the GPX file from?

Best answer

There are a couple of problems in the code above. distHaversine() is vectorized, so you can drop the apply() loop and pass whole columns at once, which will significantly improve performance.
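A base-R sketch of the vectorized style, so it runs without geosphere installed. `haversine_m()` is a hypothetical helper written here to mirror the haversine formula that distHaversine() uses (spherical Earth, radius 6378137 m by default); it is not part of geosphere. Every operation in it works on whole vectors, so one call computes all segment lengths, and pairing each point with the next (dropping the last and first element) also removes the need for the shifted lat.p1/lon.p1 columns:

```r
# Hypothetical helper mirroring the haversine formula of geosphere::distHaversine()
# (spherical Earth, default radius 6378137 m). Inputs in degrees; longitude first.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(a, 1)))  # pmin() guards against rounding just above 1
}

# First six consecutive points from the table in the question
track <- data.frame(
  lat = c(60.62061, 60.62045, 60.62037, 60.62026, 60.62018, 60.62010),
  lon = c(15.66640, 15.66660, 15.66662, 15.66636, 15.66636, 15.66650)
)

n <- nrow(track)
# One vectorized call: points 1..n-1 paired with points 2..n,
# so no apply() loop and no shifted columns are needed
seg_m <- haversine_m(track$lon[-n], track$lat[-n], track$lon[-1], track$lat[-1])
seg_m
sum(seg_m)  # total length of this stretch in metres
```

With geosphere itself the same pairing idea applies: build a two-column (longitude, latitude) matrix `p` and pass `p[-n, ]` and `p[-1, ]` to distHaversine().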

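Most importantly, the geosphere package expects coordinates in (longitude, latitude) order, so the first coordinate must be longitude, not latitude. With the columns swapped, distHaversine treats the ~60.6° latitude as a longitude and the ~15.7° longitude as a latitude. Because an east-west step shrinks with the cosine of latitude (about 0.49 at 60.6° versus about 0.96 at 15.7°), east-west movement is counted at roughly twice its true length, which is why every track comes out too long.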

```r
library(geosphere)

df <- read.table(header = TRUE, text = "lat      lon   lat.p1   lon.p1
60.62061 15.66640 60.62045 15.66660
60.62045 15.66660 60.62037 15.66662
60.62037 15.66662 60.62026 15.66636
60.62026 15.66636 60.62018 15.66636
60.62018 15.66636 60.62010 15.66650
60.62001 15.66672 60.61996 15.66684
60.61996 15.66684 60.61989 15.66685")

# Latitude first (incorrect)
distHaversine(df[, c("lat", "lon")], df[, c("lat.p1", "lon.p1")])
# [1] 28.103099  8.859034 31.252373  8.574722 17.787905 14.393267  7.584996

# Longitude first (correct)
distHaversine(df[, c("lon", "lat")], df[, c("lon.p1", "lat.p1")])
# [1] 20.893456  8.972291 18.750046  8.905559 11.737448  8.598240  7.811479
```
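The inflation can also be reproduced without geosphere. In this base-R sketch, `haversine_m()` is a hypothetical stand-in written to mirror distHaversine()'s formula (not geosphere's own code). Feeding the first row in both orders should give the same two values as the first entries of the outputs above, and the cosine ratio at the end shows where the extra length comes from:

```r
# Hypothetical stand-in for geosphere::distHaversine() (sphere, r = 6378137 m),
# longitude first
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(a, 1)))
}

# First row of the table: a point and its successor
lat1 <- 60.62061; lon1 <- 15.66640
lat2 <- 60.62045; lon2 <- 15.66660

correct <- haversine_m(lon1, lat1, lon2, lat2)  # (lon, lat): what geosphere expects
swapped <- haversine_m(lat1, lon1, lat2, lon2)  # (lat, lon): the order used in the question
round(c(correct = correct, swapped = swapped), 3)

# East-west metres per degree scale with cos(latitude); at the true latitude the
# factor is about half of the one at the "latitude" the swapped call ends up using
cos(60.62 * pi / 180) / cos(15.67 * pi / 180)
```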