How to subsample a time series (bursts of GPS locations)


I have a time series as below:

**Date_time**
2018-06-26 17:19:30
2018-06-26 17:20:40
2018-06-26 17:20:41
2018-06-26 17:20:42
[...]
2018-06-26 17:21:36
2018-06-26 17:21:37
2018-06-26 17:21:38
2018-06-26 17:21:39
2018-06-26 17:23:15

I would like to subsample it so that I obtain the following time series (i.e. remove the locations recorded every second and keep roughly one location per minute):

**Date_time**
2018-06-26 17:19:30
2018-06-26 17:20:40
2018-06-26 17:21:39
2018-06-26 17:23:15
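One way to make "roughly one location per minute" precise is a minimum-gap rule: keep a fix only if at least 60 s have passed since the last fix that was kept. The sketch below assumes the timestamps are sorted and stored as "YYYY-MM-DD HH:MM:SS" strings; `thin_by_gap()` is a hypothetical helper (not part of any package), and the data are only the rows listed above, with the `[...]` rows left out.

```r
# Minimum-gap thinning (hypothetical helper): keep a fix only if at least
# `min_gap` seconds have passed since the previously *kept* fix.
# Assumes `times` is sorted in increasing order.
thin_by_gap <- function(times, min_gap = 60) {
  times <- as.POSIXct(times, tz = "UTC")
  keep <- logical(length(times))
  last_kept <- -Inf
  for (i in seq_along(times)) {
    t <- as.numeric(times[i])   # seconds since the epoch
    if (t - last_kept >= min_gap) {
      keep[i] <- TRUE
      last_kept <- t
    }
  }
  keep
}

# Only the rows shown in the question (the "[...]" rows are omitted)
date_time <- c("2018-06-26 17:19:30", "2018-06-26 17:20:40",
               "2018-06-26 17:20:41", "2018-06-26 17:20:42",
               "2018-06-26 17:21:36", "2018-06-26 17:21:37",
               "2018-06-26 17:21:38", "2018-06-26 17:21:39",
               "2018-06-26 17:23:15")
date_time[thin_by_gap(date_time, min_gap = 60)]
```

With `min_gap = 60` this keeps 17:19:30, 17:20:40 and 17:23:15: the 17:21:39 fix is dropped because it falls 59 s after 17:20:40, so this rule does not reproduce the expected output exactly. Unlike binning by calendar minute, its result does not depend on where the minute boundaries fall.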

I wrote the following code, but it does not give the expected time series:

```r
library(dplyr)
library(lubridate)

tab_subsampled <- tab %>%
  mutate(Date_Time = ymd_hms(Date_Time),
         year = year(Date_Time), month = month(Date_Time), day = day(Date_Time),
         hour = hour(Date_Time), minute = minute(Date_Time), second = second(Date_Time)) %>%
  group_by(year, month, day, hour, minute) %>%
  slice(n()) %>%   # keeps the last row of each minute
  ungroup()
```
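Grouping by year, month, day, hour and minute amounts to putting each fix into a calendar-minute bin, and `slice(n())` then keeps the *last* fix of each bin. A minimal sketch of the same idea with `lubridate::floor_date()`, assuming a data frame `tab` whose character column is called `Date_time` (as in the listing above; the code above uses `Date_Time`) and containing only the rows shown:

```r
library(dplyr)
library(lubridate)

# Only the rows shown in the question (the "[...]" rows are omitted)
tab <- data.frame(Date_time = c(
  "2018-06-26 17:19:30", "2018-06-26 17:20:40", "2018-06-26 17:20:41",
  "2018-06-26 17:20:42", "2018-06-26 17:21:36", "2018-06-26 17:21:37",
  "2018-06-26 17:21:38", "2018-06-26 17:21:39", "2018-06-26 17:23:15"),
  stringsAsFactors = FALSE)

# Bin every fix to the start of its calendar minute
binned <- tab %>%
  mutate(Date_time = ymd_hms(Date_time),
         minute_bin = floor_date(Date_time, unit = "minute")) %>%
  group_by(minute_bin)

first_per_minute <- binned %>% slice(1) %>% ungroup()    # earliest fix in each minute
last_per_minute  <- binned %>% slice(n()) %>% ungroup()  # latest fix, as slice(n()) above
```

On these rows, `first_per_minute` holds 17:19:30, 17:20:40, 17:21:36 and 17:23:15, while `last_per_minute` holds 17:19:30, 17:20:42, 17:21:39 and 17:23:15. Neither matches the expected output exactly, because that output keeps the first fix of minute 20 but the last fix of minute 21.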

I'd really appreciate some help, thank you very much!

**Answer 1**

You can use substr() with dplyr on the whole data frame: cut off everything after the minutes (i.e. keep the first 16 characters, "YYYY-MM-DD HH:MM") and then keep only the unique values, so that you have one data point per minute.

```r
library(dplyr)

# Date_time
time <- c("2018-06-26 17:19:30",
          "2018-06-26 17:20:40",
          "2018-06-26 17:20:41",
          "2018-06-26 17:20:42",
          "2018-06-26 17:21:39",
          "2018-06-26 17:23:15")

time <- as.data.frame(time)
colnames(time) <- "Date_time"

# Keep "YYYY-MM-DD HH:MM" (characters 1-16); stopping at 13 would keep only the hour
time <- time %>%
  mutate(Date_time = substr(Date_time, 1, 16))

# One entry per minute
Date.Time_only_minutes <- unique(time$Date_time)
Date.Time_only_minutes
```
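`unique()` returns only the truncated minute labels (e.g. "2018-06-26 17:20"), not the original records. A minimal base-R variant that keeps the first full record in each minute, assuming the rows are sorted by time, applies `duplicated()` to the same 16-character key:

```r
# Keep the first original record of each minute instead of only the
# truncated "YYYY-MM-DD HH:MM" labels. Assumes rows are sorted by time.
time <- data.frame(Date_time = c("2018-06-26 17:19:30",
                                 "2018-06-26 17:20:40",
                                 "2018-06-26 17:20:41",
                                 "2018-06-26 17:20:42",
                                 "2018-06-26 17:21:39",
                                 "2018-06-26 17:23:15"),
                   stringsAsFactors = FALSE)

minute_key <- substr(time$Date_time, 1, 16)   # e.g. "2018-06-26 17:20"
first_per_minute <- time[!duplicated(minute_key), , drop = FALSE]
first_per_minute
```

For these six timestamps the result is 17:19:30, 17:20:40, 17:21:39 and 17:23:15, the series asked for; in general it keeps the first fix of every minute.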
**Answer 2**

Simply using sample_n() will also work:

```r
library(dplyr)
library(lubridate)

time <- c("2018-06-26 17:19:30",
          "2018-06-26 17:20:40",
          "2018-06-26 17:20:41",
          "2018-06-26 18:20:42",
          "2018-06-26 17:21:39",
          "2018-06-26 17:23:15",
          "2018-07-26 17:20:30",
          "2018-07-26 17:20:40",
          "2018-08-26 18:20:41",
          "2018-08-26 18:20:42",
          "2018-09-26 17:21:39",
          "2018-09-26 17:21:15")

time <- as.data.frame(time)
time
#>                   time
#> 1  2018-06-26 17:19:30
#> 2  2018-06-26 17:20:40
#> 3  2018-06-26 17:20:41
#> 4  2018-06-26 18:20:42
#> 5  2018-06-26 17:21:39
#> 6  2018-06-26 17:23:15
#> 7  2018-07-26 17:20:30
#> 8  2018-07-26 17:20:40
#> 9  2018-08-26 18:20:41
#> 10 2018-08-26 18:20:42
#> 11 2018-09-26 17:21:39
#> 12 2018-09-26 17:21:15

# Pick one random row per (date, hour, minute) group
set.seed(1)
time %>% group_by(date(time), hour(time), minute(time)) %>%
  sample_n(1) %>% ungroup() %>%
  select(time)
#> # A tibble: 8 x 1
#>   time
#>   <chr>
#> 1 2018-06-26 17:19:30
#> 2 2018-06-26 17:20:41
#> 3 2018-06-26 17:21:39
#> 4 2018-06-26 17:23:15
#> 5 2018-06-26 18:20:42
#> 6 2018-07-26 17:20:30
#> 7 2018-08-26 18:20:41
#> 8 2018-09-26 17:21:39
```
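Since dplyr 1.0.0, `sample_n()` is superseded by `slice_sample()`. A sketch of the same random one-fix-per-minute sampling with `slice_sample()`, assuming a character column `time` as above and grouping on a single `floor_date()` bin instead of three separate expressions:

```r
library(dplyr)
library(lubridate)

time <- data.frame(time = c("2018-06-26 17:19:30", "2018-06-26 17:20:40",
                            "2018-06-26 17:20:41", "2018-06-26 18:20:42",
                            "2018-06-26 17:21:39", "2018-06-26 17:23:15"),
                   stringsAsFactors = FALSE)

set.seed(1)
sampled <- time %>%
  group_by(minute_bin = floor_date(ymd_hms(time), unit = "minute")) %>%
  slice_sample(n = 1) %>%   # one random fix per calendar minute
  ungroup() %>%
  select(time)
sampled
```

The fix picked within each minute is random, so the result depends on the seed; `slice(1)` in place of `slice_sample(n = 1)` gives a deterministic first fix per minute instead.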

Note: if your data contain other ID or grouping variables (e.g. one track per individual), you have to add them to the group_by() call so that the subsampling is done within each of those groups.
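For example, with a hypothetical `id` column identifying each animal or device, adding it to `group_by()` keeps one fix per minute for each `id` rather than one per minute across all of them. A minimal sketch with illustrative timestamps (not from the question):

```r
library(dplyr)
library(lubridate)

# `id` is a hypothetical animal/device identifier column
fixes <- data.frame(
  id        = c("A", "A", "A", "B", "B", "B"),
  Date_time = c("2018-06-26 17:20:40", "2018-06-26 17:20:41",
                "2018-06-26 17:21:39", "2018-06-26 17:20:40",
                "2018-06-26 17:20:55", "2018-06-26 17:23:15"),
  stringsAsFactors = FALSE)

per_id <- fixes %>%
  mutate(Date_time = ymd_hms(Date_time)) %>%
  group_by(id, minute_bin = floor_date(Date_time, unit = "minute")) %>%
  slice(1) %>%            # first fix of each minute, separately for each id
  ungroup() %>%
  select(id, Date_time)
per_id
```

Here both A and B keep their 17:20:40 fix; without `id` in `group_by()`, only one fix would survive for the 17:20 minute.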