How to split datetime period data into monthyear based on unique values in a column in R

I have a series of receivers that were deployed for various periods. The dataset looks like this:

Receiver start_dt end_dt
1 2021-05-19 15:43:00 2022-06-19 12:43:00
2 2021-08-19 15:43:00 2022-05-10 18:43:00
3 2021-12-19 15:43:00 2022-06-19 12:43:00

I would like to extract every monthyear in which each receiver was deployed, between its start and end dates, so that each receiver is linked to each monthyear in which it was active. Here is an example of what I'm trying to achieve:

Receiver monthyear
1 2021-05
1 2021-06
1 2021-07
1 2021-08
1 2021-09
1 2021-10
1 2021-11
1 2021-12
1 2022-01
1 2022-02
1 2022-03
1 2022-04
1 2022-05
1 2022-06
2 2021-08
2 2021-09
2 2021-10
2 2021-11
2 2021-12
2 2022-01
2 2022-02
2 2022-03
2 2022-04
2 2022-05
3 2021-12
3 2022-01
3 2022-02
3 2022-03
3 2022-04
3 2022-05
3 2022-06

I am not sure whether lubridate is the best option here. In my head it seemed relatively simple, but I can't make it work with lubridate or by adapting other examples on Stack Overflow.

Thank you for any help.

There are 4 best solutions below

1
Jon Spring On

reframe (available in dplyr 1.1.0 and later) lets us do most of the work in one line.

Here, I parse the two datetime columns as datetimes if they aren't already (the data arrive in year-month-day hour-minute-second order, hence ymd_hms), convert them to dates with as_date, and floor each one to the first of its month with floor_date. Then reframe builds a sequence of monthly rows covering each Receiver's range of months.

library(lubridate); library(dplyr)
df |>
  mutate(across(ends_with("dt"), ~floor_date(as_date(ymd_hms(.)), "month"))) |>
  reframe(monthyear = seq.Date(start_dt, end_dt, "month"), .by = Receiver)

Result (if you like, you could convert monthyear to tsibble::yearmonth, or to a YYYY-MM string with mutate(monthyear = format(monthyear, "%Y-%m"))):

   Receiver  monthyear
1         1 2021-05-01
2         1 2021-06-01
3         1 2021-07-01
4         1 2021-08-01
5         1 2021-09-01
6         1 2021-10-01
7         1 2021-11-01
8         1 2021-12-01
9         1 2022-01-01
10        1 2022-02-01
11        1 2022-03-01
12        1 2022-04-01
13        1 2022-05-01
14        1 2022-06-01
15        2 2021-08-01
16        2 2021-09-01
17        2 2021-10-01
18        2 2021-11-01
19        2 2021-12-01
20        2 2022-01-01
21        2 2022-02-01
22        2 2022-03-01
23        2 2022-04-01
24        2 2022-05-01
25        3 2021-12-01
26        3 2022-01-01
27        3 2022-02-01
28        3 2022-03-01
29        3 2022-04-01
30        3 2022-05-01
31        3 2022-06-01
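
For anyone who wants to run the above end to end, here is a minimal, self-contained sketch. It assumes the question's three rows are stored as character columns in a data frame named df (building df this way is my assumption; the answer does not show it), and it adds the optional format step so that monthyear comes out as a YYYY-MM string.

```r
library(dplyr)      # reframe() and .by need dplyr >= 1.1.0
library(lubridate)

# Sample data copied from the question (stored as character; an assumption)
df <- data.frame(
  Receiver = 1:3,
  start_dt = c("2021-05-19 15:43:00", "2021-08-19 15:43:00", "2021-12-19 15:43:00"),
  end_dt   = c("2022-06-19 12:43:00", "2022-05-10 18:43:00", "2022-06-19 12:43:00")
)

result <- df |>
  # parse, drop the time of day, and snap to the first day of the month
  mutate(across(ends_with("dt"), ~ floor_date(as_date(ymd_hms(.x)), "month"))) |>
  # one row per month between the floored start and end, per receiver
  reframe(monthyear = seq.Date(start_dt, end_dt, by = "month"), .by = Receiver) |>
  # optional: turn the Date into a "YYYY-MM" string
  mutate(monthyear = format(monthyear, "%Y-%m"))

# Number of months per receiver
count(result, Receiver)
```

With the sample data this should give 14, 10 and 7 months for receivers 1, 2 and 3, matching the 31 rows shown above.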

      
1
TarJae On

We create and expand a sequence of months rowwise between start_dt and end_dt. With format(monthyear, "%Y-%m") we get the month and year.

library(dplyr)
library(tidyr)
library(lubridate)  # provides ymd_hms() and floor_date()

df %>%
  mutate(across(ends_with("dt"), ~ymd_hms(.))) %>% # if you already have datetime format you don't need this line
  rowwise() %>%
  # floor both ends to the first of the month so the partial first and last months are kept
  mutate(monthyear = list(seq(floor_date(start_dt, "month"), 
                              floor_date(end_dt, "month"), 
                              by = "month"))) %>%
  unnest(monthyear) %>%
  mutate(monthyear = format(monthyear, "%Y-%m")) %>%
  select(Receiver, monthyear) %>% 
  print(n=50)

  Receiver monthyear
      <int> <chr>    
 1        1 2021-05  
 2        1 2021-06  
 3        1 2021-07  
 4        1 2021-08  
 5        1 2021-09  
 6        1 2021-10  
 7        1 2021-11  
 8        1 2021-12  
 9        1 2022-01  
10        1 2022-02  
11        1 2022-03  
12        1 2022-04  
13        1 2022-05  
14        1 2022-06  
15        2 2021-08  
16        2 2021-09  
17        2 2021-10  
18        2 2021-11  
19        2 2021-12  
20        2 2022-01  
21        2 2022-02  
22        2 2022-03  
23        2 2022-04  
24        2 2022-05  
25        3 2021-12  
26        3 2022-01  
27        3 2022-02  
28        3 2022-03  
29        3 2022-04  
30        3 2022-05  
31        3 2022-06  
0
Limey On

An alternative approach using list-columns, resulting in a slightly more compact solution.

library(lubridate)
library(tidyverse)

# Create test data
df <- tibble(
  receiver = 1:3,
  start_dt = floor_date(as_datetime(c("2021-05-19 15:43:00", "2021-08-19 15:43:00", "2021-12-19 15:43:00")), "month"), 
  # floor (not ceiling) the end date so no extra month is appended after the end month
  end_dt = floor_date(as_datetime(c("2022-06-19 12:43:00", "2022-05-10 18:43:00", "2022-06-19 12:43:00")), "month")
)

# Solution
df %>% 
  rowwise() %>% 
  mutate(monthyear = list(seq(start_dt, end_dt, "1 month"))) %>% 
  select(receiver, monthyear) %>% 
  unnest(monthyear)
# A tibble: 31 × 2
   receiver monthyear          
      <int> <dttm>             
 1        1 2021-05-01 00:00:00
 2        1 2021-06-01 00:00:00
 3        1 2021-07-01 00:00:00
 4        1 2021-08-01 00:00:00
 5        1 2021-09-01 00:00:00
 6        1 2021-10-01 00:00:00
 7        1 2021-11-01 00:00:00
 8        1 2021-12-01 00:00:00
 9        1 2022-01-01 00:00:00
10        1 2022-02-01 00:00:00
# ℹ 21 more rows
0
M.Viking On

I tried to make a base R one-liner, but couldn't get the receiver column in the same step. Making sure the end month is always included is a challenge in base R, so the code below still relies on lubridate's months(1), which is a pain.

library(lubridate)  # needed for months(1) below

df <- data.frame(receiver = 1:3,
                 start_dt = as.POSIXct(c("2021-05-19 15:43:00", "2021-08-19 15:43:00", "2021-12-19 15:43:00")), 
                 end_dt   = as.POSIXct(c("2022-06-19 12:43:00", "2022-05-10 18:43:00", "2022-06-19 12:43:00")))

list_of_months <- mapply(seq, 
                         trunc(df$start_dt), 
                         trunc(df$end_dt, "month") + months(1), 
                         "1 month")

data.frame(receiver  = rep(df$receiver, sapply(list_of_months, length)),
           monthyear = format(do.call(c, list_of_months), format = "%Y-%m"))

#    receiver monthyear
# 1         1   2021-05
# 2         1   2021-06
# 3         1   2021-07
# 4         1   2021-08
# 5         1   2021-09
# ...
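
If the goal is to avoid lubridate entirely, one possible base R sketch is to snap both ends to the first day of their month as Date values before calling seq(), so the end month is always included. Note that first_of_month is a hypothetical helper name introduced here, and the explicit tz = "UTC" is an assumption made so the example does not depend on the local time zone.

```r
# Sample data from the question; tz = "UTC" is an assumption
df <- data.frame(
  receiver = 1:3,
  start_dt = as.POSIXct(c("2021-05-19 15:43:00", "2021-08-19 15:43:00", "2021-12-19 15:43:00"), tz = "UTC"),
  end_dt   = as.POSIXct(c("2022-06-19 12:43:00", "2022-05-10 18:43:00", "2022-06-19 12:43:00"), tz = "UTC")
)

# Hypothetical helper: first day of the month containing x, returned as a Date
first_of_month <- function(x) as.Date(format(x, "%Y-%m-01"))

start_month <- first_of_month(df$start_dt)
end_month   <- first_of_month(df$end_dt)

# One monthly Date sequence per receiver; indexing with [i] keeps the Date class
list_of_months <- lapply(seq_len(nrow(df)), function(i) {
  seq(start_month[i], end_month[i], by = "month")
})

# Repeat each receiver id once per month and flatten the sequences
result <- data.frame(
  receiver  = rep(df$receiver, lengths(list_of_months)),
  monthyear = format(do.call(c, list_of_months), "%Y-%m")
)

head(result)
```

Because both ends sit on the first of a month, seq() never overshoots or skips the final month, and no day-of-month rollover can occur.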