Converting date+time in Character format to CET while correctly taking into account daylight savings

61 Views Asked by At

I have a small problem converting some characters (the format is CET, they are just stored as characters). I want to change the format from characters to date, such that I can convert between timezones. Here is a small part of the data that I'm looking at.

timestamps <- c("2023-10-29 00:00:00", "2023-10-29 00:15:00", "2023-10-29 00:30:00", "2023-10-29 00:45:00", "2023-10-29 01:00:00", "2023-10-29 01:15:00", "2023-10-29 01:30:00", "2023-10-29 01:45:00", "2023-10-29 02:00:00", "2023-10-29 02:15:00", "2023-10-29 02:30:00", "2023-10-29 02:45:00", "2023-10-29 03:00:00", "2023-10-29 02:15:00", "2023-10-29 02:30:00", "2023-10-29 02:45:00", "2023-10-29 03:00:00", "2023-10-29 03:15:00", "2023-10-29 03:30:00", "2023-10-29 03:45:00", "2023-10-29 04:00:00", "2023-10-29 04:15:00")

My first attempt was doing the following:

as.POSIXct(timestamps, tz = "CET")

but that gives the following output:

 [1] "2023-10-29 00:00:00 CEST" "2023-10-29 00:15:00 CEST" "2023-10-29 00:30:00 CEST" "2023-10-29 00:45:00 CEST" "2023-10-29 01:00:00 CEST"
 [6] "2023-10-29 01:15:00 CEST" "2023-10-29 01:30:00 CEST" "2023-10-29 01:45:00 CEST" "2023-10-29 02:00:00 CEST" "2023-10-29 02:15:00 CET" 
[11] "2023-10-29 02:30:00 CET"  "2023-10-29 02:45:00 CET"  "2023-10-29 03:00:00 CET"  "2023-10-29 02:15:00 CET"  "2023-10-29 02:30:00 CET" 
[16] "2023-10-29 02:45:00 CET"  "2023-10-29 03:00:00 CET"  "2023-10-29 03:15:00 CET"  "2023-10-29 03:30:00 CET"  "2023-10-29 03:45:00 CET" 
[21] "2023-10-29 04:00:00 CET"  "2023-10-29 04:15:00 CET" 

Here the CEST timezone is not properly allocated. They should be allocated up to the "2023-10-29 03:00:00" timestamp.

I also tried to use the "ymd_hms" from lubridate, but that did not do the trick either:

[1] "2023-10-29 00:00:00 CEST" "2023-10-29 00:15:00 CEST" "2023-10-29 00:30:00 CEST" "2023-10-29 00:45:00 CEST" "2023-10-29 01:00:00 CEST"
 [6] "2023-10-29 01:15:00 CEST" "2023-10-29 01:30:00 CEST" "2023-10-29 01:45:00 CEST" "2023-10-29 02:00:00 CET"  "2023-10-29 02:15:00 CET" 
[11] "2023-10-29 02:30:00 CET"  "2023-10-29 02:45:00 CET"  "2023-10-29 03:00:00 CET"  "2023-10-29 02:15:00 CET"  "2023-10-29 02:30:00 CET" 
[16] "2023-10-29 02:45:00 CET"  "2023-10-29 03:00:00 CET"  "2023-10-29 03:15:00 CET"  "2023-10-29 03:30:00 CET"  "2023-10-29 03:45:00 CET" 
[21] "2023-10-29 04:00:00 CET"  "2023-10-29 04:15:00 CET" 

Actually, 1 less time period is now being assigned the "CEST" timezone compared to before. Does anyone have a solution for this problem?

1

There are 1 best solutions below

0
Lennyy On

If you assume your data is sorted, you may add +0100 for the (duplicated) CET times and +0200 for the CEST times, and then add %z in the as.POSIXct call.

ifelse(duplicated(timestamps) | as.POSIXct(timestamps, tz = "CET") > as.POSIXct("2023-10-29 03:00:00"),
       paste0(timestamps, " +0100"),
       paste0(timestamps, " +0200")
       ) |> 
  as.POSIXct(format="%Y-%m-%d %H:%M:%S %z")

Gives:

[1] "2023-10-29 00:00:00 CEST" "2023-10-29 00:15:00 CEST" "2023-10-29 00:30:00 CEST" "2023-10-29 00:45:00 CEST" "2023-10-29 01:00:00 CEST" "2023-10-29 01:15:00 CEST"
 [7] "2023-10-29 01:30:00 CEST" "2023-10-29 01:45:00 CEST" "2023-10-29 02:00:00 CEST" "2023-10-29 02:15:00 CEST" "2023-10-29 02:30:00 CEST" "2023-10-29 02:45:00 CEST"
[13] "2023-10-29 02:00:00 CET"  "2023-10-29 02:15:00 CET"  "2023-10-29 02:30:00 CET"  "2023-10-29 02:45:00 CET"  "2023-10-29 03:00:00 CET"  "2023-10-29 03:15:00 CET" 
[19] "2023-10-29 03:30:00 CET"  "2023-10-29 03:45:00 CET"  "2023-10-29 04:00:00 CET"  "2023-10-29 04:15:00 CET"