I have a date and time vector that I want to round down to the nearest 20-minute period. It works fine when I use floor_date from the lubridate package on parts of the vector, but when I apply it to the whole vector, I only get the dates (without the time) back. Can anyone explain why this happens, and advise how to fix it? This is an example:
Just applying it on the first part of the vector returns what I expect:
> floor_date(as.POSIXct(test[1:20]), unit = "20 mins")
[1] "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET"
[5] "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET"
[9] "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET"
[13] "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET"
[17] "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET" "2003-02-04 04:40:00 CET"
Performing it on the whole vector gives just the date:
> head(floor_date(as.POSIXct(test), unit = "20 mins"), 20)
[1] "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET"
[7] "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET"
[13] "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET" "2003-02-04 CET"
[19] "2003-02-04 CET" "2003-02-04 CET"
It is a rather long vector (n = 8827411), but I would not think that should really matter?
When I try to investigate this, it seems that for some reason as.POSIXct returns just a date for some values, but not for most of them. Why is this? For example, take these two values, which look identical in format to me:
> test[8717005]
[1] "2020-03-29 02:58:32.200"
> test[8718005]
[1] "2020-03-29 04:55:01.400"
One of them is returned as date and time - the other only as date. Why is that?
> as.POSIXct(test[8717005])
[1] "2020-03-29 CET"
> as.POSIXct(test[8718005])
[1] "2020-03-29 04:55:01 CEST"
or
> as.POSIXct("2020-03-29 02:58:32.200")
[1] "2020-03-29 CET"
> as.POSIXct("2020-03-29 04:55:01.400")
[1] "2020-03-29 04:55:01 CEST"
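A minimal sketch for locating such elements in a long vector. It passes an explicit format so that as.POSIXct cannot fall back to a date-only format, and then lists the strings that come back as NA. The vector x and the zone "Europe/Berlin" are assumptions for illustration (the output above only shows CET/CEST), and how a clock time that does not exist in a zone is handled is platform-dependent, so on some systems nothing may be flagged.

```r
# Hypothetical sample: the two strings shown above
x <- c("2020-03-29 02:58:32.200",
       "2020-03-29 04:55:01.400")

# Explicit format: the time part is always parsed, no date-only fallback.
# tz = "Europe/Berlin" is an assumption standing in for the session's CET/CEST zone.
parsed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "Europe/Berlin")

# Positions and values that could not be represented in that time zone
bad <- which(is.na(parsed))
x[bad]
```

On the full vector, the same check would be as.POSIXct(test, format = "%Y-%m-%d %H:%M:%OS") followed by test[is.na(...)].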
So I guess the problem is not with floor_date but with as.POSIXct. But I don't understand why it does not return the same kind of result for two values that seem to have exactly the same format. Fooling around with this a bit, it seems it does not like hour 2 on this particular date. The dates before and after work fine, and any hour other than 2 also works fine for this date.
> as.POSIXct("2020-03-29 01:00:00")
[1] "2020-03-29 01:00:00 CET"
> as.POSIXct("2020-03-29 02:00:00")
[1] "2020-03-29 CET"
> as.POSIXct("2020-03-29 03:00:00")
[1] "2020-03-29 03:00:00 CEST"
> as.POSIXct("2020-03-29 04:00:00")
[1] "2020-03-29 04:00:00 CEST"
> as.POSIXct("2020-03-29 05:00:00")
[1] "2020-03-29 05:00:00 CEST"
> as.POSIXct("2020-03-29 06:00:00")
[1] "2020-03-29 06:00:00 CEST"
> as.POSIXct("2020-03-29 07:00:00")
[1] "2020-03-29 07:00:00 CEST"
> as.POSIXct("2020-03-29 08:00:00")
[1] "2020-03-29 08:00:00 CEST"
> as.POSIXct("2020-03-29 09:00:00")
[1] "2020-03-29 09:00:00 CEST"
> as.POSIXct("2020-03-29 10:00:00")
[1] "2020-03-29 10:00:00 CEST"
> as.POSIXct("2020-03-29 11:00:00")
[1] "2020-03-29 11:00:00 CEST"
> as.POSIXct("2020-03-29 12:00:00")
[1] "2020-03-29 12:00:00 CEST"
> as.POSIXct("2020-03-29 02:00:00")
[1] "2020-03-29 CET"
> as.POSIXct("2020-03-30 02:00:00")
[1] "2020-03-30 02:00:00 CEST"
> as.POSIXct("2020-03-28 02:00:00")
[1] "2020-03-28 02:00:00 CET"
Can anyone explain what is special about this particular hour on this particular date? Why do I get this, and how can I fix it? (This was just one example I found in the rather long vector. There were a couple of other instances where as.POSIXct returned only the date, but in all cases the format was exactly the same as in the cases where both time and date were returned.)
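The switch from CET to CEST between 01:00 and 03:00 in the output above suggests that the hour starting at 02:00 on 2020-03-29 was skipped by the daylight saving change, so those clock times do not exist in the local zone. A minimal sketch of one possible workaround, assuming the timestamps can be treated as UTC (an assumption; whether that is right depends on how the data were recorded). The vector x is a hypothetical sample, not the real data.

```r
library(lubridate)

# Hypothetical sample in the same format as the vector in the question
x <- c("2020-03-29 01:58:32.200",
       "2020-03-29 02:58:32.200",
       "2020-03-29 04:55:01.400")

# Assumption: the timestamps are treated as UTC, a zone with no daylight
# saving gap, so every clock time exists. The explicit format keeps the
# time part from being dropped.
parsed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Round down to the start of each 20-minute period
floored <- floor_date(parsed, unit = "20 mins")
format(floored, "%Y-%m-%d %H:%M:%S")
# "2020-03-29 01:40:00" "2020-03-29 02:40:00" "2020-03-29 04:40:00"
```

If the values really are local wall-clock times, the skipped hour still has no valid representation, and those rows would need separate handling.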