As the title says, I made a simple example to test drop_na() from tidyr:
library(tidyr)
library(dplyr)
# (1.) create a dataset with two POSIX columns, "ct" and "lt"
data <- data.frame(n = 1:5)
data$ct <- as.POSIXct(Sys.time() + rnorm(5) * 1000)
data$lt <- as.POSIXlt(Sys.time() + rnorm(5) * 1000)
str(data)
# $ n : int 1 2 3 4 5
# $ ct: POSIXct, format: "2018-10-07 03:02:28" ...
# $ lt: POSIXlt, format: "2018-10-07 02:37:26" ...
# (2.) set the third values of "ct" and "lt" to NA
data[3, c("ct", "lt")] <- NA
# (3.) use different functions to remove rows with NA
data %>% is.na() # identifies NAs in both "ct" and "lt"
data %>% drop_na('ct') # drops the NA row for "ct"
data %>% drop_na('lt') # does NOT drop the NA row for "lt"
data[c(1, 2)] %>% na.omit() # drops the NA row for "ct"
data[c(1, 3)] %>% na.omit() # does NOT drop the NA row for "lt"
From the results above, when a POSIXlt column contains NAs, only is.na() identifies them, so it is the only one of these functions that can be used to drop the affected rows.
I roughly know the difference between POSIXct and POSIXlt. POSIXct represents the number of seconds since the beginning of 1970 as a numeric vector. POSIXlt is a named list of vectors representing the date-time components.
So can someone explain why missing values in POSIXlt cannot be identified by drop_na() and na.omit()?
Short answer: use POSIXct unless you really need POSIXlt.
Longer answer:
POSIXlt is a difficult and capricious data structure. See:
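As a minimal base-R sketch of that structure, unclass() exposes the list underneath a POSIXlt object. The timestamp is an arbitrary example value chosen for illustration.

```r
# Illustrative sketch, base R only; the timestamp is an arbitrary example value.
lt <- as.POSIXlt("2018-10-07 03:02:28", tz = "UTC")

# Underneath the class attribute, a POSIXlt object is a named list of vectors,
# one per date-time component (sec, min, hour, mday, mon, year, ...).
parts <- unclass(lt)
is.list(parts)
names(parts)

# Components use offsets that are easy to get wrong:
lt$year  # years since 1900, so 118 for 2018
lt$mon   # months counted from 0, so 9 for October

# The same instant as POSIXct is a plain numeric value (seconds since 1970).
ct <- as.POSIXct(lt)
is.list(ct)              # FALSE
is.numeric(unclass(ct))  # TRUE
```

Functions that expect an atomic vector see one value per timestamp in the POSIXct case, but a list of component vectors in the POSIXlt case.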
In short, POSIXlt is a list of vectors, each representing one of the date/time units (seconds, minutes, hours, days, etc.), plus further fields such as the time zone. There is no na.omit method for POSIXlt, so na.omit.default is used, which knows nothing about the specifics of the POSIXlt class and treats it as an ordinary list.
If you need a na.omit method for POSIXlt, you can write one. If you don't really need POSIXlt, it is easier to use POSIXct.
A corollary: na.omit doesn't really work with lists either (i.e., it can be called on them but does nothing). You can sapply or lapply na.omit over a list, but that produces strange results as well (NA components are replaced by logical(0)). It looks like na.omit is meant for atomic vectors, factors, and data frames. (The help page says it is mostly useful with data frames.) This means that na.omit is not intended to be useful with lists, including POSIXlt.
Finally, why would one use POSIXlt at all? The idea (as far as I understand it) is that you can easily manipulate a date's components, but even that can produce unexpected results:
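One possible illustration, sketched in base R with hypothetical example values: bumping the month component of 31 January yields a date that does not exist, and the derived weekday field is left stale.

```r
# Hypothetical example values; base R only.
lt <- as.POSIXlt("2018-01-31 12:00:00", tz = "UTC")

# Naive attempt at "same day, next month": bump the zero-based month component.
lt$mon <- lt$mon + 1

# The object now describes 31 February, which does not exist. Converting to
# POSIXct normalizes the out-of-range day by rolling over into March.
rolled <- as.POSIXct(lt)
format(rolled, "%Y-%m-%d")  # "2018-03-03"

# Derived components are not refreshed by the assignment: wday still holds
# the weekday of 31 January (3, Wednesday), while the normalized date falls
# on a Saturday (6).
stale_wday <- lt$wday
fresh_wday <- as.POSIXlt(rolled)$wday
```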
So if you need to manipulate a date's components separately, you will have fewer surprises with lubridate.
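For the original problem of dropping rows, here is a minimal sketch of two workarounds in base R, assuming a small data frame shaped like the one in the question (the timestamps are made up for illustration): filter explicitly with is.na(), or convert the column to POSIXct so that na.omit() sees an atomic column.

```r
# Hypothetical data mirroring the question; base R only.
ts <- as.POSIXct("2018-10-07 03:00:00", tz = "UTC") + c(0, 60, 120, 180, 240)
data <- data.frame(n = 1:5)
data$lt <- as.POSIXlt(ts)
data$lt[3] <- NA

# Option 1: is.na() understands POSIXlt, so explicit filtering works.
kept_by_filter <- data[!is.na(data$lt), ]

# Option 2: convert to POSIXct; the column is then atomic and na.omit()
# drops the row with the missing timestamp.
data$lt <- as.POSIXct(data$lt)
kept_by_omit <- na.omit(data)

nrow(kept_by_filter)  # 4
nrow(kept_by_omit)    # 4
```

The same conversion should also let drop_na() from tidyr work on the column, since it behaves for the POSIXct column in the question.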