drop_na() does not work on POSIXlt objects


As the title says, I made a simple example to test drop_na() from tidyr:

library(tidyr)
library(dplyr)

# (1.) create a dataset with the two POSIX types, "ct" and "lt"

data <- data.frame(n = 1:5)
data$ct <- as.POSIXct(Sys.time() + rnorm(5) * 1000)
data$lt <- as.POSIXlt(Sys.time() + rnorm(5) * 1000)
str(data)

# $ n : int  1 2 3 4 5
# $ ct: POSIXct, format: "2018-10-07 03:02:28" ...
# $ lt: POSIXlt, format: "2018-10-07 02:37:26" ...


# (2.) set the third value of "ct" and "lt" to NA

data[3, c("ct", "lt")] <- NA


# (3.) use different functions to remove rows with NA

data %>% is.na()               # identifies the NAs in both "ct" and "lt"
data %>% drop_na('ct')         # drops the row where "ct" is NA
data %>% drop_na('lt')         # does NOT drop the row where "lt" is NA
data[c(1, 2)] %>% na.omit()    # drops the row where "ct" is NA
data[c(1, 3)] %>% na.omit()    # does NOT drop the row where "lt" is NA

From the results above, if a POSIXlt variable contains NAs, only is.na() can be used to identify the rows to drop.

I roughly know the difference between POSIXct and POSIXlt:

  • POSIXct represents the number of seconds since the beginning of 1970 as a numeric vector.
  • POSIXlt is a named list of vectors representing the individual date-time components (seconds, minutes, hours, and so on).

Can someone explain why missing values in a POSIXlt variable cannot be identified by drop_na() and na.omit()?

Best answer

Short answer: use POSIXct unless you really need POSIXlt

Longer answer:

POSIXlt is a difficult and capricious data structure. See:

> str(c(as.POSIXlt(Sys.time()), NA))
 POSIXlt[1:2], format: "2018-10-07 00:43:06" NA
> unclass(c(as.POSIXlt(Sys.time()), NA))
$sec
[1] 15.78872       NA

$min
[1] 43 NA

$hour
[1]  0 NA
# skipped a few rows

$isdst
[1]  1 -1

$zone
[1] "EEST" ""   
# skipped a few rows 

In short, POSIXlt is a list of vectors, each representing one date/time component (seconds, minutes, hours, days, and so on), plus extras such as the time zone. There is no na.omit method for POSIXlt, so na.omit.default is used; it knows nothing about the specifics of the POSIXlt class and treats it as an ordinary list:

> na.omit(list(NA,NA,NA))
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA

If you need a na.omit method for POSIXlt, you can write one. If you do not really need POSIXlt, though, it is easier to use POSIXct.
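As a rough sketch of the "just use POSIXct" route, the snippet below builds a small data frame with a POSIXlt column and removes the NA row in two ways: by subsetting with is.na(), which the question already found to work, and by converting the column to POSIXct before calling na.omit(). The fixed timestamps and the names `times`, `kept`, `data_ct` and `cleaned` are made up for illustration; they are not from the question.

```r
# Fixed timestamps (illustrative values) so the example is reproducible;
# the third one is NA
times <- as.POSIXct("2018-10-07 03:00:00", tz = "UTC") + c(0, 600, NA, 1800, 2400)

data <- data.frame(n = 1:5)
data$lt <- as.POSIXlt(times, tz = "UTC")   # `$<-` keeps the POSIXlt class

# Option 1: subset with is.na(), which does understand POSIXlt
kept <- data[!is.na(data$lt), , drop = FALSE]

# Option 2: convert the column to POSIXct, after which na.omit()
# sees an ordinary atomic column
data_ct <- data
data_ct$lt <- as.POSIXct(data_ct$lt)
cleaned <- na.omit(data_ct)

nrow(kept)
nrow(cleaned)
```

Option 2 is probably the more convenient one if the column is going to be passed to other tidyverse functions afterwards.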

A corollary: na.omit doesn't really work with plain lists either (it can be called, but it does nothing). You can sapply or lapply na.omit over a list, but that produces strange results as well (NA components are replaced by logical(0)). It looks like na.omit is meant for atomic vectors, factors and data frames (the help page says it is mostly useful with data frames), which means it is not intended to be useful with lists, including POSIXlt.
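The atomic-versus-list distinction can be checked directly. This is a small sketch with an arbitrary timestamp; `x_ct`, `x_lt`, `n_ct` and `n_lt` are illustrative names. The behaviour of na.omit() on the POSIXlt vector is as described in this answer and may differ in other R versions.

```r
# One valid time and one NA, in both representations
x_ct <- as.POSIXct("2018-10-07 00:43:06", tz = "UTC") + c(0, NA)
x_lt <- as.POSIXlt(x_ct)

is.atomic(x_ct)   # POSIXct is a numeric vector with a class attribute
is.atomic(x_lt)   # POSIXlt is a list of component vectors
is.na(x_lt)       # is.na() has a POSIXlt method, so the NA is detected

n_ct <- length(na.omit(x_ct))   # the NA is dropped from the POSIXct vector
n_lt <- length(na.omit(x_lt))   # per the answer, POSIXlt comes back unchanged
```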

Finally, why would one use POSIXlt at all? The idea (as far as I understand it) is that you can easily manipulate the date's components, but even that can produce unexpected results:

> foo <- as.POSIXlt(Sys.time())
> foo
[1] "2018-10-07 01:06:22 EEST"
> foo$year
[1] 118
> foo$mon
[1] 9
> foo$mon <- 10
> foo
[1] "2018-11-07 01:06:22 EEST"
> foo$year <- 2018
> foo
[1] "3918-11-07 01:06:22 EEST"
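The surprises in the transcript above come from how POSIXlt stores its fields: `year` holds years since 1900 and `mon` runs from 0 to 11. A minimal sketch of setting the components with those offsets in mind (the timestamp and the names `calendar_year`, `calendar_month` and `bar` are only for illustration):

```r
foo <- as.POSIXlt("2018-10-07 01:06:22", tz = "UTC")

# Convert the stored fields to calendar values
calendar_year  <- foo$year + 1900   # 2018
calendar_month <- foo$mon + 1       # 10

# To set a calendar year or month, subtract the offsets again
bar <- foo
bar$year <- 2019 - 1900   # calendar year 2019
bar$mon  <- 11 - 1        # calendar month 11 (November)
format(bar, "%Y-%m-%d")   # "2019-11-07"
```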

So if you need to manipulate a date's components separately, you will have fewer surprises with lubridate:

> library(lubridate)
> year(foo)
[1] 3918
> year(foo) <- 2018
> foo
[1] "2018-11-07 01:06:22 EET"
> month(foo)
[1] 11
> month(foo)<-10
> foo
[1] "2018-10-07 01:06:22 EEST"