readr::parse_factor() is the tidyverse way to create factor variables. By default, it sets include_na = TRUE: an explicit NA level will be created for NA values in the vector that is passed to parse_factor(). After a factor variable has been created with parse_factor(), how can one access or filter these "explicit NA" values?
This code illustrates the issue:
library(readr)
xFac <- parse_factor(c("a", "b", NA))
levels(xFac) # NA is a level of xFac
is.na(xFac) # FALSE FALSE FALSE
xFac == "NA" # FALSE FALSE FALSE
xFac[!is.na(xFac)] # a b <NA>
In the last line, I try to get only those values of xFac that aren't NA. But the line doesn't work; the NA value is returned along with the others. What is the right way to write this line (while keeping the explicit NA values in xFac)?
A number of SO posts ask how to filter ordinary NA values. Those posts don't seem relevant here: my question is about the "explicit NA" values that are created by parse_factor(), and by design, they don't behave in the same way as ordinary NA values.
You can convert the factor into a character vector with as.character() and then use is.na(). Or you can use %in% to test against NA, since %in% treats NA as matching NA.
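Here is a minimal sketch of both approaches, reusing xFac from the question. The helper names is_missing_chr, is_missing_in, kept_chr and kept_in are only illustrative. is.na(xFac) returns all FALSE because it looks at the factor's underlying integer codes, and an explicit NA level has an ordinary code rather than a missing one.

```r
library(readr)

xFac <- parse_factor(c("a", "b", NA))

# Option 1: convert to character first, so the NA level becomes a real NA
is_missing_chr <- is.na(as.character(xFac))
kept_chr <- xFac[!is_missing_chr]

# Option 2: %in% matches NA against NA, so no conversion is written out
is_missing_in <- xFac %in% NA
kept_in <- xFac[!is_missing_in]

as.character(kept_chr)  # "a" "b"
as.character(kept_in)   # "a" "b"

# xFac itself is not modified and still has its explicit NA level
levels(xFac)
```

Both subsets keep NA in their levels, because subsetting a factor does not drop unused levels.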