access and filter explicit NA values created by parse_factor()

readr::parse_factor() is the tidyverse way to create factor variables. By default, it sets include_na = TRUE: an explicit NA level will be created for NA values in the vector that is passed to parse_factor(). After a factor variable has been created with parse_factor(), how can one access or filter these "explicit NA" values?

This code illustrates the issue:

library(readr)
xFac <- parse_factor(c("a", "b", NA))
levels(xFac)        # NA is a level of xFac
is.na(xFac)         # FALSE FALSE FALSE
xFac == "NA"        # FALSE FALSE FALSE
xFac[!is.na(xFac)]  # a    b    <NA>

In the last line, I try to get only those values of xFac that aren't NA. But the line doesn't work; the NA value is returned along with the others. What is the right way to write this line (while keeping the explicit NA values in xFac)?

A number of SO posts ask how to filter ordinary NA values. Those posts don't seem relevant here: my question is about the "explicit NA" values that are created by parse_factor(), and by design, they don't behave in the same way as ordinary NA values.

3 Answers

Answer 1 (best answer)

You can convert it into a character vector and then use is.na().

> xFac
[1] a    b    <NA>
Levels: a b <NA>

> xFac[!is.na(as.character(xFac))]
[1] a b
Levels: a b <NA>

Or you can use %in%, which (unlike ==) matches NA against NA:

> xFac[!xFac %in% NA]
[1] a b
Levels: a b <NA>
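
The same test carries over to filtering rows of a data frame. Here is a minimal sketch in base R, assuming a hypothetical data frame `df` with a factor column `grp`. `factor(..., exclude = NULL)` is used as a stand-in for parse_factor() so the example runs without readr, on the assumption that both produce a factor with NA as a level.

```r
# Hypothetical data; exclude = NULL keeps NA as an explicit level.
df <- data.frame(id = 1:3)
df$grp <- factor(c("a", "b", NA), exclude = NULL)

# %in% matches NA against NA, so this flags the explicit-NA rows.
is_explicit_na <- df$grp %in% NA

kept    <- df[!is_explicit_na, ]  # rows whose grp is "a" or "b"
dropped <- df[is_explicit_na, ]   # rows whose grp is the NA level
```

Subsetting this way leaves the levels of `grp` untouched, so the NA level is still present in `kept$grp` even though no remaining row uses it.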

Answer 2

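If you don't need NA to be a level, you can create the factor with include_na = FALSE, so that NA remains an ordinary missing value:

# Create the factor without an explicit NA level
xFac <- parse_factor(c("a", "b", NA), include_na = FALSE, na = "NA")
# Ordinary NA values can now be dropped with is.na()
xFac[!is.na(xFac)]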

Output:

xFac[!is.na(xFac)]
[1] a b
Levels: a b

The full factor then prints as:

xFac
[1] a    b    <NA>
Levels: a b

Answer 3

A factor is actually an integer vector: each value is a code that indexes into the factor's levels.

So, if you look at the levels:

levels(xFac)
#> [1] "a" "b" NA 
is.na(levels(xFac))
#> [1] FALSE FALSE  TRUE

The third level is a genuine NA, not the string "NA", which is why xFac == "NA" is FALSE everywhere. So you just need to find the elements of xFac whose integer code points to the NA level, here 3.

as.integer(xFac) == which(is.na(levels(xFac)))
#> [1] FALSE FALSE  TRUE

And you can put that in a function:

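is_na_factor <- function(x){
  # Assumes x has an NA level; without one, which() returns integer(0)
  # and the comparison yields logical(0).
  as.integer(x) == which(is.na(levels(x)))
}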
xFac[! is_na_factor(xFac)]
#> [1] a b
#> Levels: a b <NA>
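
If the factor might not have an NA level at all, comparing against which(...) gives a zero-length result. A more defensive sketch looks up each element's level instead. The name `is_na_level` is hypothetical, and base `factor()` stands in for parse_factor() so the example runs without readr.

```r
is_na_level <- function(x) {
  stopifnot(is.factor(x))
  # Look up each element's level by its integer code and test that for NA.
  # Elements that are ordinary NA (code NA) also come back TRUE.
  is.na(levels(x)[as.integer(x)])
}

with_level    <- factor(c("a", "b", NA), exclude = NULL)  # NA is a level
without_level <- factor(c("a", "b", NA))                  # NA is a missing value

is_na_level(with_level)
is_na_level(without_level)
```

Both calls flag only the third element, so this variant treats explicit and ordinary NA values alike. If you need to tell them apart, combine it with is.na(x), which is TRUE only for ordinary NA values.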