Why is RStudio data viewer filtering broken by dplyr grouped tables?


Using the data viewer in RStudio version 0.99, I would like to filter a dplyr grouped table by country names (or another character vector). This breaks the data viewer: RStudio says "failure to sort or filter data", and the error returned by R is quite cryptic:

Error in vapply(x[[col]], `[`, 0, 1) : values must be type 'double',
 but FUN(X[[1]]) result is type 'character'
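For reference, the message itself can be reproduced in base R without RStudio or dplyr. The sketch below only illustrates what the vapply() call in the error does when it meets character values; the small lists are made up for illustration, and it says nothing about how the data viewer builds its call internally.

```r
# vapply(X, FUN, FUN.VALUE, ...): here FUN is `[`, FUN.VALUE is 0 (a double),
# and the trailing 1 is passed on to `[` as the index.
nums <- list(1.5, 2.5)
ok <- vapply(nums, `[`, 0, 1)   # each element yields a double, so this works

# With character elements the result type no longer matches FUN.VALUE,
# and vapply() stops with the same kind of message as in the question.
chars <- list("Albania", "Austria")
msg <- tryCatch(
  vapply(chars, `[`, 0, 1),
  error = function(e) conditionMessage(e)
)
print(ok)
print(msg)
```

So the error means that something expected a numeric value for each element and got a string instead.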

Example with the iris sample data

I can reproduce this with the iris sample dataset.

library(dplyr)

irisgrouped <- iris %>% 
    mutate(Species = as.character(Species)) %>% # Change to a character vector
    group_by(Sepal.Length)

Data viewer filtering by Species breaks with the message "failure to sort or filter data".

Example based on the data I use

Here is also part of my dataset, produced with dput():

library(dplyr)

dtf <- structure(list(itemcode = c(1632, 1632, 1632, 1632, 1632, 1632),
    year = c(1961L, 1961L, 1961L, 1961L, 1961L, 1961L),
    country = c("Albania", "Austria", "Bulgaria", "Denmark", "Finland", "France")),
    .Names = c("itemcode", "year", "country"),
    row.names = c(NA, -6L), class = "data.frame")

The above can be pasted at the R prompt; there is no issue with filtering in the RStudio data viewer. But once I group the data frame:

dtf2 <- dtf %>% group_by(itemcode) 

Filtering breaks with the message "failure to sort or filter data".

Can you point me to the reason why filter is not working on some character vectors in grouped data frames?

sessionInfo()

In case that is important, here is my sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_IE.UTF-8      LC_NUMERIC=C             
 [3] LC_TIME=en_IE.utf8        LC_COLLATE=en_IE.UTF-8   
 [5] LC_MONETARY=en_IE.utf8    LC_MESSAGES=en_IE.UTF-8  
 [7] LC_PAPER=en_IE.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_IE.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.1

loaded via a namespace (and not attached):
[1] assertthat_0.1  DBI_0.3.1       lazyeval_0.1.10 magrittr_1.5   
[5] parallel_3.1.1  Rcpp_0.11.4     tools_3.1.1    

There are 2 best solutions below

1
On

I can confirm I get the same error. I'm running the following using dplyr 0.4.1 on the current RStudio preview (0.99.441) on Windows 8.1.

dtf <- structure(list(itemcode = c(1632, 1632, 1632, 1632, 1632, 1632
), year = c(1961L, 1961L, 1961L, 1961L, 1961L, 1961L), country = c("Albania",
"Austria", "Bulgaria", "Denmark", "Finland", "France")), .Names = c("itemcode", 
"year", "country"), row.names = c(NA, -6L), class = "data.frame")

dtfGrouped <- dtf %>% group_by(itemcode)

View(dtfGrouped)

Clicking on Filter and then typing in a country name makes the filter fail.

However, View(as.data.frame(dtfGrouped)) and then clicking on Filter works.
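A minimal sketch of that workaround, assuming a shortened three-row version of the data above (the name plain is just for illustration). It only shows that as.data.frame() drops the grouped class before the table is handed to the viewer; View() itself is left commented out because it needs an interactive RStudio session.

```r
library(dplyr)

# Shortened stand-in for the question's data
dtf <- data.frame(
  itemcode = rep(1632, 3),
  year = rep(1961L, 3),
  country = c("Albania", "Austria", "Bulgaria"),
  stringsAsFactors = FALSE
)

dtfGrouped <- dtf %>% group_by(itemcode)
print(class(dtfGrouped))   # includes "grouped_df"

# Convert back to a plain data frame before viewing
plain <- as.data.frame(dtfGrouped)
print(class(plain))        # "data.frame"

# View(plain)              # run interactively in RStudio
```

The data are unchanged by the conversion; only the class attribute (and with it the grouping) is removed.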

2
On

This issue is due to a "bug" in R.
This can be replicated by using the aggregate function to return multiple values:

Try something like: (below is not tested)

# x is assumed to be a data frame with columns val, id1 and id2
newDF <- aggregate(val ~ id1 + id2, data = x,
                   FUN = function(v) c(mn = mean(v), n = length(v)))

The new "column" val is actually a two-column matrix, so its length is 2*nrow(newDF) if you check it. The result is NOT really an ordinary data frame, but its class is still "data.frame".
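Here is a small runnable version of that idea in base R. The data frame x and its columns id1, id2 and val are invented for illustration, and the last step (flattening with do.call) is one common way to get ordinary columns back, not something taken from the question.

```r
# Made-up input: two groups, two rows each
x <- data.frame(
  id1 = c("a", "a", "b", "b"),
  id2 = c("u", "u", "v", "v"),
  val = c(1, 2, 3, 5)
)

# FUN returns two values per group, so val becomes a matrix column
newDF <- aggregate(val ~ id1 + id2, data = x,
                   FUN = function(v) c(mn = mean(v), n = length(v)))

print(class(newDF))          # still "data.frame"
print(is.matrix(newDF$val))  # the "column" is a matrix
print(length(newDF$val))     # twice the number of rows of newDF

# Flatten the matrix column into separate ordinary columns
flat <- do.call(data.frame, newDF)
print(names(flat))
```

Whether this is the same mechanism that trips up the data viewer on grouped tables is not established here; it only shows how a data frame can carry a column that is not a plain vector.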

Cheers.