Does `lazy_dt` not support `dplyr::across`?

The code below fails with this error message:

This tidyselect interface doesn't support predicates.

Does this mean that lazy_dt does not support across()? Thanks!

library(tidyverse)
library(dtplyr)
diamonds %>% lazy_dt() %>% group_by(color) %>% 
  summarise(across(where(is.numeric), ~ sum(.)))

There are 2 answers below.

Answer by stefan (best answer)

The issue is that dtplyr does not support where(), which is why tidyselect reports that predicates are not supported; across() itself works. See this closed issue on GitHub.

Instead, one possible workaround is to compute the names of the numeric columns up front and pass that vector to all_of():

library(tidyverse)
library(dtplyr)

num_cols <- names(diamonds)[
  sapply(diamonds, is.numeric)
]

diamonds %>%
  lazy_dt() %>%
  group_by(color) %>%
  summarise(across(all_of(num_cols), ~ sum(.)))
#> Source: local data table [7 x 8]
#> Call:   `_DT1`[, .(carat = sum(carat), depth = sum(depth), table = sum(table), 
#>     price = sum(price), x = sum(x), y = sum(y), z = sum(z)), 
#>     keyby = .(color)]
#> 
#>   color carat   depth   table    price      x      y      z
#>   <ord> <dbl>   <dbl>   <dbl>    <int>  <dbl>  <dbl>  <dbl>
#> 1 D     4457. 418005. 388916. 21476439 36701. 36728. 22648.
#> 2 E     6445. 604104. 563241. 30142944 53017. 53090. 32729.
#> 3 F     7028. 588690. 548031. 35542866 53578. 53621. 33058.
#> 4 G     8708. 697361. 646903. 45158240 64111. 64141. 39579.
#> 5 H     7572. 513493. 477628. 37257301 49686. 49698. 30691.
#> 6 I     5568. 335331. 312184  27608146 33740. 33740. 20850.
#> # ℹ 1 more row
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

Alternatively, use summarise_if() (be aware, however, that summarise_if() is superseded in favour of across()):

diamonds %>%
  lazy_dt() %>%
  group_by(color) %>%
  summarise_if(~ is.numeric(.x), sum)
#> Source: local data table [7 x 8]
#> Call:   `_DT3`[, .(carat = sum(carat), depth = sum(depth), table = sum(table), 
#>     price = sum(price), x = sum(x), y = sum(y), z = sum(z)), 
#>     keyby = .(color)]
#> 
#>   color carat   depth   table    price      x      y      z
#>   <ord> <dbl>   <dbl>   <dbl>    <int>  <dbl>  <dbl>  <dbl>
#> 1 D     4457. 418005. 388916. 21476439 36701. 36728. 22648.
#> 2 E     6445. 604104. 563241. 30142944 53017. 53090. 32729.
#> 3 F     7028. 588690. 548031. 35542866 53578. 53621. 33058.
#> 4 G     8708. 697361. 646903. 45158240 64111. 64141. 39579.
#> 5 H     7572. 513493. 477628. 37257301 49686. 49698. 30691.
#> 6 I     5568. 335331. 312184  27608146 33740. 33740. 20850.
#> # ℹ 1 more row
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
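
The name lookup with sapply() above can be made slightly more defensive. The following is only a sketch in base R: numeric_col_names() and the toy data frame are hypothetical names introduced for illustration, not part of dplyr or dtplyr. It uses vapply(), which always returns a logical vector, whereas sapply() can return a list for input with no columns.

```r
# Hypothetical helper (not part of dplyr/dtplyr): return the names of the
# numeric columns of an in-memory data frame.
numeric_col_names <- function(df) {
  # vapply() enforces one logical value per column
  is_num <- vapply(df, is.numeric, logical(1))
  names(df)[is_num]
}

# Small illustrative data frame (an assumption, not the diamonds data)
toy <- data.frame(
  grp = c("a", "a", "b"),
  x   = c(1.5, 2.5, 3),
  n   = c(1L, 2L, 3L)
)

toy_num_cols <- numeric_col_names(toy)
print(toy_num_cols)
```

The resulting character vector can then be passed to all_of() in the same way as num_cols.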
Answer by r2evans

If you really need to work on lazy data, need to operate only on numeric columns, and cannot load the whole data set, you can load a single row, derive the where()-like metadata from it, and pass the result to all_of() as stefan suggested.

data("diamonds", package="ggplot2")
library(dplyr)  # provides group_by(), summarise(), across(), collect()
library(dtplyr)
lazydia <- lazy_dt(diamonds)
sapply(lazydia, is.numeric) # does not work while lazy: iterates over the object's internal fields, not the columns
#        parent          vars        groups        locals implicit_copy    needs_copy           env          name 
#         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE 
num_cols <- lazydia |>
  head(1) |>
  collect() |>
  sapply(is.numeric)
num_cols <- names(num_cols)[num_cols]
num_cols
# [1] "carat" "depth" "table" "price" "x"     "y"     "z"    

With this, you can continue with @stefan's use of across(all_of(num_cols), ..):

lazydia |>
  group_by(color) |>
  summarise(across(all_of(num_cols), ~ sum(.))) |>
  collect()
# # A tibble: 7 × 8
#   color carat   depth   table    price      x      y      z
#   <ord> <dbl>   <dbl>   <dbl>    <int>  <dbl>  <dbl>  <dbl>
# 1 D     4457. 418005. 388916. 21476439 36701. 36728. 22648.
# 2 E     6445. 604104. 563241. 30142944 53017. 53090. 32729.
# 3 F     7028. 588690. 548031. 35542866 53578. 53621. 33058.
# 4 G     8708. 697361. 646903. 45158240 64111. 64141. 39579.
# 5 H     7572. 513493. 477628. 37257301 49686. 49698. 30691.
# 6 I     5568. 335331. 312184  27608146 33740. 33740. 20850.
# 7 J     3263. 173779. 162337. 14949281 18306. 18303. 11325.
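
The two steps above (peek at one row, then summarise with all_of()) could be wrapped in a small helper. This is a minimal sketch, assuming R 4.1 or later for the native pipe and that dplyr and dtplyr are installed; lazy_numeric_cols() and the toy table are hypothetical names used only for illustration.

```r
library(dplyr)
library(dtplyr)

# Hypothetical helper: collect a single row from a lazy_dt and return the
# names of its numeric columns.
lazy_numeric_cols <- function(lazy) {
  first_row <- lazy |>
    head(1) |>
    collect()
  names(first_row)[vapply(first_row, is.numeric, logical(1))]
}

# Small illustrative table (an assumption, not the diamonds data)
lazy_toy <- lazy_dt(data.frame(
  grp = c("a", "a", "b"),
  x   = c(1.5, 2.5, 3),
  n   = c(1L, 2L, 3L)
))

cols <- lazy_numeric_cols(lazy_toy)

# Sum every numeric column per group, as in the answer above
totals <- lazy_toy |>
  group_by(grp) |>
  summarise(across(all_of(cols), sum)) |>
  collect()

print(totals)
```

Only one row is materialised to find the column types; the grouped summary itself is still translated to data.table code and evaluated at collect().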