Operating across columns (rowwise) in R dbplyr

546 Views Asked by At

I opened an issue for this. But in case this is operator error and not a problem with the package, I'm posting here for coding advice.

The most recent (v2.1) update to dbplyr was supposed to increase support for across and associated functions. However, I'm still getting errors when I try to use it.

Should I be using different syntax? Is there a workaround?

library(dplyr, warn.conflicts = FALSE)
library(bigrquery)

set.seed(02042021)

Sys.setenv(BIGQUERY_TEST_PROJECT = "elite-magpie-257717")
bq_deauth()
bq_auth(email="[email protected]")

conn = DBI::dbConnect(
  bigrquery::bigquery(),
  project = bq_test_project(),
  dataset = "test_dataset"
)

my_table = data.frame(
  A = replicate(10, paste(sample(letters[1:3], 3, replace=T), collapse="")),
  B = replicate(10, paste(sample(letters[1:3], 3, replace=T), collapse="")),
  C = replicate(10, paste(sample(letters[1:3], 3, replace=T), collapse="")),
  D = runif(10)
)
my_table
#>      A   B   C         D
#> 1  bcb cbb bbb 0.3620390
#> 2  aac aac bba 0.5505868
#> 3  aca abb bcb 0.4028455
#> 4  bca baa bbb 0.3247477
#> 5  bcc cac ccc 0.6861223
#> 6  cac bba baa 0.6970764
#> 7  bcb bbc acc 0.6873332
#> 8  bca acb acb 0.5391651
#> 9  cba ccc abc 0.9442450
#> 10 cca cbc bcc 0.6319561

my_table %>%
  mutate(
    has_ab = if_any(everything(), ~grepl("ab", .))
  )
#>      A   B   C         D has_ab
#> 1  bcb cbb bbb 0.3620390  FALSE
#> 2  aac aac bba 0.5505868  FALSE
#> 3  aca abb bcb 0.4028455   TRUE
#> 4  bca baa bbb 0.3247477  FALSE
#> 5  bcc cac ccc 0.6861223  FALSE
#> 6  cac bba baa 0.6970764  FALSE
#> 7  bcb bbc acc 0.6873332  FALSE
#> 8  bca acb acb 0.5391651  FALSE
#> 9  cba ccc abc 0.9442450   TRUE
#> 10 cca cbc bcc 0.6319561  FALSE

my_table %>%
  mutate(
    has_ab = if_any(where(is.numeric), ~grepl("ab", .))
  )
#>      A   B   C         D has_ab
#> 1  bcb cbb bbb 0.3620390  FALSE
#> 2  aac aac bba 0.5505868  FALSE
#> 3  aca abb bcb 0.4028455  FALSE
#> 4  bca baa bbb 0.3247477  FALSE
#> 5  bcc cac ccc 0.6861223  FALSE
#> 6  cac bba baa 0.6970764  FALSE
#> 7  bcb bbc acc 0.6873332  FALSE
#> 8  bca acb acb 0.5391651  FALSE
#> 9  cba ccc abc 0.9442450  FALSE
#> 10 cca cbc bcc 0.6319561  FALSE

dbRemoveTable(
  conn=conn,
  name="test_dataset.mytable",
  value=my_table,
  overwrite=T
)

dbWriteTable(
  conn=conn,
  name="test_dataset.mytable",
  value=my_table,
  overwrite=T
)

my_table_bq = tbl(conn, "mytable")


my_table_bq %>%
  mutate(
    has_ab = if_any(everything(), ~grepl("ab", .))
  )
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "formula"

my_table_bq %>%
  mutate(
    has_ab = if_any(where(is.numeric), ~grepl("ab", .))
  )
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "function"

Created on 2021-02-05 by the reprex package (v1.0.0)

2

There are 2 best solutions below

2
On

I do not think this is possible with dbplyr (version 2.1.0) at present. Here are my test cases:

(1) working case

# shared setup
library(dplyr)
library(dbplyr)
data(iris)
df = tbl_lazy(iris, con = simulate_mssql()) %>%
  select(Sepal.Length)

df %>%
  mutate(new = Sepal.Length + 1) %>%
  show_query()

Returns the expected SQL:

<SQL>
SELECT `Sepal.Length`, `Sepal.Length` + 1.0 AS `new`
FROM `df`

(2) introducing a simple everything

df %>%
  mutate(new = if_any(everything(), TRUE)) %>%
  show_query()

Returns invalid sql because no translation for if_any or everything exists:

<SQL>
SELECT `Sepal.Length`, if_any(everything(), 1) AS `new`
FROM `df`

(3) simple is.numeric

df %>%
  mutate(new = if_any(where(is.numeric), TRUE)) %>%
  show_query()

Errors because is.numeric is passed as a function

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "function"

(4) wrapping TRUE in a function that returns TRUE

df %>%
  mutate(new = if_any(everything(), ~{TRUE})) %>%
  show_query()

Errors because no translation for the implicit function ~{...} exists:

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "formula"

0
On

Here is a potential work around based on this answer about dynamic case_when.

list_of_columns = colnames(df)

text_to_match = "ab"

cases = paste0("`", list_of_columns, "` %LIKE% '%", text_to_match, "%' ~ 1")
cases = c(cases, "1 == 1 ~ 0")

output = df %>%
    mutate(new_col = case_when(
      !!!rlang::parse_exprs(cases)
    ))

For another example of this technique, or as part of a reusable function, take a look at the collapse_indicator_columns function of my dbplyr_helpers repo.