Return quantiles within a summarise() call through dbplyr/bigrquery to a BigQuery SQL database

I am attempting to get quantiles for a variable in a grouped BigQuery table, and I get this error:

Error: Job 'xxxxx' failed
Syntax error: Expected end of input but got keyword WITHIN at [1:45] [invalidQuery]

Reprex is below.

# NOTE: for reprex to work, you must have BIGQUERY_TEST_PROJECT envvar set to name of project which has billing set up and to which you have write access

library(DBI)
library(bigrquery)
library(dplyr)

billing <- bq_test_project()

con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)

natality <- tbl(con, "natality")
   
natality %>%
  group_by(year) %>%
  summarize(
    q25 = quantile(weight_pounds, 0.25),
    q50 = median(weight_pounds),
    q75 = quantile(weight_pounds, 0.75)
  )

Does anyone know a workaround, perhaps passing SQL code through sql() in the summarise() call?

Thanks!

There is 1 best solution below

A colleague found a workaround: pass BigQuery's APPROX_QUANTILES() as raw SQL through sql() in the summarize() call. Note that the results are approximate, not exact, quantiles:

# NOTE: for reprex to work, you must have BIGQUERY_TEST_PROJECT envvar set to name of project which has billing set up and to which you have write access

library(DBI)
library(bigrquery)
library(dplyr)

billing <- bq_test_project()

con <- dbConnect(
  bigrquery::bigquery(),
  project = "publicdata",
  dataset = "samples",
  billing = billing
)

natality <- tbl(con, "natality")
   
natality %>%
  group_by(year) %>%
  summarize(
    q25 = sql("approx_quantiles(weight_pounds, 4)[offset(1)]"),
    q50 = sql("approx_quantiles(weight_pounds, 2)[offset(1)]"),
    q75 = sql("approx_quantiles(weight_pounds, 4)[offset(3)]")
  )
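
To make the indexing in those sql() strings easier to follow: APPROX_QUANTILES(x, n) returns an array of n + 1 boundaries (minimum first, maximum last), and OFFSET(k) is zero-based. The sketch below is only a local analogue in base R, run on made-up data with the exact quantile() function rather than BigQuery's approximate algorithm; bq_offset() is a hypothetical helper written for this illustration and is not part of bigrquery or dbplyr.

```r
# Local illustration only: simulated data, exact base-R quantiles.
set.seed(1)
weight_pounds <- rnorm(1000, mean = 7.2, sd = 1.3)  # made-up values

# Analogue of APPROX_QUANTILES(weight_pounds, 4): 5 boundaries
# (min, 25%, 50%, 75%, max).
quartiles <- quantile(weight_pounds, probs = seq(0, 1, length.out = 5),
                      names = FALSE)

# Analogue of APPROX_QUANTILES(weight_pounds, 2): 3 boundaries
# (min, 50%, max).
halves <- quantile(weight_pounds, probs = seq(0, 1, length.out = 3),
                   names = FALSE)

# Hypothetical helper: BigQuery's OFFSET() is zero-based, R is one-based.
bq_offset <- function(x, k) x[k + 1]

q25 <- bq_offset(quartiles, 1)  # like approx_quantiles(x, 4)[offset(1)]
q50 <- bq_offset(halves, 1)     # like approx_quantiles(x, 2)[offset(1)]
q75 <- bq_offset(quartiles, 3)  # like approx_quantiles(x, 4)[offset(3)]
```

By the same reasoning, a percentile such as the 90th would correspond to approx_quantiles(weight_pounds, 100)[offset(90)] on the BigQuery side.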