Bigquery - Version Control Scheduled Queries

896 Views Asked by At

Right now I have scheduled queries via BQ interface. They work but do not scale or migrate very well (across dev and prod gcp projects). So I am trying to do scheduled queries in a way that is reproducible, scalable and migrate-able.

My queries are complicated and hence I am struggling with ', " and ''' to make it run via bq commands and schedule via github actions.

This is the query which is most complicated:

declare bq_last_id int64;
declare external_sql string;
set bq_last_id = (select max(id) from bq_dataset.bq_table);
set external_sql = '"select * from mysql_table where id > ('|| bq_last_id ||')"';
execute immediate 'select * from external_query("my-gcp-project.my-region.my-connection-name",'|| external_sql || ');'

In total there are 20 something queries that have to scheduled. This one is the only one which is incremental or other are drop and recreate table again so they are not as complicated as this one.

WHAT I HAVE TRIED TILL NOW:

  • creating an on-demand query in BQ interface and then using bq mk command to run it with timestamp as a variable as shown in this answer. The problem with it is I still have to manually create the on-demand queries and I will have to do separately in dev and prod projects.
  • I am unable to find a way to create on-demand query using bq cmd.
  • I am unable to bq query to run the queries (that is not create bq scheduled queries at all). and then later schedule them via gihub action.

Any help with correct syntax or better suggestions to do this will be super helpful to me.

Thanks.

1

There are 1 best solutions below

0
ewertonvsilva On

The way I use to solve problems with scape characters when inserting queries in bq commands is using jq on shell and my queries on a file, as following:

  1. create queries.sql file with your query script:
cat queries.sql
declare bq_last_id int64;
declare external_sql string;
set bq_last_id = (select max(id) from bq_dataset.bq_table);
set external_sql = '"select * from mysql_table where id > ('|| bq_last_id ||')"';
execute immediate 'select * from external_query("my-gcp-project.my-region.my-connection-name",'|| external_sql || ');'
  1. Create the following script: schedule_query.sh
#!/bin/bash
set -f #avoind * used as wildcard
json=$(jq -nc --arg query "$(<queries.sql)" '{ "query": $query }')

#adapt the command and param to work in your environment (destination, tables, etc...)
bq mk \
--transfer_config \
--target_dataset=mydataset \
--display_name='My Scheduled Query' \
--params="$json" \
--data_source=scheduled_query \
--service_account_name=abcdef-test-sa@abcdef-test.iam.gserviceaccount.com