I have a BigQuery scheduled query script.
I used current_timestamp() in my script. When I processed 2 days of data, BigQuery told me "the script will process 50 GB of data".
I changed my script to process 2 months of data, and BigQuery told me "the script will process 1.5 TB of data".
All seemed to be correct.
One problem: with current_timestamp(), my scheduled query cannot run backfills (each time it runs, it uses the current timestamp rather than the backfill date), so I should use @run_time instead, right?
I changed current_timestamp() to @run_time in my script, roughly as in the sketch below; it seemed to be a correct and simple change.
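To illustrate the change (my real table and column names differ; `my_project.my_dataset.events` and the partition column `event_timestamp` below are placeholders for a timestamp-partitioned table):

-- Before: the window is anchored to wall-clock time, so a backfill run re-reads "today"
DECLARE window_start TIMESTAMP DEFAULT
  TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY), INTERVAL 2 DAY);
-- After: the window is anchored to the scheduled/backfill run time instead
-- DECLARE window_start TIMESTAMP DEFAULT
--   TIMESTAMP_SUB(TIMESTAMP_TRUNC(@run_time, DAY), INTERVAL 2 DAY);

SELECT *
FROM `my_project.my_dataset.events`
WHERE event_timestamp >= window_start;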
But now, no matter whether I change the script to process 2 days, 2 weeks, or 2 months of data, BigQuery ALWAYS tells me:
"The script will process 9.29 TB of data."
So with @run_time in my script, BigQuery now seems to ALWAYS estimate a scan of the whole table, instead of only the partitions (2 days, 2 weeks, or 2 months) I asked for.
Does this mean that when I use @run_time in my scheduled query script, BigQuery simply fails to predict the size it will process, and it is not really going to process 9.29 TB each time?
Or, worse, does @run_time stop BigQuery from recognizing the partition filter in my script, so it scans the full data set? If that is the case, people should WORRY about using @run_time in scheduled query scripts, because it would cost huge money when processing big datasets.
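One way I plan to double-check is to compare the estimate against the bytes actually billed for past scheduled runs (a sketch; the `region-us` qualifier and the 7-day window are assumptions, adjust them to your project):

SELECT
  creation_time,
  total_bytes_processed,
  total_bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY creation_time DESC;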
To get a reproducible example of current_timestamp() vs @run_time in a SQL script, I tried the same pattern on a BigQuery public dataset.
-- Declare variables based on the current timestamp
DECLARE last_1_day TIMESTAMP DEFAULT
TIMESTAMP(DATE_SUB(DATE_TRUNC(DATE(current_timestamp()), day), INTERVAL 1 day));
-- Query to select data from the last 1 day
SELECT *
FROM `bigquery-public-data.blockchain_analytics_ethereum_mainnet_us.blocks`
WHERE TIMESTAMP_TRUNC(block_timestamp, day) >= last_1_day;
This script will process 81.11 MB when run.
-- Declare variables based on @run_time
DECLARE last_1_day TIMESTAMP DEFAULT
TIMESTAMP(DATE_SUB(DATE_TRUNC(DATE(@run_time), day), INTERVAL 1 day));
-- Query to select data from the last 1 day
SELECT *
FROM `bigquery-public-data.blockchain_analytics_ethereum_mainnet_us.blocks`
WHERE TIMESTAMP_TRUNC(block_timestamp, day) >= last_1_day;
This time, instead of an estimate, I only get:
"WARNING: Could not compute bytes processed estimate for script."
But this is different from what I see when querying my own dataset: there, with @run_time, BigQuery gives the huge byte estimate I listed above instead of this warning.