Querying a table from a parameter in a BigQuery UDF

971 Views Asked by At

I am trying to create a UDF that will find the maximum value of a field called 'DatePartition' for each table that is passed through to the UDF as a parameter. The UDF I have created looks like this:

CREATE TEMP FUNCTION maxDatePartition(x STRING) AS ((
  SELECT MAX(DatePartition) FROM x WHERE DatePartition >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(),INTERVAL 7 DAY)
));

but I am getting the following error: "Table name "x" missing dataset while no default dataset is set in the request."

The table names will get passed to the UDF in the format:

my-project.my-dataset.my-table

EDIT: Adding more context: I have multiple tables that are meant to update every morning with yesterday's data. Sometimes the tables are updated later than expected so I am creating a view which will allow users to quickly see the most recent data in each table. To do this I need to calculate MAX(DatePartition) for all of these tables in one statement. The list of tables will be stored in another table but it will change from time to time so I can't hardcode them in.

1

There are 1 best solutions below

3
On

I have tried to do it in a single statement, but have found I need to invoke a common table expression as a sorting mechanism. I haven't found success using the MAX() function on TIMESTAMPs. Here is a method that has worked the best for me that I've discovered (and most concise). No UDF needed. Try something like this:

WITH
  DATA AS (
SELECT
  ROW_NUMBER() OVER (PARTITION BY your_group_by_fields ORDER BY DatePartition DESC) AS _row,
  *
FROM
  `my-project.my-dataset.my-table` 
WHERE
  Date_Partition >= TIMESTAMP_SUB(CURRENT_TIMESTAMP, INTERVAL 7 DAY)
)

SELECT
* EXCEPT(_row)
FROM
  DATA
WHERE
  _row = 1;

What this does is creates a new field with a row number for each partition of whatever grouped field that has muliple records of different timestamps. So for each of the records of a certain group, it will order them by most recent DatePartition and give them a row number value with "1" being the most recent since we sorted the DatePartition DESC.

Then it takes your common table expression of sorted values, and just returns everything in your table (EXCEPT that row number "_row" you assigned) and then filter only on "_row =1" which will be your most recent records.