Running scheduled DML query on Firebase analytics dataset

253 Views Asked by At

I am trying to run a scheduled DML query from our Firebase analytics dataset. (Backstory, we have a lot of extra events that are no longer needed and for the time being it is easier to drop them from BigQuery).

I thought I might be able to run the query such as the following:


DELETE FROM  `my-project.analytics_1234567890.*`
    WHERE  _TABLE_SUFFIX 
    BETWEEN CONCAT('events_', format_date('%Y%m%d', date_sub(CURRENT_DATE, interval 7 day))) 
        AND CONCAT('events_', format_date('%Y%m%d', CURRENT_DATE))
     AND event_name IN ('not a real event')

But, BigQuery complains with this error: DML over table my-project.analytics_1234567890.* is not supported

Is there a way to query dynamic table names in scheduled DML queries so I can drop specific rows from the last 7 partitions?

1

There are 1 best solutions below

0
On

I wrote an article on how to resolve repetitive queries with scheduled invocations, combine simple queries to complex DML statements by using Cloud Workflows.

Automate the execution of BigQuery queries with Cloud Workflows

essentially it covers your use case:

Workflow Steps for your problem

  1. Run a query to get the table names
  2. Loop for each table to launch a task for it
  3. Execute the parameterized task for each result

workflow.yaml

#workflow entrypoint 
main:
  steps:
    - getList:
        call: BQ_Query
        args:
          query: select distinct _TABLE_SUFFIX as table_id FROM  `my-project.analytics_242990349.*`
                    WHERE  _TABLE_SUFFIX 
                    BETWEEN CONCAT('events_intraday_', format_date('%Y%m%d', date_sub(CURRENT_DATE, interval 7 day))) 
                        AND CONCAT('events_intraday_', format_date('%Y%m%d', CURRENT_DATE))
        result: items
    - loopItems:
        call: BQ_Results_LoopItems
        args:
          items: ${items.rows}
        result: res
    - final:
        return: ${res}
BQ_Query:
    params: [query]
    steps:
      - runBQquery: 
          call: googleapis.bigquery.v2.jobs.query
          args:
              projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
              body:
                  useLegacySql: false
                  query: ${query}
          result: queryResult
      - documentFound:
          return: ${queryResult}
BQ_Results_LoopItems:
  params: [items]
  steps:
    - init:
        assign:
          - i: 0
          - result: ""
    - check_condition:
        switch:
          - condition: ${len(items) > i}
            next: iterate
        next: exit_loop
    - iterate:
        steps:
          - process_item:
              call: BQ_Task
              args:
                table_id: ${items[i].f[0].v}
              result: result
          - assign_loop:
              assign:
                - i: ${i+1}
        next: check_condition
    - exit_loop:
        return: ${result}
BQ_Task:
  params: [table_id]
  steps:
    - delete:
        call: BQ_Query
        args:
          query: ${"DELETE FROM `my-project.analytics_242990349."+table_id+"`
                    WHERE event_name IN ('not a real event')"}
        result: queryResult
    - documentFound:
        return: ${queryResult}