Is it possible to set an expiration time for records in BigQuery?

Is it possible to set a time to live (TTL) for records in BigQuery? Suppose a table has two fields, payment_details and timestamp. A row should be deleted automatically from the BigQuery table once the current time minus its timestamp is greater than 90 days.

Solution 1:

BigQuery has a partition expiration feature. You can leverage that for your use case.

Essentially, you need to create a partitioned table and set the partition_expiration_days option to 90:

CREATE TABLE
  mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY
  transaction_date
OPTIONS(
  partition_expiration_days=90
)

Or, if you already have a table partitioned by the right column:

ALTER TABLE mydataset.mytable
 SET OPTIONS (
   -- Sets partition expiration to 90 days
   partition_expiration_days=90
 )

When a partition expires, BigQuery deletes the data in that partition.
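
The question mentions a timestamp field rather than a DATE column, so here is a minimal sketch of the same idea for a TIMESTAMP column. The table and column names (mydataset.payments, payment_details, created_at) are assumptions for illustration and are not taken from the asker's actual schema.

```sql
-- Hypothetical schema: adjust the table and column names to your own.
CREATE TABLE
  mydataset.payments (payment_details STRING, created_at TIMESTAMP)
PARTITION BY
  -- Daily partitions derived from the TIMESTAMP column
  DATE(created_at)
OPTIONS(
  -- Each daily partition is dropped about 90 days after its date
  partition_expiration_days=90
);
```

Note that expiration works per partition, based on the partition's date in UTC, so rows disappear a whole day at a time rather than exactly 90 days after each row's own timestamp.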

Solution 2:

You can set up a scheduled query that prunes data older than 90 days on an hourly or daily basis. Writing the DELETE query yourself gives you more control, because you can combine it with other business logic, for example deleting only duplicate rows while keeping the most recent entry even if it is older than 90 days.
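
As a rough sketch, the statement behind such a scheduled query could look like the following. The table name mydataset.payments and the column created_at are assumptions for illustration; any extra business logic would go into the WHERE clause.

```sql
-- Hypothetical table and column names: adjust to your own schema.
-- Removes every row whose timestamp is more than 90 days in the past.
DELETE FROM mydataset.payments
WHERE created_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY);
```

Unlike partition expiration, this compares each row's own timestamp, so how closely rows are removed after the 90-day mark depends on how often the query is scheduled to run.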

Solution 3:

If the 90-day pruning is part of a larger business process that depends on external factors, such as an API response or conditional evaluation, you can use Cloud Workflows to build a workflow and invoke it regularly to automate the pruning of your data. The article "Automate the execution of BigQuery queries with Cloud Workflows" can guide you through this.