GCP Data Catalog Schema History or Versioning


Is it possible to keep versions of a schema in the GCP Data Catalog service? If not, how do you deal with Data Catalog entries when the underlying schema changes (e.g. in Cloud SQL, a GCS fileset, or BigQuery), and how can history be handled if Google does not support it?

I inspected the Data Catalog API calls and Cloud Logging after an entry was updated, but found no record of the change and no history.

AWS offers this functionality (https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html). There is also an unanswered question about it in the GCP Community: https://www.googlecloudcommunity.com/gc/Data-Analytics/How-can-I-see-entry-history/m-p/425135#M338 Custom tools such as Liquibase (https://medium.com/google-cloud/version-control-of-bigquery-schema-changes-with-liquibase-ddc7092d6d1d) are not suitable in this case, as they are limited to BigQuery and do not cover all GCP services.

I am looking for ANY kind of versioning of Data Catalog entries (schemas in particular), whether built-in versions, history in logs, or something similar.


There is 1 best solution below


Unfortunately, there is currently no versioning in Data Catalog. If the schema of an entry changes in the ingested system, any tag attached to a removed column is lost. For simple versioning use cases, you may consider managing the entries with Terraform and storing the configuration in Cloud Source Repositories.
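If Terraform is more than you need, a do-it-yourself history is another option. Below is a minimal sketch, not an official Data Catalog feature: it assumes you have already read an entry's schema into a plain list of column dicts (however you fetch it), and it appends a snapshot to a local JSON Lines file only when the schema has changed. The function names, the file layout, and the `name`/`type` keys are all hypothetical choices made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def schema_fingerprint(columns):
    """Return a stable hash of a schema given as a list of column dicts.

    Dict keys are sorted, but column order is kept, so reordering
    columns counts as a schema change in this sketch.
    """
    canonical = json.dumps(columns, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_schema_version(history_dir, entry_name, columns):
    """Append a snapshot for entry_name if its schema changed.

    Returns the new version number, or None when nothing changed.
    """
    # One history file per entry; slashes in the entry name are flattened.
    history_file = Path(history_dir) / (entry_name.replace("/", "_") + ".jsonl")
    history_file.parent.mkdir(parents=True, exist_ok=True)

    versions = []
    if history_file.exists():
        lines = history_file.read_text(encoding="utf-8").splitlines()
        versions = [json.loads(line) for line in lines if line]

    fingerprint = schema_fingerprint(columns)
    if versions and versions[-1]["fingerprint"] == fingerprint:
        return None  # schema is identical to the latest snapshot

    snapshot = {
        "version": len(versions) + 1,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "fingerprint": fingerprint,
        "columns": columns,
    }
    with history_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot) + "\n")
    return snapshot["version"]


def removed_columns(old_columns, new_columns):
    """Names of columns present in the old schema but missing in the new one."""
    new_names = {c["name"] for c in new_columns}
    return [c["name"] for c in old_columns if c["name"] not in new_names]
```

Run on a schedule, something like this would leave one history file per entry, which you could commit to a repository such as Cloud Source Repositories. `removed_columns` is there so you can flag columns whose tags are about to be lost before the catalog entry is refreshed.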