How to do a wildcard search in Data Catalog (Google Cloud Platform)

How can I run a wildcard/regex search in Data Catalog (Google Cloud Platform)?

  • It would make sense to search metadata across column names and tag attributes (and their values).

The current documentation only lists very strict search behavior e.g. for tag:data_gov_template.hasPII(=true)

  • I need a search for "PII" to return results without having to specify the exact template name etc.

e.g. labels:etl

  • If I search for just etl, there is no result.

(Are metadata attributes and their values not directly searchable?)

There is 1 best solution below

From your use case, I understand that you want to search by a particular metadata attribute, such as a tag field like PII, right?

For tagged assets

If you don't care about the template name, you can use the tag:x search facet.

So if all your templates (data_gov_template, data_curator_template, data_etl_template) contain the same tag field name, has_pii, you can search using:

tag:has_pii

This returns all assets with that metadata attribute, no matter what the template name is.
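As a rough sketch, this is how that query could be run from Python with the google-cloud-datacatalog client library. The helper names tag_field_query and search_assets are made up for illustration, and the project ID you pass in is your own; the code assumes the library is installed and credentials are configured.

```python
def tag_field_query(field_name: str) -> str:
    """Build a query that matches a tag field name under any template."""
    return f"tag:{field_name}"


def search_assets(project_id: str, query: str) -> list:
    """Run a Data Catalog search and return the matching resource names."""
    # Imported here so the query helper above works without the library.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()

    # A search needs a scope; here it is limited to a single project.
    scope = datacatalog_v1.types.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)

    results = client.search_catalog(scope=scope, query=query)
    return [result.relative_resource_name for result in results]
```

For example, search_assets("my-project", tag_field_query("has_pii")) would send the query tag:has_pii, where "my-project" is a placeholder project ID.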

For columns

You can use the column:x search facet to match a substring of the column name in the schema of the data asset. Note that it does not support nested columns yet.

For labels

You can use the labels:bar search facet to find data assets that have a label (with some value) whose key contains bar as a substring.

You can also search on label values. So yes, the metadata attributes and their values are searchable.

However, this is not a regex search: it is a substring match when the search facet uses a colon (:), as in labels:bar, and an exact match when the search facet uses equals (=), as in type=table.
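To make the difference between the two operators concrete, here is a toy local model in Python. It is only an illustration of the matching rules described above, not how Data Catalog is implemented; the function name facet_matches is hypothetical and case handling is not modeled.

```python
def facet_matches(operator: str, term: str, value: str) -> bool:
    """Toy model: ':' is a substring match, '=' is an exact match."""
    if operator == ":":
        return term in value
    if operator == "=":
        return term == value
    raise ValueError(f"unsupported operator: {operator!r}")


# Colon: the term only has to appear somewhere in the value.
print(facet_matches(":", "pii", "data_gov_template.has_pii"))  # True

# Equals: the whole value has to match.
print(facet_matches("=", "tab", "table"))    # False
print(facet_matches("=", "table", "table"))  # True

# A regex pattern is treated as literal text, so it does not match.
print(facet_matches(":", "has_.*", "has_pii"))  # False
```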