I am looking for solution to perform data profiling on bigquery table covering below statistics for each column in table. Some of the columns are ARRAY and STRUCT as given below.
I tried multiple ways to generate dynamic query to cover below scenarios but no luck.. I will greatly appreciate your help/inputs.
Metrics I want to calculate part of this solution are:
- MIN VALUE
- MAX VALUE
- MIN LENGTH OF THE FIELD
- MAX LENGTH OF THE FIELD
- NO OF UNIQUE RECORDS FOR EACH FIELD
- NO OF NULLS IN FIELD
- NO OF NON NULL VALUES IN FIELD.
- MIN DATE IN CASE OF DATE OR DATETIME FIELD
- MAX DATE IN CASE OF DATE OR DATETIME FIELD
Sample Table Data:
I dont understand what you mean with Min Length and Max Length, but considering the provided data, you could do something like below.
This query have basically two steps:
WITH
clauseUNION ALL
to compose everything in a single table.Query: