Running multiple sql queries in hive/impala for testing pass or fail

629 Views Asked by At

I am running 100 queries (test cases) to check for data quality in hive/impala. The majority of the queries check for null values based on some conditions. I am using conditional aggregation to count the trivial test cases like below. I want to add a more complex query condition to this type of check. I also would like to see the counts if there are nulls.

I want to know how to incorporate the more complex query and also add a count if there are nulls present. Expected output below.

What I have so far:

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS' ELSE 'FAIL' END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS' ELSE 'FAIL' END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS' ELSE 'FAIL' END) as car_sale_test       
FROM car_data;

More complex type query to add:

SELECT Count(*), 
       car_job 
FROM   car_data 
WHERE  car_job NOT IN ( "car_type", "car_license", "car_cancellation", 
                        "car_color", "car_contract", "car_metal", "car_number" ) 
        OR car_job IS NULL 
GROUP  BY car_job

Example expected output:

car_type_test  car_color_test  car_sale_test  car_job_test
PASS           PASS             PASS           FAIL
                                               102
1

There are 1 best solutions below

0
On

I would recommend putting this on one row instead of two:

SELECT (CASE WHEN COUNT(*) = COUNT(car_type) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', COUNT(*) - COUNT(car_type))
        END) as car_type_test,
       (CASE WHEN COUNT(*) = COUNT(car_color) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', COUNT(*) - COUNT(car_color))
        END) as car_color_test,
       (CASE WHEN COUNT(*) = COUNT(car_sale) THEN 'PASS'
             ELSE REPLACE('FAIL ([n])', '[n]', COUNT(*) - COUNT(car_sale))
        END) as car_sale_test       
FROM car_data;