Performance Effective JSON data masking Snowflake

179 Views Asked by At

I am trying to perform data masking on JSON data. Using a Javascript UDF to update list of NESTED JPATH attributes similar to what is done here, https://www.snowflake.com/blog/masking-semi-structured-data-with-snowflake/

Additionally I tried nested OBJECT_INSERT statements to mask a specific attribute but having multiple attributes to mask I have to build a list of subqueries to perform OBJECT INSERT on previous sub query result which is complex. Ex:

FROM (
SELECT OBJECT_INSERT(VAR_COL,'LVL1',OBJECT_INSERT(VAR_COL:LVL1,'KEY1',OBJECT_INSERT(VAR_COL:LVL1.KEY1,'KEY2','VALUE',TRUE),TRUE),TRUE) AS VAR_COL
FROM TABLE
)

Another problem with OBJECT_INSERT which is not letting me use it is if the JPATH doesn't exists for a specific JSON row it will add that JPATH which I dont want.

I am working with million of records and using XS Warehouse it takes 15 mins to do a simple query using JavaScript UDF.

Alternately, also tried Snowpark UDF but it is also showing very small improvement.

Any idea on improving performance further?

0

There are 0 best solutions below