Snowflake: Dynamic Data Masking Impact On query results

1.2k Views Asked by At

I have discovered that a query against a table that has a masked policy will return different results based on the role IF a masked column is part of the predicate or a JOIN.

The following query will return no data:

use role NO_READ_ROLE;

select  * 
from    customer_table
where   ssn = '***-**-0102';

But if the the role used applies the mask, it will return the result set.

For the below query, the result set is only returned if the role is FULL READ (unmasked):

select  * 
from    customer_table
where   ssn = '100-10-0102';

Similarly, if we have a self join on a masked column, the results returned will depend on the role used.

select  c1.customer_id
        , c2.first_name
        , c1.ssn
        , c2.ph_no
        , c1.email
from        customer_table c1
        inner join customer_table c2
            on c1.ssn = c2.ssn;

I had hoped/expected that the optimizer would treat the predicates and the JOIN operators the same, regardless of the masking policy impact. Instead, it appears that the masked columns are also masked as part of the query plan.

Based on the above findings, it seems that a good rule of thumb is to not use masked columns as part of the predicate and/or JOINs UNLESS the role used has unrestricted read access to the column.

Does anyone have any other guidance when using masked columns as part of the queries JOIN or WHERE clause?

Reference:

https://docs.snowflake.com/en/user-guide/security-column-intro.html#masking-policies-at-query-runtime

At runtime, Snowflake rewrites the query to apply the masking policy expression to the columns specified in the masking policy. The masking policy is applied to the column regardless of where in a SQL expression the column is referenced, including:

  • PROJECTIONS.
  • JOIN predicates.
  • WHERE clause predicates.
  • ORDER BY and GROUP BY clauses.

Thanks,

2

There are 2 best solutions below

1
On

masking converts column values to the masked values appropriate to the role running the query; the underlying value effectively doesn't exist from the perspective of the role.

If this wasn't the case then masking would be pointless. For example, I could write a Stored Procedure like this (pseudo-code):

for n= 1 to 999999999
    select  c1.customer_id
            , c1.ssn
            , c1.email
            , n
    from customer_table c1
    where c1.ssn = n

next n

This would allow me to list the ssn for every customer, bypassing the masking policy.

I'm not sure what guidance you are looking for, you just need to understand the behaviour when writing queries.

0
On

Have you tried to use context function INVOKER_ROLE() in the mix?

In my use case, I have created a view from two table with inner join on masked columns. By using INVOKER_ROLE() in masked policy definition, I was able to get the same outputs irrespective of roles accessing the view.

Some helpful links: https://docs.snowflake.com/en/sql-reference/functions/invoker_role.html https://www.youtube.com/watch?v=6l49PxnvYKI&t=623s