I have discovered that a query against a table that has a masked policy will return different results based on the role IF a masked column is part of the predicate or a JOIN.
The following query will return no data:
use role NO_READ_ROLE;
select *
from customer_table
where ssn = '***-**-0102';
But if the the role used applies the mask, it will return the result set.
For the below query, the result set is only returned if the role is FULL READ (unmasked):
select *
from customer_table
where ssn = '100-10-0102';
Similarly, if we have a self join on a masked column, the results returned will depend on the role used.
select c1.customer_id
, c2.first_name
, c1.ssn
, c2.ph_no
, c1.email
from customer_table c1
inner join customer_table c2
on c1.ssn = c2.ssn;
I had hoped/expected that the optimizer would treat the predicates and the JOIN operators the same, regardless of the masking policy impact. Instead, it appears that the masked columns are also masked as part of the query plan.
Based on the above findings, it seems that a good rule of thumb is to not use masked columns as part of the predicate and/or JOINs UNLESS the role used has unrestricted read access to the column.
Does anyone have any other guidance when using masked columns as part of the queries JOIN or WHERE clause?
Reference:
At runtime, Snowflake rewrites the query to apply the masking policy expression to the columns specified in the masking policy. The masking policy is applied to the column regardless of where in a SQL expression the column is referenced, including:
- PROJECTIONS.
- JOIN predicates.
- WHERE clause predicates.
- ORDER BY and GROUP BY clauses.
Thanks,
masking converts column values to the masked values appropriate to the role running the query; the underlying value effectively doesn't exist from the perspective of the role.
If this wasn't the case then masking would be pointless. For example, I could write a Stored Procedure like this (pseudo-code):
This would allow me to list the ssn for every customer, bypassing the masking policy.
I'm not sure what guidance you are looking for, you just need to understand the behaviour when writing queries.