SQL Server Data Masking bug with "FOR JSON PATH" clause

396 Views Asked by At

I'm working with a masked database on my QA server using SQL Server Standard (64-bit) 14.0.1000.169. This is my structure:

    CREATE TABLE [dbo].[Test](
         [Column1] [VARCHAR(64)] NULL,
         [Column2] [VARCHAR(64)] NULL
    )
    GO
    INSERT INTO [dbo].[Test]
    VALUES ('ABCDEFG', 'HIJKLMN')

I've masked the column with the following code:

    ALTER TABLE [dbo].[Test]
    ALTER COLUMN [Column1] VARCHAR(64) MASKED WITH (FUNCTION = 'default()');

It works as expected when I perform the following query using a non-allowed user:

    SELECT [Column1], [Column2]
    FROM [dbo].[Test]
    FOR JSON PATH
    -- RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]'

But it doesn't work when the same non-allowed user saves the result in variable (the main goal):

    DECLARE @var VARCHAR(64)
    SET @var = (SELECT [Column1], [Column2] FROM [dbo].[Test] FOR JSON PATH)
    SELECT @var --it should show a valid JSON...
    -- RESULT: 'xxxx' <-- JSON LOSES ITS STRUCTURE
    -- DESIRED RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]' <-- VALID JSON

Main problem: JSON looses its structure when a masked column appear in the SELECT and "FOR JSON PATH" clause is present.

We want to get a valid JSON even if the data column is masked or not, or even if sa user or not.

I've tested using NVARCHAR or doing a CAST in the masked column, but the only way we get the desired result is using a #tempTable before use the "FOR JSON PATH" clause.

How can I do for SELECT a masked column and save it to VARCHAR variables without loose JSON structure? Any help will be appreciated.

NOTE: The SA user is default allowed to see unmasked data (so the JSON doesn't loose its structure), but we want to execute it on a non-allowed user and return a valid JSON, not only 'xxxx'.

1

There are 1 best solutions below

0
On

It does indeed appear to be a bug. Repro is here. Although see below, not so sure.

When using FOR JSON, or for that matter FOR XML, as a top level SELECT construct, a different code path is used as compared to placing it in a subquery or assigning it to a variable. This is one of the reasons for the 2033-byte limit per row in a bare FOR JSON.

What appears to be happening is that in the case of a bare FOR JSON, the data masking happens at the top of the plan, in a Compute Scalar operator just before the JSON SELECT operator. So the masking happens on just the one column.

PasteThePlan

enter image description here

Whereas when putting inside a subquery, a UDX function operator is used. The problem is that the Compute Scalar is happening after the UDX has created the JSON or XML, whereas it should have been pushed down below the UDX in the plan.

PasteThePlan

enter image description here

I suggest you file the bug with Microsoft, on the Azure Feedback site.


Having gone over this a little, I actually think now that it's not a bug. What does seem to be a bug is the case without nesting.

From the documentation:

Whenever you project an expression referencing a column for which a data masking function is defined, the expression will also be masked. Regardless of the function (default, email, random, custom string) used to mask the referenced column, the resulting expression will always be masked with the default function.

Therefore, when you select any masked column, even in a normal SELECT, if you use a function on the column then the masking always happens after any other functions. In other words, the masking is not applied when the data is read, it is applied when it is finally output to the client.

When using a subquery, the data is fed into a UDX function operator. The compiler now senses that the final resultset is a normal SELECT, just that it needs to mask any final result that came from the masked column. So the whole JSON is masked as one blob, similar to if you did UPPER(yourMaskedColumn). See the XML plan in this fiddle for an example of that.

But when using a bare FOR JSON, it appears to the compiler as a normal SELECT, just that the final output is changed to JSON (the top-level SELECT operator is different). So the masking happens before that point. This seems to me to be a bug.

The bug is even more egregious when you use FOR XML, which uses the same mechanisms. If you use a nested FOR XML ..., TYPE then you get just <masked /> irrespective of whether you nest it or not. Again this is because the query plan shows the masking happening after the UDX. Whereas if you don't use , TYPE then it depends if you nest it. See fiddle.