I'm working with a masked database on my QA server using SQL Server Standard (64-bit) 14.0.1000.169. This is my structure:
CREATE TABLE [dbo].[Test](
[Column1] [VARCHAR(64)] NULL,
[Column2] [VARCHAR(64)] NULL
)
GO
INSERT INTO [dbo].[Test]
VALUES ('ABCDEFG', 'HIJKLMN')
I've masked the column with the following code:
ALTER TABLE [dbo].[Test]
ALTER COLUMN [Column1] VARCHAR(64) MASKED WITH (FUNCTION = 'default()');
It works as expected when I perform the following query using a non-allowed user:
SELECT [Column1], [Column2]
FROM [dbo].[Test]
FOR JSON PATH
-- RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]'
But it doesn't work when the same non-allowed user saves the result in variable (the main goal):
DECLARE @var VARCHAR(64)
SET @var = (SELECT [Column1], [Column2] FROM [dbo].[Test] FOR JSON PATH)
SELECT @var --it should show a valid JSON...
-- RESULT: 'xxxx' <-- JSON LOSES ITS STRUCTURE
-- DESIRED RESULT: '[{"Column1":"xxxx", "Column2":"HIJKLMN"}]' <-- VALID JSON
Main problem: JSON looses its structure when a masked column appear in the SELECT and "FOR JSON PATH" clause is present.
We want to get a valid JSON even if the data column is masked or not, or even if sa
user or not.
I've tested using NVARCHAR or doing a CAST in the masked column, but the only way we get the desired result is using a #tempTable before use the "FOR JSON PATH" clause.
How can I do for SELECT a masked column and save it to VARCHAR variables without loose JSON structure? Any help will be appreciated.
NOTE: The SA user is default allowed to see unmasked data (so the JSON doesn't loose its structure), but we want to execute it on a non-allowed user and return a valid JSON, not only 'xxxx'.
It does indeed appear to be a bug. Repro is here. Although see below, not so sure.
When using
FOR JSON
, or for that matterFOR XML
, as a top levelSELECT
construct, a different code path is used as compared to placing it in a subquery or assigning it to a variable. This is one of the reasons for the 2033-byte limit per row in a bareFOR JSON
.What appears to be happening is that in the case of a bare
FOR JSON
, the data masking happens at the top of the plan, in aCompute Scalar
operator just before theJSON SELECT
operator. So the masking happens on just the one column.PasteThePlan
Whereas when putting inside a subquery, a
UDX
function operator is used. The problem is that theCompute Scalar
is happening after theUDX
has created the JSON or XML, whereas it should have been pushed down below theUDX
in the plan.PasteThePlan
I suggest you file the bug with Microsoft, on the Azure Feedback site.
Having gone over this a little, I actually think now that it's not a bug. What does seem to be a bug is the case without nesting.
From the documentation:
Therefore, when you select any masked column, even in a normal
SELECT
, if you use a function on the column then the masking always happens after any other functions. In other words, the masking is not applied when the data is read, it is applied when it is finally output to the client.When using a subquery, the data is fed into a
UDX
function operator. The compiler now senses that the final resultset is a normalSELECT
, just that it needs to mask any final result that came from the masked column. So the whole JSON is masked as one blob, similar to if you didUPPER(yourMaskedColumn)
. See the XML plan in this fiddle for an example of that.But when using a bare
FOR JSON
, it appears to the compiler as a normalSELECT
, just that the final output is changed to JSON (the top-levelSELECT
operator is different). So the masking happens before that point. This seems to me to be a bug.The bug is even more egregious when you use
FOR XML
, which uses the same mechanisms. If you use a nestedFOR XML ..., TYPE
then you get just<masked />
irrespective of whether you nest it or not. Again this is because the query plan shows the masking happening after theUDX
. Whereas if you don't use, TYPE
then it depends if you nest it. See fiddle.