Possible injection risk on Azure Synapse Spark when certain keywords appear in string literals in WHERE or JOIN clauses


Has anyone else seen strange behaviour with certain keywords inside string literals in Azure Synapse's flavour of Spark?

I seem to have found a new one:

I have a table in a lake database with a column that identifies whether a value is a Score, Rank or Decile.

If a SQL query on Spark has a join or where clause of the form MyField = 'Decile', it always fails.

I have tried this on the same dataset, recreated from CSV, in 3 distinct instances of Synapse, all with the same result. However, I would appreciate it if anyone in the community could give this a try. If the cause turns out to be my data, then great.

If this is genuine and not some attribute of my data, then it raises a much bigger concern: is Synapse vulnerable to injection of keywords when applying operators to string fields? Think back to Log4j and the RCE that came from parsing strings for keywords.

If I try this on the serverless SQL pool, against the same file/lake database that produces the error in Spark, there is no problem.

If I change the = to LIKE, i.e. MyField LIKE 'Decile', then it works on Spark.

The format in use is Delta in each case, although I started with a CSV and loaded it into Delta.
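For anyone willing to try this, here is a rough sketch of a reproduction in a Synapse notebook. It is not my real dataset: the view name score_type_repro, the helper names build_repro_queries and run_repro, and the three sample rows are all made up for illustration, and it uses an in-memory temporary view rather than a Delta table, so it may not trigger whatever I am hitting. It assumes the notebook's predefined spark session is passed in.

```python
def build_repro_queries(table="score_type_repro", column="MyField", value="Decile"):
    """Build the '=' and LIKE variants of the same filter as SQL strings.

    All names here are hypothetical. Inputs are restricted to letters,
    digits and underscores so the generated SQL stays trivially safe.
    """
    for name in (table, column, value):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsupported characters in {name!r}")
    return {
        "equals": f"SELECT COUNT(*) AS n FROM {table} WHERE {column} = '{value}'",
        "like": f"SELECT COUNT(*) AS n FROM {table} WHERE {column} LIKE '{value}'",
    }


def run_repro(spark, table="score_type_repro", column="MyField", value="Decile"):
    """Run both variants against a tiny temp view and report what happened.

    `spark` is the SparkSession (predefined in a Synapse notebook).
    Returns a dict mapping each variant to a row count or an error string.
    """
    # Sample rows are an assumption that mirrors the three values in my column.
    rows = [("Score",), ("Rank",), ("Decile",)]
    spark.createDataFrame(rows, [column]).createOrReplaceTempView(table)

    results = {}
    for label, query in build_repro_queries(table, column, value).items():
        try:
            results[label] = spark.sql(query).collect()[0]["n"]
        except Exception as exc:  # keep going so both variants are reported
            results[label] = f"error: {exc!r}"
    return results
```

In a notebook cell, print(run_repro(spark)) shows the outcome of both variants side by side. Calling it again with value="Score" or value="Rank" would show whether only 'Decile' behaves differently.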
