Has anyone else seen strange behaviour with certain keywords inside strings in Azure Synapse's flavour of Spark?
I seem to have found a new one:
I have a table on a lake that contains a column identifying if a value is a Score, Rank or Decile.
If a join or WHERE clause in a SQL query on Spark uses a predicate of the form MyField = 'Decile', it always fails.
I have tried this on the same dataset, recreated from CSV, in 3 distinct instances of Synapse, all with the same result. However, I would appreciate it if anyone in the community could give this a try. If it is just my data then great.
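One way to rule out the data itself is to look at the raw characters in the column values, since stray whitespace or a byte-order mark left over from the CSV would be invisible in a normal result grid. Below is a minimal sketch in plain Python, assuming the distinct values of MyField have been pulled back to the driver as ordinary strings; the function name `suspicious_chars` and the sample values are hypothetical and only for illustration.

```python
import unicodedata


def suspicious_chars(value):
    """Return (index, code point, Unicode name) for every character that is
    not a visible ASCII character. Spaces are reported too, because a
    leading or trailing space is easy to miss."""
    findings = []
    for i, ch in enumerate(value):
        if not (" " < ch <= "~"):
            name = unicodedata.name(ch, "UNNAMED")  # control chars have no name
            findings.append((i, f"U+{ord(ch):04X}", name))
    return findings


# Hypothetical samples standing in for the distinct values of MyField.
samples = ["Score", "Rank", "Decile", "Decile ", "\ufeffDecile"]
for s in samples:
    print(repr(s), suspicious_chars(s))
```

An empty list for every value would suggest the strings are clean and the problem lies elsewhere; anything reported would point back at the CSV load.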
If this is genuine and not some attribute of my data, then it raises a much bigger concern: is Synapse vulnerable to keyword injection when applying operators to string fields? Think back to Log4j and the RCE that came from parsing strings for keywords.
If I try this on the serverless SQL pool, against the same file/lake database that gives the error in Spark, it is not a problem.
If I change the = to LIKE, i.e. MyField LIKE 'Decile', then it works on Spark.
The format in use is Delta in each case, although I started with a CSV and loaded it into Delta.