Is it OK to replace commons-text-1.6.jar with commons-text-1.10.jar (in response to security alert CVE-2022-42889 / QID 377639, Text4Shell)? Would it introduce compatibility issues for users' PySpark code? I ask because in many settings people don't have a rich regression test suite for testing PySpark/Spark changes.
Here is the background:
On 2022-10-13, the Apache Commons Text team disclosed CVE-2022-42889 (also tracked as QID 377639 and named Text4Shell): in versions before 1.10.0, using StringSubstitutor could trigger unwanted network access or code execution.
PySpark packages include commons-text-1.6.jar in the lib/jars directory. The presence of this jar can trigger a security finding and require remediation in an enterprise setting.
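To check which commons-text jar a given install actually ships, something like the following could be used. It is only a sketch: `find_commons_text_jars`, `parse_version` and the file name pattern are my own illustrative helpers, not part of PySpark, and the 1.10.0 threshold is taken from the advisory described above.

```python
import re
from pathlib import Path

# Matches file names such as commons-text-1.6.jar or commons-text-1.10.0.jar.
# The naming pattern is an assumption based on the jar names in this post.
JAR_PATTERN = re.compile(r"^commons-text-(\d+(?:\.\d+)*)\.jar$")

# First release reported as fixed for CVE-2022-42889.
FIXED_VERSION = (1, 10, 0)


def parse_version(text):
    """Turn '1.6' into (1, 6, 0) so versions compare numerically, not as strings."""
    parts = tuple(int(p) for p in text.split("."))
    return parts + (0,) * (3 - len(parts))


def find_commons_text_jars(jars_dir):
    """Return (file name, version tuple, needs_upgrade) for each commons-text jar."""
    results = []
    for path in sorted(Path(jars_dir).glob("commons-text-*.jar")):
        match = JAR_PATTERN.match(path.name)
        if match:
            version = parse_version(match.group(1))
            results.append((path.name, version, version < FIXED_VERSION))
    return results
```

Point `jars_dir` at the jars directory of the PySpark install under test; where that directory lives depends on how PySpark was installed, so check the path rather than assuming it.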
Going through the Spark source code (master branch, 3.2+), StringSubstitutor is used only in ErrorClassesJSONReader.scala. PySpark does not seem to use StringSubstitutor directly, but it is not clear whether PySpark code uses ErrorClassesJSONReader. (A grep of the pyspark 3.1.2 source code does not yield any results. A grep for json yields several files in the sql and ml directories.)
I assembled a conda env with pyspark and then replaced commons-text-1.6.jar with commons-text-1.10.jar. The few test cases I tried worked OK.
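For anyone who wants to repeat the swap in a reproducible way, here is a rough sketch of how it could be scripted. `swap_jar` and all of its parameters are hypothetical names of mine; it only moves the old jar into a backup directory and copies the new one in, and it says nothing about whether Spark then behaves correctly.

```python
import shutil
from pathlib import Path


def swap_jar(jars_dir, old_name, new_jar, backup_dir):
    """Move old_name out of jars_dir into backup_dir, then copy new_jar in.

    Returns the path of the newly installed jar.
    """
    jars_dir, new_jar, backup_dir = Path(jars_dir), Path(new_jar), Path(backup_dir)
    old_path = jars_dir / old_name

    # Fail before touching anything if either file is missing.
    if not old_path.is_file():
        raise FileNotFoundError(f"{old_path} not found")
    if not new_jar.is_file():
        raise FileNotFoundError(f"{new_jar} not found")

    # Keep the old jar so the change can be rolled back.
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_path), str(backup_dir / old_name))

    return Path(shutil.copy2(str(new_jar), str(jars_dir / new_jar.name)))
```

Keeping the old jar in a backup directory outside the jars directory means only one commons-text version is on the classpath at a time, and the change is easy to undo if a job starts failing.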
So the question is: does anyone know of any compatibility issue in replacing commons-text-1.6.jar with commons-text-1.10.jar? (Will it break users' PySpark/Spark code?)
Thanks,
There appears to be a similar item in the Spark issue tracker, https://issues.apache.org/jira/browse/SPARK-40801, and it has completed PRs that changed the commons-text version to 1.10.0.