Spark 2.1 - Constant pool for class SpecificUnsafeProjection has grown past JVM limit of 64KB


Please pardon my ignorance, as I am new to PySpark and Spark. I am working on upgrading Spark from 1.6.3 to 2.1 and am running into issues while running our model using PySpark.

The Python script that throws the error simply reads in a JSON file and converts it into a DataFrame, using something like the line below:

df_read = sparkSession.read.json('path to json file')

After the read, we perform some operations on the DataFrame, run some UDFs on the columns, and ultimately want to write the result back out as JSON, which is then picked up and written to Apache Phoenix tables.
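For context, the overall flow is roughly the sketch below; the column name ('member_id') and the UDF logic are illustrative placeholders, not our actual code:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    sparkSession = SparkSession.builder.appName('json-pipeline').getOrCreate()

    # Read the source JSON into a DataFrame
    df_read = sparkSession.read.json('path to json file')

    # Illustrative UDF applied to one of the ~450 columns
    # ('member_id' is a placeholder column name)
    normalize = udf(lambda v: v.strip().upper() if v is not None else None, StringType())
    df_out = df_read.withColumn('member_id', normalize(df_read['member_id']))

    # Write back out as JSON for the downstream Phoenix load
    df_out.write.mode('overwrite').json('path to output json')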

We get the following exception whenever we try to perform any action on the DataFrame, such as show() or take().

I read here (https://issues.apache.org/jira/browse/SPARK-18016?focusedCommentId=16348980&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16348980) that the issue is that Spark cannot handle very wide schemas in its generated code, and that this is fixed in version 2.3.

The DataFrame we want to write out has about 450 columns.

My question: since we cannot upgrade to Spark 2.3 at this time, is there a workaround? Perhaps splitting the columns into two DataFrames, then merging them and writing, or something along those lines, as in the rough sketch below?
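To be concrete, the kind of split-and-rejoin I have in mind looks something like this (just a rough sketch; '_row_id' is a surrogate key I would add, and I have not verified that this actually avoids the codegen limit):

    from pyspark.sql.functions import monotonically_increasing_id

    # Add a surrogate key so the two halves can be re-joined later
    df_keyed = df_read.withColumn('_row_id', monotonically_increasing_id())

    cols = df_read.columns
    half = len(cols) // 2

    df_left = df_keyed.select(['_row_id'] + cols[:half])
    df_right = df_keyed.select(['_row_id'] + cols[half:])

    # ...run the UDFs / transformations on each half separately...

    # Re-merge and write
    df_merged = df_left.join(df_right, on='_row_id').drop('_row_id')
    df_merged.write.mode('overwrite').json('path to output json')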

18/12/03 12:34:30 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, xhadoopm686p.aetna.com, executor 1): java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF

After the above exception, it just prints some generated code and the job fails.

Any information is greatly appreciated.

There is 1 answer below.

I was able to get around this issue by removing all the nested structures in the code. We were creating some arrays for intermediate calculations, and those were what caused this issue.
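Roughly speaking, the change was to stop building intermediate array columns and keep the values as flat top-level columns instead; something along these lines (the column names here are just for illustration, not our real schema):

    from pyspark.sql.functions import array, col

    # Before: an intermediate array column used for per-row calculations
    # (this kind of nesting was what blew up the generated projection)
    # df_read = df_read.withColumn(
    #     'rate_components', array(col('rate_a'), col('rate_b'), col('rate_c')))

    # After: compute directly on the flat columns instead
    df_read = df_read.withColumn(
        'rate_total', col('rate_a') + col('rate_b') + col('rate_c'))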

Thanks!