I'm running a Spark job that reads, manipulates, and merges many txt files into a single file, but I'm hitting this issue:
Py4JJavaError: An error occurred while calling o8483.collectToPython. : org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 838 tasks (1025.6 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
Is it possible to increase the value of spark.driver.maxResultSize?
Note: this question is about the WS Spark “Environments” NOT about Analytics Engine.
You can increase the default value through the Ambari console if you are using an "Analytics Engine" Spark cluster instance. You can get the link and credentials for the Ambari console from the IAE instance in console.bluemix.net. From the Ambari console, add a new property in
Make sure the spark.driver.maxResultSize value is less than the driver memory, which is set in
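To sanity-check that constraint before changing anything, something like the following may help. It is only a sketch: the helper names `size_to_bytes` and `fits_in_driver` are made up for illustration, and it assumes both values are written with an explicit binary unit suffix (for example `2g` or `1024m`), because Spark applies different defaults to bare numbers depending on the property.

```python
import re

# Binary multipliers for the size suffixes handled by this sketch.
_UNITS = {
    "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
    "t": 1024 ** 4, "tb": 1024 ** 4,
}


def size_to_bytes(value):
    """Convert a size string such as '2g' or '1024m' to bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([a-z]+)", value.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        # Bare numbers are rejected on purpose: their unit depends on the property.
        raise ValueError("expected a size with a unit suffix, got %r" % value)
    return int(match.group(1)) * _UNITS[match.group(2)]


def fits_in_driver(max_result_size, driver_memory):
    """Return True if the result-size limit is strictly below the driver memory."""
    return size_to_bytes(max_result_size) < size_to_bytes(driver_memory)
```

For example, `fits_in_driver("2g", "4g")` is true, while `fits_in_driver("4g", "4g")` is false.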
Another suggestion, if you are just trying to create a single CSV file and don't want to change the Spark conf values because you don't know how large the final file will be, is to use a function like the one below, which uses the hdfs getmerge command to create a single CSV file, as pandas does.
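A minimal sketch of such a function, assuming the `hdfs` CLI is available on the machine running the driver and that `df` is a Spark DataFrame. The names `build_getmerge_command` and `write_single_csv`, and the paths in the usage note, are hypothetical and not taken from the original answer. The executors write the part files to HDFS, so nothing large is collected on the driver and spark.driver.maxResultSize is not involved.

```python
import subprocess


def build_getmerge_command(hdfs_dir, local_file):
    """Build the 'hdfs dfs -getmerge' command that concatenates part files."""
    return ["hdfs", "dfs", "-getmerge", hdfs_dir, local_file]


def write_single_csv(df, hdfs_dir, local_file):
    """Write df as CSV part files to HDFS, then merge them into one local file.

    Assumption: the header is left out so that it is not repeated once per
    part file in the merged output.
    """
    # Each executor writes its own part file under hdfs_dir.
    df.write.mode("overwrite").option("header", "false").csv(hdfs_dir)
    # Concatenate all part files into a single file on the local filesystem.
    subprocess.run(build_getmerge_command(hdfs_dir, local_file), check=True)
```

A call might look like `write_single_csv(df, "/tmp/merged_parts", "merged.csv")`. If a header row is needed, it would have to be added to the merged file separately.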