Does anyone know a way to change Spark properties (e.g. spark.executor.memory, spark.shuffle.spill.compress, etc.) at runtime, so that a change takes effect between the tasks/stages of a running job?
I know that:
1) The documentation for Spark 2.0+ (and previous versions too) states that once the Spark context has been created, its configuration can't be changed at runtime.
2) SparkSession.conf.set may change a few things for SQL, but I am looking for more general, all-encompassing configuration.
3) I could start a new context in the program with new properties, but the goal here is to tune the properties while a job is already executing.
Ideas...
1) Would killing an executor force it to read a configuration file again, or does it just get what was already configured at the start of the job?
2) Is there any command to force a "refresh" of the properties in the Spark context?
I'm hoping there is a way, or that someone has other ideas. Thanks in advance.
No, it is not possible to change settings like spark.executor.memory at runtime. In addition, there are probably not many good tricks for "quickly switching to a new context", because the strength of Spark is that it can pick up data and keep going within one application. What you are essentially asking for is a map-reduce framework. Of course, you could rewrite your job into that structure and divide the work across multiple Spark jobs, each with its own settings, but then you would lose some (though possibly not all) of the ease and performance that Spark brings.
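As a rough illustration of dividing the work across multiple Spark jobs, here is a minimal Python sketch that builds one spark-submit command per phase, each with its own --conf values. The script names, data paths and memory values are made-up placeholders, and each phase is assumed to write its output somewhere the next phase can read it.

```python
import shlex


def build_spark_submit(script, conf, args=()):
    """Return the argv list for one spark-submit run with its own settings."""
    cmd = ["spark-submit"]
    # Each property is passed as a separate "--conf key=value" pair.
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)
    cmd.extend(args)
    return cmd


# Hypothetical two-phase pipeline: scripts, paths and values are placeholders.
PHASES = [
    ("extract.py",
     {"spark.executor.memory": "2g"},
     ["/data/raw", "/data/stage1"]),
    ("aggregate.py",
     {"spark.executor.memory": "8g", "spark.shuffle.spill.compress": "false"},
     ["/data/stage1", "/data/out"]),
]


def plan(phases=PHASES):
    """Build the command for every phase, in order."""
    return [build_spark_submit(script, conf, args) for script, conf, args in phases]


if __name__ == "__main__":
    for cmd in plan():
        # Print the command; to actually run it, call
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Every phase starts a fresh application, so anything cached in memory by the previous phase is gone and has to be re-read from storage. That is the loss of ease and performance mentioned above.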
If you really think the request makes sense on a conceptual level, you could consider making a feature request, either through your Spark vendor or directly by filing a Jira issue on the Apache Spark project.