I am running Spark standalone on Kubernetes, and I have a PySpark application that connects to the master through a SparkSession. The app loads around 4 GB of JSON files and runs some SQL queries.
If I restart the workers, the worker container uses around 400–500 MB of RAM. When I start my application the memory climbs to around 4–5 GB, but after the application finishes it only drops by around 1 GB. How can I get the worker to release all of its memory?
My app is not caching or persisting any DataFrames; a minimal sketch of it is below.
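For reference, the shape of the job is roughly this (the master URL, input path, and query are placeholders, not my real values):

```python
from pyspark.sql import SparkSession

# Sketch of the hourly job; names below are placeholders for illustration.
spark = (
    SparkSession.builder
    .appName("hourly-json-job")
    .master("spark://spark-master:7077")  # placeholder master URL
    .getOrCreate()
)

# Load the ~4 GB of JSON and run SQL over it; no cache()/persist() anywhere.
df = spark.read.json("/data/events/*.json")  # placeholder path
df.createOrReplaceTempView("events")
result = spark.sql("SELECT some_col, COUNT(*) FROM events GROUP BY some_col")
result.show()

# The session is stopped at the end of every run.
spark.stop()
```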
My problem is that the app runs hourly, and after x runs the worker pods restart and the job loses its connection until it switches to a new worker.
You can see this in the Grafana graph below.
