We (an engineering team) are running an EMR cluster with YARN and Spark. What typically happens is that when one user submits a heavy, memory-intensive job, it grabs all of the available YARN memory, and jobs submitted by subsequent users then have to wait for that memory to free up. (I know that autoscaling will solve this problem to a certain extent and we are looking into that, but we would like to prevent a single user from occupying all the memory even when the cluster is autoscaled to its full limits.)
Is there a way to configure YARN so that no application (Spark or otherwise) can occupy more than, say, 75% of the available memory?
Thanks
According to the documentation, you can manage the amount of memory allocated to each executor with the spark.executor.memory parameter.
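For example, here is a minimal sketch of setting that parameter when building a SparkSession; the app name, memory size, and executor count are illustrative assumptions, not recommendations for your cluster:

    from pyspark.sql import SparkSession

    # Minimal sketch: bound this application's YARN footprint by fixing both the
    # per-executor memory and the number of executors. Values are illustrative.
    spark = (
        SparkSession.builder
        .appName("memory-capped-job")                         # hypothetical app name
        .config("spark.executor.memory", "4g")                # heap per executor
        .config("spark.executor.instances", "10")             # fixed executor count
        .config("spark.dynamicAllocation.enabled", "false")   # don't grow past the cap
        .getOrCreate()
    )

Note that spark.executor.memory is a per-executor limit; the application's total YARN footprint is roughly executor count × (executor memory + per-container overhead), so you also need to bound the number of executors (or spark.dynamicAllocation.maxExecutors) to cap how much a single job can take.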