Is it possible to set a fully customized metric for auto scale-out of Dataproc worker nodes in GCP (Google Cloud Platform)?

I want to run distributed Spark processing with Dataproc in GCP, but I want to horizontally scale the worker nodes out based on a fully customized metric. The reason I ask is that a prediction of the data volume expected to be processed in the future is available.

now / now+1 / now+2 / now+3
1GB / 2GB / 1GB / 3GB <=== expected data volume (metric)

So could I scale out/in predictively according to the expected future data volume? Thanks in advance.

No, currently Dataproc autoscales clusters based only on YARN memory metrics.
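
For reference, an autoscaling policy only exposes YARN-based signals, so there is no field where a custom metric could be plugged in. A minimal policy sketch (the instance counts, factors, and timings below are placeholder values):

```yaml
# autoscaling-policy.yaml -- sketch only; tune the values for your workload.
workerConfig:
  minInstances: 2
  maxInstances: 20
basicAlgorithm:
  cooldownPeriod: 4m
  yarnConfig:
    # Scaling is driven by pending/available YARN memory; there is no hook
    # here for a user-defined metric.
    scaleUpFactor: 0.5
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 1h
```

You import it with `gcloud dataproc autoscaling-policies import` and attach it to the cluster with `--autoscaling-policy` at cluster creation time.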

You need to write your Spark job so that it requests more Spark executors (and, as a result, more YARN memory) when it processes more data. Usually that means splitting and partitioning your data into more pieces as the data size increases.
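
Since you already have a forecast of the data volume, you could pass it into the job and size the partitioning from it. Below is a minimal PySpark sketch of that idea; the paths, column name, target partition size, and executor caps are hypothetical, and it assumes dynamic allocation with the external shuffle service (enabled by default on Dataproc):

```python
import sys

from pyspark.sql import SparkSession

# Predicted data volume for this run, passed as a job argument (e.g. 1, 2, 3 GB).
expected_gb = float(sys.argv[1]) if len(sys.argv) > 1 else 1.0

# Aim for roughly 128 MiB of input per partition (a common rule of thumb).
TARGET_PARTITION_MB = 128
num_partitions = max(8, int(expected_gb * 1024 / TARGET_PARTITION_MB))

spark = (
    SparkSession.builder
    .appName("size-driven-scaling")
    # Dynamic allocation lets Spark request more executors from YARN as tasks
    # queue up; the resulting pending YARN memory is what Dataproc autoscaling
    # reacts to, so bigger inputs indirectly add worker nodes.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "100")
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.sql.shuffle.partitions", str(num_partitions))
    .getOrCreate()
)

df = spark.read.parquet("gs://my-bucket/input/")                # hypothetical path
result = df.repartition(num_partitions).groupBy("key").count()  # hypothetical transform
result.write.mode("overwrite").parquet("gs://my-bucket/output/")
```

The larger the expected volume, the more partitions and pending tasks the job creates, and the autoscaler adds worker nodes to satisfy the extra YARN memory demand.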