How to dynamically provision/deprovision EC2 nodes for Airflow workers?


I am using Airflow in my AWS EKS cluster. I've deployed it using the Airflow Helm Chart (User Community), and I am using the KubernetesExecutor.

Some of my DAGs run a task that does ML training once a week on an Airflow worker. By default, the worker is a Kubernetes Pod defined by airflow.kubernetesPodTemplate.* in values.yaml.

This training requires quite a lot of vCPUs and memory (e.g., 24 vCPUs, 64 GiB of memory), but it doesn't take much time (it finishes in about an hour).

So I want the KubernetesExecutor to request an EC2 node that meets the above requirements (e.g., m5.8xlarge) when the DAG is triggered, and to de-provision (or terminate) the node from the cluster after the task finishes.

I don't want an m5.8xlarge instance to stay up in my cluster all the time just for an hour of training per week.
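
To make it concrete, here is a rough sketch of the worker pod spec I have in mind for just the training task (the nodeSelector value is only an illustration; I'm assuming something like Cluster Autoscaler or Karpenter would actually add and remove the underlying node):

    from kubernetes.client import models as k8s

    # Hypothetical per-task pod override for the weekly ML training.
    # The label/value below are placeholders for whatever node group or
    # provisioner actually backs the m5.8xlarge capacity in my cluster.
    ml_training_pod_override = k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",  # KubernetesExecutor's main container is named "base"
                    resources=k8s.V1ResourceRequirements(
                        requests={"cpu": "24", "memory": "64Gi"},
                        limits={"cpu": "24", "memory": "64Gi"},
                    ),
                )
            ],
            # Only schedule onto the big instance type (placeholder selector).
            node_selector={"node.kubernetes.io/instance-type": "m5.8xlarge"},
        )
    )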

Is this possible?

It would be perfect if I could choose and configure a different operator for each DAG, since not all DAGs run ML training tasks, and if I could freely provision and de-provision the nodes on which the workers (Kubernetes Pods) temporarily reside.
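
For example, building on the pod override sketched above, I'm imagining attaching it to just the training task via executor_config (the DAG and task names here are made up, just to show the shape of what I'm after):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def train_model():
        ...  # the actual ML training

    with DAG(
        dag_id="weekly_ml_training",   # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@weekly",
        catchup=False,
    ) as dag:
        # Only this task gets the big pod (and hence the big node);
        # every other DAG/task keeps the default kubernetesPodTemplate.
        train = PythonOperator(
            task_id="train",
            python_callable=train_model,
            executor_config={"pod_override": ml_training_pod_override},
        )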

What should I change in values.yaml?
