I have code that we want to run in N Pods simultaneously. When running it manually, I launch a Kubernetes Job by specifying parallelism and completions in the YAML file:
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::{number}:role/{access_name}"
  name: test-job
  namespace: analytics
spec:
  completions: 1000
  parallelism: 1000
  template:
Now I want to automate this process with Airflow. However, Airflow only has KubernetesPodOperator and no Job operator. Is there any way I can achieve the same result using KubernetesPodOperator?
Limitations:
- We can't use any other library due to very strict restrictions, so we need to get the job done using only the operators available in Airflow by default.
I have tried creating N KubernetesPodOperator instances, resulting in N tasks. However, the required parallelism is dynamic, and when it is very large (e.g., 100K), creating that many tasks in Airflow is not feasible. So I'm looking for a way to achieve this using only a single task.
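To make the single-task idea concrete, here is a minimal sketch of the direction I'm considering: render the Job manifest above with a dynamic parallelism value, so that one task (for example, a pod running kubectl against the rendered manifest) submits one Job instead of N separate pod tasks. The helper name and template are my own assumptions, not a tested solution:

```python
# Hypothetical sketch: build the Job manifest with a dynamic
# completions/parallelism value using only the standard library.
# A single Airflow task could then apply this manifest (e.g. via
# `kubectl apply -f -` inside a pod). Names here are illustrative.

JOB_TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: {name}
  namespace: {namespace}
spec:
  completions: {parallelism}
  parallelism: {parallelism}
"""


def render_job_manifest(name: str, namespace: str, parallelism: int) -> str:
    """Fill in the Job manifest for a given fan-out size."""
    return JOB_TEMPLATE.format(
        name=name, namespace=namespace, parallelism=parallelism
    )


if __name__ == "__main__":
    # Example: the 1000-way fan-out from the manual YAML above.
    print(render_job_manifest("test-job", "analytics", 1000))
```

The point is that the fan-out count lives in one rendered manifest rather than in the Airflow task graph, so the DAG stays a single task regardless of how large the parallelism gets.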