I'm currently using Apache Airflow with the Kubernetes Executor, and I've noticed some suspicious behavior that makes me think a task might be running twice in Kubernetes. I'm trying to understand whether that is actually the case and, if so, why it's happening.
My main suspicion comes from the following output when checking my pods:
➜ ~ kubectl get pods -n scheduler
NAME                                                                READY   STATUS    RESTARTS   AGE
da-job-boards-pipeline-jobs-normalisation-task-job-title-492abcde   1/1     Running   0          29h
da-job-boards-pipeline-jobs-normalisation-task-locations-tkaabcde   1/1     Running   0          28h
scheduler-scheduler-58f557f548-abcde                                2/2     Running   0          6d14h
scheduler-statsd-7dd4494d4f-abcde                                   1/1     Running   0          13d
scheduler-triggerer-0                                               2/2     Running   0          8d
scheduler-webserver-5546b8dd66-abcde                                1/1     Running   0          3d6h

➜ ~ kubectl get pods -n airflow
NAME                                    READY   STATUS    RESTARTS   AGE
jobs-normalisation-job-title-2bpsbabc   1/1     Running   0          29h
jobs-normalisation-locations-nv670abc   1/1     Running   0          28h
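To dig further, I pulled the labels off the pods in both namespaces with `kubectl get pod <name> -n <namespace> -o jsonpath='{.metadata.labels}'` and compared them with a small script. As far as I can tell, Airflow stamps its pods with labels such as `dag_id`, `task_id`, `run_id` and `try_number`; the label values below are hypothetical stand-ins for my real output:

```python
# Labels captured (hypothetical stand-in values) from the two pods via:
#   kubectl get pod <name> -n <namespace> -o jsonpath='{.metadata.labels}'
scheduler_pod = {
    "dag_id": "da-job-boards-pipeline",
    "task_id": "jobs-normalisation-task-job-title",
    "run_id": "scheduled__2024-01-01T000000-abc",
    "try_number": "1",
}
airflow_pod = {
    "dag_id": "da-job-boards-pipeline",
    "task_id": "jobs-normalisation-task-job-title",
    "run_id": "scheduled__2024-01-01T000000-abc",
    "try_number": "1",
}

def same_task_instance(a: dict, b: dict) -> bool:
    """True if both pods claim to belong to the same Airflow task instance."""
    keys = ("dag_id", "task_id", "run_id", "try_number")
    return all(a.get(k) == b.get(k) for k in keys)

print(same_task_instance(scheduler_pod, airflow_pod))  # True
```

For both pairs of pods this came back as the same task instance, which is what makes me suspect duplication rather than two unrelated runs.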
Also, the logs are identical between these pairs of pods:
jobs-normalisation-job-title-2bpsbabc and da-job-boards-pipeline-jobs-normalisation-task-job-title-492abcde
jobs-normalisation-locations-nv670abc and da-job-boards-pipeline-jobs-normalisation-task-locations-tkaabcde
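I verified this by dumping each pod's logs to a file with `kubectl logs <pod> -n <namespace> > <file>` and hashing the files (the file names below are hypothetical; any two dumps can be compared this way):

```python
import hashlib
from pathlib import Path

def digest(path: str) -> str:
    """SHA-256 digest of a dumped log file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Intended usage, after dumping logs with e.g.:
#   kubectl logs jobs-normalisation-job-title-2bpsbabc -n airflow > airflow-pod.log
#   kubectl logs da-job-boards-pipeline-jobs-normalisation-task-job-title-492abcde \
#       -n scheduler > scheduler-pod.log
# digest("airflow-pod.log") == digest("scheduler-pod.log")
```

In my case the digests matched for both pairs.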
CONFIGURATIONS
Here's the relevant configuration from my airflow.cfg:
[kubernetes]
airflow_configmap = scheduler-airflow-config
airflow_local_settings_configmap = scheduler-airflow-config
multi_namespace_mode = True
namespace = scheduler
pod_template_file = /opt/airflow/pod_templates/pod_template_file.yaml
worker_container_repository = SECRET.dkr.ecr.eu-west-1.amazonaws.com/airflow
worker_container_tag = efeb_THIS_IS_A_TAG
[kubernetes_executor]
multi_namespace_mode = True
namespace = scheduler
pod_template_file = /opt/airflow/pod_templates/pod_template_file.yaml
worker_container_repository = SECRET.dkr.ecr.eu-west-1.amazonaws.com/airflow
worker_container_tag = efeb_THIS_IS_A_TAG
[logging]
colored_console_log = False
delete_worker_pods = False
encrypt_s3_logs = True
logging_level = INFO
remote_base_log_folder = s3://scheduler-SECRET-eu-west-1/airflow/logs
remote_log_conn_id = aws_conn
remote_logging = True
In my DAG, I'm using the KubernetesPodOperator, and these are the arguments that I suspect might be causing the tasks to duplicate:
'node_selector': {"abcd.com/tenant": "scheduler"},
'tolerations': [k8s.V1Toleration(key="abcd.com/tenant", operator="Equal", value="scheduler")],
'namespace': "airflow",
'service_account_name': "airflow",
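Written out as plain values (the toleration is shown as a dict here rather than k8s.V1Toleration, so the snippet stands alone without the kubernetes client), the detail that worries me is that the executor config above sets namespace = scheduler while the operator pins its pod to "airflow":

```python
# Simplified sketch of the pod-related arguments I pass to KubernetesPodOperator.
# In the real DAG the toleration is a k8s.V1Toleration; a plain dict is used
# here so the snippet runs without the kubernetes client installed.
executor_namespace = "scheduler"  # from [kubernetes_executor] in airflow.cfg

kpo_kwargs = {
    "node_selector": {"abcd.com/tenant": "scheduler"},
    "tolerations": [
        {"key": "abcd.com/tenant", "operator": "Equal", "value": "scheduler"}
    ],
    "namespace": "airflow",  # the operator pod is asked for a different namespace
    "service_account_name": "airflow",
}

# The two namespaces don't match -- which lines up with me seeing one pod in
# `scheduler` and one in `airflow` for the same task.
print(kpo_kwargs["namespace"] != executor_namespace)  # True
```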
Has anyone encountered a similar issue, or can anyone provide insight into whether these configurations might lead to duplicated task runs in Kubernetes? How can I confirm whether the task is indeed running twice?