Airflow "ghost" running task - task remains in running state even after failure


I'm working on an Airflow project where I have a KubernetesPodOperator task that sometimes fails but remains in a running state even after exhausting its defined retries. The status in the UI is Running, but no worker is actually up for it. This causes the task to appear to run for an extended period of time, sometimes up to a day.

Here's a simplified version of my DAG:


from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
        dag_id='dbt',
        dagrun_timeout=timedelta(hours=4),
        start_date=datetime(2023, 1, 1),
        schedule_interval="25 * * * *",
        catchup=True,
        max_active_runs=1,
) as dag:

    dbt_test_model = KubernetesPodOperator(
        task_id="dbt_test",
        name="dbt-run-test",
        cmds=["sh", "-c"],
        arguments=["dbt source model_test"],
        get_logs=True,
        retries=3,
        in_cluster=True,
        is_delete_operator_pod=False,
        pod_template_file=DBT_POD_TEMPLATE_PATH,
        dag=dag,
    )

    dbt_run_model = KubernetesPodOperator(
        task_id="dbt_run",
        name="dbt-run",
        cmds=["sh", "-c"],
        arguments=["dbt run"],
        get_logs=True,
        retries=3,
        in_cluster=True,
        is_delete_operator_pod=False,
        pod_template_file=DBT_POD_TEMPLATE_PATH,
        dag=dag,
    )

I've tried to debug the issue but haven't been able to find a solution. Has anyone encountered a similar issue, or does anyone have suggestions on how to handle this?
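One mitigation I'm considering (untested) is setting an explicit `execution_timeout` on each task, so that the scheduler fails the task instance after a bounded time even if the pod or worker disappears, instead of leaving it in Running indefinitely. The 30-minute value below is just a guess for illustration:

```python
from datetime import timedelta

# Sketch: same task as above, with execution_timeout added so Airflow marks
# the task instance failed (and triggers its retries) once the timeout is
# exceeded, rather than leaving a "ghost" Running state.
dbt_run_model = KubernetesPodOperator(
    task_id="dbt_run",
    name="dbt-run",
    cmds=["sh", "-c"],
    arguments=["dbt run"],
    get_logs=True,
    retries=3,
    in_cluster=True,
    is_delete_operator_pod=False,
    pod_template_file=DBT_POD_TEMPLATE_PATH,
    execution_timeout=timedelta(minutes=30),  # assumed upper bound, not verified
)
```

Would this be the right approach here, or is there a scheduler-side setting I'm missing?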
