Airflow version = 1.10.10
Hosted on Kubernetes, using the Kubernetes executor.
DAG setup
DAG - generated dynamically from a single generator script (a rough sketch of the setup follows this list).
Task - a PythonOperator that pulls some data, runs an inference with a TensorFlow model, and stores the predictions.
Where does it hang? - while running the inference with TensorFlow.
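For context, this is roughly what the dynamic DAG generation looks like. It is a minimal sketch, not our exact code: names such as `MODEL_CONFIGS`, `make_dag`, and `run_inference` are placeholders.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Placeholder configs; in reality ~19 of these come from our own config source.
MODEL_CONFIGS = [{"name": f"model_{i}"} for i in range(19)]

def run_inference(model_name, **context):
    # Pull data, load the TensorFlow model, run inference, store predictions.
    # The hang happens somewhere inside the TensorFlow inference call.
    ...

def make_dag(config):
    dag = DAG(
        dag_id=f"inference_{config['name']}",
        schedule_interval="@daily",
        start_date=datetime(2020, 1, 1),
        catchup=False,
    )
    PythonOperator(
        task_id="predict",
        python_callable=run_inference,
        op_kwargs={"model_name": config["name"]},
        provide_context=True,
        dag=dag,
    )
    return dag

# Register one DAG per config in the module's globals so the scheduler
# picks all of them up from this single file.
for cfg in MODEL_CONFIGS:
    globals()[cfg["name"]] = make_dag(cfg)
```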
More details
One of our running tasks, as described above, hung for 4 hours. No amount of restarting helped it recover past that point. We found that the worker pod had 30+ subprocesses and was using about 40 GB of memory.
This didn't add up: when the same model runs on a local machine, it doesn't consume more than 400 MB, so there is no way it should suddenly jump to 40 GB.
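For what it's worth, this is roughly how the process count and memory can be checked from inside the worker pod. It is only an illustrative sketch (we used plain `ps`-style inspection; this version assumes `psutil` is installed in the pod):

```python
import psutil

# Inspect the Airflow task's process tree: count the spawned subprocesses
# and sum RSS across the tree to see where the ~40 GB is going.
root = psutil.Process()  # or psutil.Process(<pid of the airflow task process>)
children = root.children(recursive=True)

total_rss = root.memory_info().rss + sum(c.memory_info().rss for c in children)
print(f"child processes: {len(children)}")
print(f"total RSS: {total_rss / 1024 ** 3:.1f} GiB")
for c in children:
    print(c.pid, c.name(), c.status())
```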
Another suspicion was that it was spinning up so many processes because we dynamically generate around 19 DAGs. I changed the generator to produce only 1 DAG, but the processes didn't vanish: the worker pod still had 35+ subprocesses and the same memory usage.
Here comes the interesting part. To be really sure it wasn't the dynamic DAGs, I created an independent DAG that prints 1..100000, pausing for 5 seconds after each number (sketch below). The memory usage was still the same, but the extra processes were gone.
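The independent test DAG looks roughly like this (again a sketch; `print_and_sleep` and the DAG id are placeholder names):

```python
import time
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def print_and_sleep():
    # Print 1..100000, pausing 5 seconds after each number, so the task
    # stays busy without touching TensorFlow or any of our data code.
    for i in range(1, 100001):
        print(i)
        time.sleep(5)

dag = DAG(
    dag_id="debug_print_loop",
    schedule_interval=None,
    start_date=datetime(2020, 1, 1),
    catchup=False,
)

PythonOperator(
    task_id="print_loop",
    python_callable=print_and_sleep,
    dag=dag,
)
```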
At this point, I am not sure which direction to take to debug the issue further.
Questions
- Why is the task hanging?
- Why are there so many subprocesses when using dynamic DAGs?
- How can I debug this issue further?
- Have you faced this before, and can you help?