I'm getting a lot of transactions while idling (Airflow and Azure File Share)


I need to load data from different files into an Azure SQL database, so I set up a VM running Airflow and two Azure file shares: one for my DAGs (so that I can modify them without SSHing into the VM) and another where the files to be loaded are dropped.

I mounted both file shares on the VM and on my PC and use them as normal drives.

The system is currently idle, yet Azure's portal shows about 24k transactions every 5 minutes, and I can't see what specifically is generating them.

Is it possible that the VM is constantly requesting a list of files, or touching the file share to check that it's still there? How can I avoid this?

Thanks!

Accepted answer:

I can confirm that keeping the DAGs folder on the shared drive was the cause of the huge number of transactions. I moved the DAGs folder to the VM's local disk and now everything is back to normal.
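To check whether a given DAGs folder actually sits on a mounted share or on the VM's local disk, one option on a Linux VM is to look up the filesystem type of the mount that contains it in /proc/mounts. This is a minimal sketch: fs_type_for_path, is_on_network_share and NETWORK_FS_TYPES are hypothetical helpers, not Airflow APIs. It assumes Azure Files mounted over SMB shows up as cifs or smb3 (or nfs/nfs4 for NFS shares), and it ignores the octal escaping /proc/mounts uses for spaces in paths.

```python
import posixpath

# Filesystem types treated as network mounts (assumption: Azure Files over SMB
# appears as "cifs" or "smb3", over NFS as "nfs" or "nfs4").
NETWORK_FS_TYPES = {"cifs", "smb3", "nfs", "nfs4"}


def fs_type_for_path(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the deepest mount point containing `path`.

    `mounts_text` is the content of /proc/mounts: one mount per line, with the
    mount point in field 2 and the filesystem type in field 3.
    """
    path = posixpath.normpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # The mount contains `path` if it is the mount point itself or lies below it.
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest (most specific) matching mount point.
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_network_share(path: str, mounts_text: str) -> bool:
    """True if `path` lives on a filesystem type listed in NETWORK_FS_TYPES."""
    return fs_type_for_path(path, mounts_text) in NETWORK_FS_TYPES
```

On the VM you could pass it the dags_folder value from airflow.cfg together with open("/proc/mounts").read(); a local path such as the hypothetical /home/azureuser/airflow/dags should then report the local filesystem (for example ext4) rather than cifs.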

Another answer:

I ran into a similar issue: 8k transactions every 5 minutes for just 3 DAGs. I got it down to about 800 transactions every 5 minutes by setting file_parsing_sort_mode to alphabetical.

https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#file-parsing-sort-mode
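Besides editing airflow.cfg, Airflow lets you override any option through an environment variable named AIRFLOW__{SECTION}__{KEY}. The sketch below only builds that name; it assumes an Airflow 2.x install, where file_parsing_sort_mode lives in the [scheduler] section (check the configuration reference for your version, since options can move between sections). airflow_env_var is a hypothetical helper, not part of Airflow.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads to override [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Assumption: Airflow 2.x, where file_parsing_sort_mode is in [scheduler].
overrides = {
    airflow_env_var("scheduler", "file_parsing_sort_mode"): "alphabetical",
}

for name, value in overrides.items():
    # Print shell-style assignments, e.g. for a systemd unit or an .env file.
    print(f"{name}={value}")
```

This prints AIRFLOW__SCHEDULER__FILE_PARSING_SORT_MODE=alphabetical; the variable has to be visible to the scheduler/DAG processor process, and that process needs a restart to pick it up.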

The default setting, modified_time, makes the DAG processor retrieve each file's last modified time from the file share on every loop. Oddly, this even triggers write operations, which are more costly than read operations.

https://github.com/apache/airflow/blob/2d79d730d7ff9d2c10a2e99a4e728eb831194a97/airflow/dag_processing/manager.py#L982-L1008
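The difference between the two modes can be illustrated with a simplified sketch. This is not Airflow's actual code (order_dag_files is a hypothetical function): in modified_time mode every file needs a metadata lookup (os.path.getmtime) on every pass, which on an SMB mount turns into requests against the share, whereas alphabetical only compares the names it already has. The sketch sorts newest first and covers only these two modes.

```python
import os
from typing import Callable, Iterable, List


def order_dag_files(
    paths: Iterable[str],
    mode: str,
    getmtime: Callable[[str], float] = os.path.getmtime,
) -> List[str]:
    """Order DAG file paths for one processing pass (simplified sketch)."""
    if mode == "modified_time":
        # One metadata lookup per file on every pass; on a mounted share
        # each lookup goes over the network to the file share.
        return sorted(paths, key=getmtime, reverse=True)
    if mode == "alphabetical":
        # Only the file names are compared; no per-file metadata lookup.
        return sorted(paths)
    raise ValueError(f"unsupported mode in this sketch: {mode!r}")
```

Passing a counting stub as getmtime shows that modified_time performs one lookup per file per pass while alphabetical performs none, and in the real processor that cost repeats on every loop.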

I posted the same answer on a similar question here: https://stackoverflow.com/a/70524563/6654620