I am new to Apache Airflow. I have some DAGs already running in Airflow, and I want to add SLAs to them so that I can track and monitor the tasks and get an alert if something breaks.
I know how to add an SLA to a DAG's default_args using timedelta(), like below:
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'sla': timedelta(minutes=30)  # applied to every task that inherits default_args
}
But I have the following questions:
Can we specify an SLA for the whole DAG, or only for individual tasks?
What would be an appropriate SLA for a DAG that runs for 30 minutes?
What would be an appropriate SLA for a task that runs for 5 minutes?
Do we need to take retry_delay into account when specifying the SLA?
I believe SLAs can be set only on individual tasks, not on a DAG as a whole. But I think the same effect can be achieved for an entire DAG (I can't say for sure) by adding a final task (a DummyOperator) that depends on all the other tasks in your DAG and setting an SLA on that closing task.
This depends entirely on factors such as how critical the task is and how often it fails. I would suggest starting with a 'strict-enough' timedelta (like 5 minutes) and then tuning it (up or down) from there.
Same as above: start with 1 minute and tune from there.
Going by the docs, I'd say yes.
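To make the retry_delay point concrete, here is a rough sketch of how one might check whether an SLA leaves room for retries. It is plain Python and does not use Airflow. The helper name worst_case_duration is hypothetical, and it assumes every attempt takes the full expected runtime, so treat the result as an upper bound to tune from rather than a rule.

```python
from datetime import timedelta


def worst_case_duration(expected_runtime, retries, retry_delay):
    """Upper bound on wall-clock time if every allowed retry is used.

    Hypothetical helper: assumes each attempt takes `expected_runtime`
    and that the task waits `retry_delay` before each retry.
    """
    attempts = retries + 1  # the first try plus each retry
    return expected_runtime * attempts + retry_delay * retries


# Values taken from the default_args in the question.
retries = 1
retry_delay = timedelta(minutes=5)
sla = timedelta(minutes=30)

# A task that normally runs for 5 minutes (assumed figure from the question).
budget = worst_case_duration(timedelta(minutes=5), retries, retry_delay)

# True when the SLA still holds even if the single retry is needed.
sla_covers_retries = sla >= budget
```

With these numbers the bound is two 5-minute attempts plus one 5-minute delay, which fits inside the 30-minute SLA. The same check for a task that itself runs for 30 minutes would not fit, which is why retry_delay and retries are worth considering when picking the value. The sketch ignores time spent queued in the scheduler and time taken by upstream tasks.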