I have the DAG:
dag = DAG(
    dag_id='example_bash_operator',
    default_args=args,
    schedule_interval='0 0 * * *',
    start_date=days_ago(2),
    dagrun_timeout=timedelta(minutes=60),
    tags=['example']
)
What is the significance of dag.cli()? What role does cli() play?
if __name__ == "__main__":
    dag.cli()
Today is 14 Oct. When I add catchup=False, it executes for 13 Oct. Shouldn't it execute only for the 14th? Without it, the DAG executes for the 12th and 13th, which makes sense as it would backfill. But with catchup=False, why does it execute for 13 Oct?
dag = DAG(
    dag_id='example_bash_operator',
    default_args=args,
    schedule_interval='0 0 * * *',
    start_date=days_ago(2),
    catchup=False,
    dagrun_timeout=timedelta(minutes=60),
    tags=['example']
)
You should avoid setting the start_date to a relative value. This can lead to unexpected behaviour, because the value is re-evaluated every time the DAG file is parsed. There is a long description within the Airflow FAQ.
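As an illustration of the start_date point (my own sketch, not taken from the FAQ): the snippet below imitates a days_ago-style value using only the standard library, to show how it drifts between two parses of the same DAG file. The helper relative_start, the two parse times, and the year 2020 are all assumptions made for the example; the question only gives the day and month.

```python
from datetime import datetime, timedelta, timezone


def relative_start(now, days=2):
    """Hypothetical stand-in for a days_ago(days)-style value:
    midnight of `now`, minus `days` days."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days)


# Two moments at which the same DAG file might be parsed (assumed values).
parse_1 = datetime(2020, 10, 14, 9, 0, tzinfo=timezone.utc)
parse_2 = datetime(2020, 10, 15, 9, 0, tzinfo=timezone.utc)

# The relative start date moves forward by a day between the two parses.
print(relative_start(parse_1).date())  # 2020-10-12
print(relative_start(parse_2).date())  # 2020-10-13

# A fixed start date is the same no matter when the file is parsed.
FIXED_START = datetime(2020, 10, 12, tzinfo=timezone.utc)
print(FIXED_START.date())  # 2020-10-12
```

In the DAG itself, this would mean passing a fixed datetime as start_date instead of days_ago(2).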
Regarding dag.cli(), I would remove this whole part. It is definitely not required for the DAG to be executed by the Airflow scheduler; see this question.
Regarding catchup=False and why it executes for the 13th of October: have a look at the scheduler documentation. The article Scheduling Tasks in Airflow might also be worth a read.
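In short, as far as I understand the scheduler: Airflow starts a run only after the interval it covers has ended, and labels the run with the start of that interval. With catchup=False it skips older intervals and creates a run only for the most recent completed one. On 14 Oct with a daily midnight schedule, that is the interval from 13 Oct 00:00 to 14 Oct 00:00, so the run is labelled 13 Oct. A minimal sketch of that arithmetic with plain datetime follows; latest_complete_interval is a hypothetical helper, not an Airflow API, and the year and time of day are assumed.

```python
from datetime import datetime, timedelta


def latest_complete_interval(now, interval=timedelta(days=1)):
    """Return (interval_start, interval_end) of the most recent daily
    interval, aligned to midnight, that has fully elapsed at `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - interval, midnight


# "Today is 14 Oct" (year and time of day are assumed for illustration).
now = datetime(2020, 10, 14, 9, 30)
logical_date, interval_end = latest_complete_interval(now)

# The run is labelled with the interval start, but fires at the interval end.
print(logical_date.date(), interval_end.date())  # 2020-10-13 2020-10-14
```

The run for the interval starting on 14 Oct would only be created once that interval closes, at midnight on 15 Oct.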