Airflow skips one scheduled run


I have various DAGs scheduled, but one DAG in particular is not being triggered for one of its scheduled runs.

I am aware that Airflow triggers a run at the end of its schedule period, but I'm surely missing something.

I have a schedule defined as: 10 2,5,8,11,14,17,20,23 * * *, meaning my job should run every day at 02:10, 05:10, 08:10, 11:10, 14:10, 17:10, 20:10 and 23:10 UTC. For some reason, the 23:10 UTC run is always skipped, and I don't understand why. Airflow runs my 20:10 run, skips 23:10, and then continues with 02:10.
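To make the "end of the period" behaviour concrete, here is a minimal sketch using only the standard library, not Airflow itself. It assumes the usual rule that a run labelled with one schedule point is triggered at the next schedule point. The names HOURS, schedule_points and next_point, and the example date, are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta

# Hours taken from the cron expression '10 2,5,8,11,14,17,20,23 * * *' (minute 10).
HOURS = [2, 5, 8, 11, 14, 17, 20, 23]


def schedule_points(day):
    """All schedule points of the cron expression that fall on the given day."""
    return [day.replace(hour=h, minute=10, second=0, microsecond=0) for h in HOURS]


def next_point(ts):
    """First schedule point strictly after ts (looks at ts's day and the next day)."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    for candidate in schedule_points(day) + schedule_points(day + timedelta(days=1)):
        if candidate > ts:
            return candidate


# Hypothetical example day; any date behaves the same way.
day = datetime(2021, 1, 1)
for label in schedule_points(day):
    print(f"run labelled {label:%Y-%m-%d %H:%M} is triggered at {next_point(label):%Y-%m-%d %H:%M}")
```

Note that the run labelled 23:10 is the only one whose trigger time falls on the following day.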

So my question is why this run is always skipped.

My default DAG arguments are as follows:

from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

default_args = {
    "owner": "whir",
    "depends_on_past": False,
    # Evaluates to midnight (UTC) of the current day each time this file is parsed
    "start_date": days_ago(0, hour=0, minute=0, second=0, microsecond=0),
    "email": [""],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 4,
    "retry_delay": timedelta(minutes=30),
}

with DAG(
    'transfer-data',
    default_args=default_args,
    description="Transfer data",
    schedule_interval='10 2,5,8,11,14,17,20,23 * * *',
    catchup=True
) as dag:

...

Best answer (by glob):

My guess is that your start_date parameter belongs in the DAG definition, not in default_args. Move it out of default_args and add it to your DAG definition like this:

with DAG(
    'transfer-data',
    default_args=default_args,
    description="Transfer data",
    start_date=(your start date),
    schedule_interval='10 2,5,8,11,14,17,20,23 * * *',
    catchup=True
) as dag:

Airflow is particular about how DAGs are defined, and a misplaced parameter can sometimes lead to unexpected behavior in the backend metadata database. start_date describes when the DAG as a whole should begin, so it reads most clearly at the DAG level. You are not really passing it to each individual task, which is what default_args is for.

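It's hard to tell just from what you've posted, but my guess is that the start date gets reset around midnight. days_ago(0, ...) is re-evaluated every time the DAG file is parsed, so it always resolves to midnight of the current day. The 23:10 run is only triggered at the end of its period, at 02:10 the next day, and by then its scheduled time lies before the new start date. That would explain why every run other than the 23:10 one works.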
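A rough way to check this guess is to simulate it with the standard library only. The sketch below is a simplified model, not Airflow's actual scheduler code. It assumes a run is created at the end of its period and only if its scheduled time is not earlier than start_date, and that a days_ago(0, ...) start date equals midnight of the day on which the DAG file is parsed. All function names and the example date are hypothetical.

```python
from datetime import datetime, timedelta

# Hours taken from the cron expression '10 2,5,8,11,14,17,20,23 * * *' (minute 10).
HOURS = [2, 5, 8, 11, 14, 17, 20, 23]


def next_point(ts):
    """First schedule point strictly after ts."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    for offset in (0, 1):
        for hour in HOURS:
            candidate = (day + timedelta(days=offset)).replace(hour=hour, minute=10)
            if candidate > ts:
                return candidate


def midnight_of(ts):
    """Assumed value of days_ago(0, ...) when the DAG file is parsed at ts."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def simulate(first_label, runs, moving_start_date):
    """Return (label, created) pairs under the simplified rule described above."""
    fixed_start = midnight_of(first_label)
    results = []
    label = first_label
    for _ in range(runs):
        trigger_time = next_point(label)  # a run fires at the end of its period
        start_date = midnight_of(trigger_time) if moving_start_date else fixed_start
        results.append((label, label >= start_date))
        label = trigger_time
    return results


# Hypothetical example: nine consecutive schedule points starting at 02:10.
first = datetime(2021, 1, 1, 2, 10)
for moving in (True, False):
    skipped = [label for label, created in simulate(first, 9, moving) if not created]
    print("moving start_date" if moving else "fixed start_date", "-> skipped:", skipped)
```

Under these assumptions, the moving start date drops only the run labelled 23:10, while a fixed start date creates every run. That matches the symptom in the question, which is why a static start date is worth trying.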