Airflow data-aware scheduling


When using Airflow Datasets for data-aware scheduling, the Airflow documentation makes the statement below:

"Note that the dataset is considered to be updated if the task that is responsible for the update, finishes successfully. There is no way for Airflow to know if the dataset is actually updated, so the developer is responsible for making sure that the update is complete. In other words, no polling is taking place to ensure that the resource received an update."

I have two questions:

  1. Does this mean that whether a dataset counts as updated is decided not by whether the data actually changed, but by whether the task that declares the dataset as an outlet finished successfully? That is, if that task succeeds, the dataset is considered updated (whether or not the data actually changed), and if I change the data manually, outside of any task, the dataset is not considered updated?
  2. Does Airflow learn the status of a dataset only by checking whether the tasks that use it as an outlet have finished? In other words, unlike a file sensor, does Airflow never check the data file periodically (at a configured interval) to see whether it has been updated?
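
To make the two questions concrete, here is a minimal sketch of the behaviour the quoted statement describes. It is a toy model in plain Python, not Airflow's implementation: `ToyScheduler`, `run_task`, the `storage` dict, and the `s3://bucket/data.csv` URI are all hypothetical names invented for illustration. The only rule it encodes is the one from the quote, namely that a dataset counts as updated when a task listing it as an outlet finishes successfully, and nothing ever inspects the data itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyScheduler:
    """Toy stand-in for the scheduler (hypothetical, NOT Airflow code)."""

    # Maps a dataset URI to the ids of the DAGs scheduled on that dataset.
    consumers: Dict[str, List[str]] = field(default_factory=dict)
    # Ids of consumer DAGs that have been triggered so far, in order.
    triggered: List[str] = field(default_factory=list)

    def run_task(self, task: Callable[[], None], outlets: List[str]) -> bool:
        """Run a producer task; record a dataset update only on success."""
        try:
            task()
        except Exception:
            # Failed task: no dataset update is recorded, nothing is triggered.
            return False
        for uri in outlets:
            # Success alone marks the dataset as updated; the data is never read.
            self.triggered.extend(self.consumers.get(uri, []))
        return True


URI = "s3://bucket/data.csv"   # hypothetical dataset URI
storage = {URI: "v1"}          # hypothetical stand-in for the real data
scheduler = ToyScheduler(consumers={URI: ["consumer_dag"]})


def noop_task() -> None:
    """Succeeds without touching the data."""


def failing_task() -> None:
    """Changes the data, then fails."""
    storage[URI] = "v2"
    raise RuntimeError("crashed after writing")


# Case 1: the data is edited by hand, outside any task. Nothing is triggered.
storage[URI] = "edited by hand"
after_manual_edit = list(scheduler.triggered)

# Case 2: a task succeeds without changing the data. The consumer is triggered.
scheduler.run_task(noop_task, outlets=[URI])
after_noop = list(scheduler.triggered)

# Case 3: a task changes the data but fails. No new trigger is recorded.
scheduler.run_task(failing_task, outlets=[URI])
after_failure = list(scheduler.triggered)
```

If the quoted statement is read literally, this is the behaviour both questions describe: the manual edit in case 1 goes unnoticed, the successful task in case 2 triggers the consumer even though the data did not change, and there is no polling loop anywhere in the model.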