How to trigger Airflow jobs based on Flink streaming completion for partitions?


I have a Flink streaming job which reads from Kafka and writes into the appropriate partitions in the file system. For instance, the job is configured to use a bucketing sink which writes to /data/date=${date}/hour=${hour}.

How can I detect that a partition is ready to be used, so that a corresponding Airflow pipeline can run some batch processing on top of that hour?
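A common pattern for this (an assumption here, not something stated in the question) is to have the Flink job drop a marker file into each hour directory once the bucket is finalized, and have Airflow poll for that marker before running the batch job. A minimal sketch of the polling side, assuming a hypothetical `_SUCCESS` marker name and the /data/date=${date}/hour=${hour} layout from the question:

```python
import os


def partition_ready(base_dir: str, date: str, hour: str,
                    marker: str = "_SUCCESS") -> bool:
    """Return True once the hour partition contains a completion marker.

    Assumes the streaming job writes a marker file (the name is
    hypothetical) into <base_dir>/date=<date>/hour=<hour> when the
    bucket for that hour is complete.
    """
    path = os.path.join(base_dir, f"date={date}", f"hour={hour}", marker)
    return os.path.isfile(path)
```

A function like this could be used as the `python_callable` of an Airflow `PythonSensor` (or replaced with a `FileSensor` pointed at the marker path), so downstream tasks only start once the marker appears.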

1 Answer


You could look at the implementation of the ContinuousFileMonitoringSource, to see how it monitors the file system. And then do something similar to what David Anderson suggested in your other question, re creating a custom ProcessFunction.