Dataflow polling stops seeing new entries


I have a Dataflow job (created from the Datastream to Spanner template) that reads new entries in JSON format from Cloud Storage and inserts them into a Spanner database. The JSON files are written to Cloud Storage by a Datastream stream that listens to a MySQL database for any inserts, updates, or deletes. Everything works fine at first, but after some time (minutes, hours, or days; I have tested different intervals) the job stops seeing new results, and the worker logs only print this:

current round of polling took 119 ms and returned 5 results, of which 0 were new

but there are already a lot more results that it should be picking up. What could cause the Dataflow job to stop seeing new results?
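
To check how long the job has been in this state, the worker log line above can be parsed and the last few polling rounds inspected. This is only a rough sketch: the regular expression is written against the single log line quoted here, and `looks_stalled` with its `min_rounds` threshold is a hypothetical helper of mine, not something Dataflow provides.

```python
import re

# Pattern written against the worker log line quoted in the question.
POLL_LINE = re.compile(
    r"current round of polling took (\d+) ms and returned (\d+) results, "
    r"of which (\d+) were new"
)


def parse_poll_line(line):
    """Return (elapsed_ms, returned, new), or None if the line does not match."""
    match = POLL_LINE.search(line)
    if match is None:
        return None
    elapsed_ms, returned, new = (int(group) for group in match.groups())
    return elapsed_ms, returned, new


def looks_stalled(lines, min_rounds=3):
    """Hypothetical heuristic: the last `min_rounds` polling rounds found nothing new."""
    rounds = [r for r in (parse_poll_line(line) for line in lines) if r is not None]
    if len(rounds) < min_rounds:
        return False
    return all(new == 0 for _, _, new in rounds[-min_rounds:])
```

A run of rounds with zero new results only shows that polling found nothing new; it does not by itself say why.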

EDIT:

I have been reading each option of the Dataflow configuration more carefully, and I think this one is the cause of this behaviour:

[screenshot of the Dataflow configuration option]

What I understand is that every time a directory is created in the storage because of some event, Dataflow starts watching it for 10 minutes and picks up all the changes that happen there. After those 10 minutes, if something is written into that directory for some reason, Dataflow will not see it anymore. So I believe I should increase this parameter.
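
To make my reading of the parameter concrete, here is a small simulation of it. It is a simplified model of my understanding only, not the template's actual code: the function name `files_seen` and the `watch_minutes` argument are made up for illustration, and I am assuming the window is measured from the directory's creation, which may not be how the template measures it.

```python
from datetime import datetime, timedelta


def files_seen(dir_created, file_arrivals, watch_minutes=10):
    """Split file arrival times into (seen, missed) under a fixed watch window.

    Assumption: the directory is polled only from `dir_created` until
    `dir_created + watch_minutes`; anything arriving later is never read.
    """
    deadline = dir_created + timedelta(minutes=watch_minutes)
    seen = [t for t in file_arrivals if t <= deadline]
    missed = [t for t in file_arrivals if t > deadline]
    return seen, missed


# Example: files land 1, 9 and 12 minutes after the directory appears.
created = datetime(2024, 1, 1, 12, 0)
arrivals = [created + timedelta(minutes=m) for m in (1, 9, 12)]
seen, missed = files_seen(created, arrivals)
```

Under this model the file that arrives at minute 12 ends up in `missed`, and raising `watch_minutes` to 15 would pick it up. That is the effect I expect from increasing the parameter, if my reading is right.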
