With the current configuration, every hour we get a new folder with new data in it.
I'm using file notification mode, and I'd prefer not to switch to directory listing. However, I've run into an issue with CSV files in the latest folder that are constantly being updated. This causes job failures when Auto Loader attempts to read a CSV file while it is being updated. I'm exploring ways to exclude the latest folder from being read and have come across the `modifiedBefore` parameter, but I'm not sure whether it is compatible with file notification mode.
`modifiedBefore` is a generic option in Auto Loader, and it can be used with file notification mode. You mentioned that files arrive every hour and that the latest file is updated very frequently, causing errors when you do an incremental load using Auto Loader.
To avoid this, you can either provide a path with a pattern that matches all files except the latest one, or use the `modifiedBefore` option. In both cases, you need to know the cutoff timestamp.
For example, if you don't need the data after `13:00:00`, you can use a glob pattern in the load path that matches only the folders before that hour. For more information about patterns, refer to the Auto Loader documentation.
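As a rough sketch of the pattern approach: the folder layout used here (`<base>/<date>/<HH>/`) and the bucket name are assumptions, since the actual folder names aren't shown in the question. The helpers `hourly_glob` and `read_hours_before` are hypothetical names. The first only builds a path using the `{a,b,c}` alternation that Auto Loader glob patterns accept, listing every hour before the cutoff; the second passes that path to a `cloudFiles` stream in file notification mode.

```python
def hourly_glob(base_path: str, date: str, before_hour: int) -> str:
    """Build a glob matching the hourly folders 00 .. before_hour-1 for one date."""
    if not 1 <= before_hour <= 24:
        raise ValueError("before_hour must be between 1 and 24")
    hours = ",".join(f"{h:02d}" for h in range(before_hour))
    # Doubled braces produce literal { } around the comma-separated hours.
    return f"{base_path}/{date}/{{{hours}}}/*.csv"


def read_hours_before(spark, path_glob: str, schema_location: str):
    """Start an Auto Loader stream in file notification mode over the glob.

    `spark` is an existing SparkSession on Databricks.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.schemaLocation", schema_location)
        .load(path_glob)
    )


# Hypothetical bucket and layout; folder 13 and later are not matched.
pattern = hourly_glob("s3://my-bucket/landing", "2023-12-20", 13)
print(pattern)
```

Listing the hours explicitly keeps the pattern easy to read and avoids relying on nested character classes; adjust the layout to whatever your hourly folders are actually called.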
Or, you can set the `modifiedBefore` option to that timestamp. If you want to filter based on the last hour, you can compute the timestamp for one hour before the current time in the relevant time zone.
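A minimal sketch of computing that cutoff in plain Python. The `+05:30` offset is an assumption taken from the sample value quoted in this answer, and `cutoff_one_hour_back` is a hypothetical helper name; the optional `now` argument is only there so the calculation can be checked with a fixed time.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Assumption: a fixed +05:30 offset, matching the sample value in this answer.
IST = timezone(timedelta(hours=5, minutes=30))


def cutoff_one_hour_back(tz: tzinfo = IST, now: Optional[datetime] = None) -> datetime:
    """Return the time one hour before `now`, expressed in time zone `tz`.

    `now` must be timezone-aware when given; it defaults to the current time.
    """
    current = now.astimezone(tz) if now is not None else datetime.now(tz)
    return current - timedelta(hours=1)


cutoff = cutoff_one_hour_back()
print(cutoff)
```

If your folders follow a region-based zone with daylight saving changes, `zoneinfo.ZoneInfo` from the standard library can be passed as `tz` instead of a fixed offset.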
This gives a timestamp such as `2023-12-20 11:41:39.862054+05:30`, which you can then use in the `modifiedBefore` option.

Note: You need to specify the time zone that matches the folder names created every hour.
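Putting it together, here is a hedged sketch of passing the cutoff to Auto Loader in file notification mode. The function names, the schema location, and the `YYYY-MM-DDTHH:mm:ss` string format are assumptions: that format is the one shown for Spark's generic file source options, so check the Auto Loader options reference for what your runtime accepts. The string carries no offset, so it is assumed that the zone of `cutoff` matches the zone Spark uses to interpret it.

```python
from datetime import datetime, timedelta, timezone


def autoloader_options(cutoff: datetime, schema_location: str) -> dict:
    """Build reader options for a CSV stream limited to files modified before `cutoff`."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    return {
        "cloudFiles.format": "csv",
        "cloudFiles.useNotifications": "true",
        "cloudFiles.schemaLocation": schema_location,
        # Assumed format: wall-clock time in the cutoff's own zone, no offset.
        "modifiedBefore": cutoff.strftime("%Y-%m-%dT%H:%M:%S"),
    }


def read_before_cutoff(spark, path: str, cutoff: datetime, schema_location: str):
    """Start the stream; `spark` is an existing SparkSession on Databricks."""
    opts = autoloader_options(cutoff, schema_location)
    return spark.readStream.format("cloudFiles").options(**opts).load(path)


# Example with the sample timestamp from this answer and a hypothetical schema path.
ist = timezone(timedelta(hours=5, minutes=30))
example_cutoff = datetime(2023, 12, 20, 11, 41, 39, 862054, tzinfo=ist)
print(autoloader_options(example_cutoff, "/tmp/schema")["modifiedBefore"])
```

The option value is fixed when the stream is defined, so this fits a job that is restarted on a schedule and recomputes the cutoff on each run.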