Could anyone suggest an approach for the requirement below?

Background:

  1. There is a remote location, for example "//servername/somefolder/somefile". New files are generated continuously at that location, every second or minute.

  2. A Spring Boot application should continuously check for new files appearing at the remote location (for example, via a scheduler).

  3. If files are available, they should be read one by one, from oldest to newest, and processed (for example, stored in a database).

  4. Once a file is processed, it should be removed from the original remote location and moved to another remote folder.

Points in my mind:

  1. Spring Batch can read one file at a time, but how can we dynamically pick the oldest file?

  2. How to handle the scenario where a batch job is still processing a file and the scheduler runs the job again: there is a chance the same file is picked up for processing twice.

Any solutions or suggestions are appreciated :)

Best answer:

Polling a directory and running a job for each incoming file is a common pattern that could be achieved with a combination of Spring Batch and Spring Integration. You can find a detailed description of how to implement this pattern in the Launching Batch Jobs through Messages section of the reference documentation.
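A minimal sketch of that pattern, assuming spring-integration-file and spring-batch-integration are on the classpath (the directory path, polling interval, and the `fileProcessingJob` bean are placeholders; exact DSL entry points such as `IntegrationFlow.from` vs. `IntegrationFlows.from` vary by Spring Integration version):

```java
import java.io.File;
import java.util.Comparator;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.integration.launch.JobLaunchRequest;
import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.file.dsl.Files;

@Configuration
public class FilePollingJobConfig {

    @Bean
    public IntegrationFlow filePollingFlow(Job fileProcessingJob, JobLauncher jobLauncher) {
        return IntegrationFlow
                // Poll the (mounted) remote directory, emitting oldest files first.
                .from(Files.inboundAdapter(new File("/mnt/remote/incoming"),
                                           Comparator.comparingLong(File::lastModified)),
                      e -> e.poller(Pollers.fixedDelay(10_000)))
                // Turn each incoming file into a job launch request; the file path
                // becomes an identifying job parameter, so each file is its own job instance.
                .transform(File.class, file -> new JobLaunchRequest(
                        fileProcessingJob,
                        new JobParametersBuilder()
                                .addString("input.file", file.getAbsolutePath())
                                .toJobParameters()))
                // Hand the request to Spring Batch.
                .handle(new JobLaunchingGateway(jobLauncher))
                .get();
    }
}
```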

By using the spring batch we can read one file at a time. but how can we read the oldest file dynamically?

This depends on how you decide to launch jobs. If you decide to run a distinct job for each file, then the code that launches the jobs can sort files as needed and launch jobs sequentially in the right order. If you decide to run a single job for all files with a MultiResourceItemReader for example, then you can provide a Comparator that sorts files as you need, see MultiResourceItemReader#setComparator.
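For the oldest-first ordering, a sketch using plain `java.io.File` (note that `MultiResourceItemReader#setComparator` expects a `Comparator<Resource>`, which can delegate to the underlying file's timestamp in the same way):

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class OldestFirst {

    // Order files by last-modified timestamp, oldest first.
    public static final Comparator<File> OLDEST_FIRST =
            Comparator.comparingLong(File::lastModified);

    // Return a copy of the input sorted oldest-to-newest.
    public static File[] sortOldestFirst(File[] files) {
        File[] sorted = files.clone();
        Arrays.sort(sorted, OLDEST_FIRST);
        return sorted;
    }
}
```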

How to handle the scenario like: if my batch is processing one of the files is still in progress. if the schedular runs the job again there is a chance of picking the same file for processing.

This depends on the scheduling tool you use, i.e. whether it supports concurrent job executions, etc. The pattern of polling the directory and putting job requests in a queue solves this by design, provided that any file for which a JobLaunchRequest has been successfully submitted to the queue is (re)moved from the remote directory (so the subsequent poll won't see it and won't create a duplicate request for it).
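One way to make the "(re)move before the next poll" idea concrete is to claim each file with an atomic move into a processing directory before submitting its launch request; a sketch with hypothetical directory names (note that `ATOMIC_MOVE` requires both directories to be on the same file system):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

public class FileClaimer {

    /**
     * Atomically move the file into a processing directory so a subsequent
     * poll of the incoming directory no longer sees it. Returns the new
     * location, or empty if the file was already claimed (e.g. by a
     * concurrent poller) or the move failed.
     */
    public static Optional<Path> claim(Path file, Path processingDir) throws IOException {
        Path target = processingDir.resolve(file.getFileName());
        try {
            return Optional.of(Files.move(file, target, StandardCopyOption.ATOMIC_MOVE));
        } catch (IOException e) {
            // File no longer there, or the atomic move was not possible.
            return Optional.empty();
        }
    }
}
```

Only a file that was successfully claimed is turned into a JobLaunchRequest, so the same file can never produce two requests.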