Spring Integration - batch process to handle 18000 jobs in 15 mins

I have the scenario below and am currently using Spring Integration to implement it.

  • I have around 18000 staff IDs.
  • For each staff member, a process needs to kick off that makes 1 HTTP call to retrieve staff profile information from the mail calendar server, then 1 HTTP call to retrieve some other information, and then may need to send 3-5 more HTTP calls, all in a single task.
  • I need to finish this process for above 50000 staff in 15 mins.
  • This whole batch process needs to run again every 15 minutes.
  • Even assuming each job takes only 5 seconds, I would still need 30 minutes to finish.
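
The sizing behind these numbers can be checked with a little arithmetic. The following is a minimal sketch, assuming the jobs are independent and each takes a flat 5 seconds; the class and method names are made up for illustration. Under those assumptions, 18000 jobs on 50 workers take 1800 seconds (the 30 minutes mentioned above), and fitting them into a 15-minute window needs about 100 workers.

```java
public class BatchSizing {

    /** Minimum number of parallel workers needed to finish all jobs inside the window. */
    static long workersNeeded(long jobs, long secondsPerJob, long windowSeconds) {
        long totalWork = jobs * secondsPerJob;                   // total worker-seconds
        return (totalWork + windowSeconds - 1) / windowSeconds;  // ceiling division
    }

    /** Wall-clock seconds needed to finish all jobs with a given number of workers. */
    static long secondsToFinish(long jobs, long secondsPerJob, long workers) {
        long rounds = (jobs + workers - 1) / workers;            // one job per worker per round
        return rounds * secondsPerJob;
    }

    public static void main(String[] args) {
        System.out.println(workersNeeded(18_000, 5, 15 * 60));   // 100
        System.out.println(secondsToFinish(18_000, 5, 50));      // 1800
    }
}
```

This ignores network variance and failures, so in practice some headroom above the computed worker count is probably needed.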

=================

Initial Thinking

I can use Spring Integration to do something like this:

  • Create one job for each staff member, 18000 jobs in total. The job request likely contains only a staff ID, so it is very lightweight.
  • Add all the jobs to the int:queue at once so that they feed the input channel, calenderSynRequestChannel.
  • Have a poller with 100 concurrent workers clear the jobs within 15 minutes.
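
The idea above can be sketched outside the framework with plain java.util.concurrent. This is only an illustration of the queue-plus-workers pattern, not Spring Integration code; StaffQueueSketch, drain and the job callback are hypothetical names, and the callback stands in for the HTTP calls.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class StaffQueueSketch {

    /**
     * Puts every staff ID on an in-memory queue, then lets `workers` threads
     * drain it. Returns the number of jobs that completed without an exception.
     */
    static int drain(List<String> staffIds, int workers, Consumer<String> job) {
        // A queue of 18000 short strings is small; only the IDs are held in memory.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(staffIds);
        AtomicInteger succeeded = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String id;
                // poll() returns null once the queue is empty, which ends the worker.
                while ((id = queue.poll()) != null) {
                    try {
                        job.accept(id);              // placeholder for the HTTP calls
                        succeeded.incrementAndGet();
                    } catch (RuntimeException e) {
                        // One failed staff ID must not stop the worker; log and continue.
                    }
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(15, TimeUnit.MINUTES);  // the batch window
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return succeeded.get();
    }
}
```

A call such as `StaffQueueSketch.drain(ids, 100, id -> syncCalendar(id))` would process all IDs with 100 threads, where `syncCalendar` is whatever method performs the per-staff work.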

Questions:

  • Is this a good way to do this kind of batch processing? One concern I have is whether the queue can hold 18000 jobs at once.
  • Should I use a file-based approach instead, storing all the staff IDs in multiple files to be picked up later by the poller? This would complicate the design, though, as there could be concurrency issues when the workers read, write, and delete the files.

Current solution:

<int:service-activator ref="synCalenderService" method="synCalender" input-channel="calenderSynRequestChannel">
    <int:poller fixed-delay="50" time-unit="MILLISECONDS" task-executor="taskExecutor" receive-timeout="0" />
</int:service-activator>

<task:executor id="taskExecutor" pool-size="50" keep-alive="120" queue-capacity="500"/>

Anyone who has encountered a similar problem might be able to give some insight into how to address it using Spring Integration.

There is 1 answer below:

Why not use a Spring Batch job with:

  1. A reader that reads the staff data

  2. A processor that makes the HTTP calls

  3. A writer that writes the result to a log file (for example)

Then use the TaskScheduler (part of the core Spring Framework) to schedule an execution every 15 minutes, or maybe even better, with a fixed delay.
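
A fixed delay means the next run starts a set time after the previous run ends, so a slow batch can never overlap with the next one. The sketch below shows that behaviour with the JDK's ScheduledExecutorService as a stand-in; it is not Spring's TaskScheduler, and FixedDelaySketch and its parameters are hypothetical names chosen for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedDelaySketch {

    /**
     * Runs `batch` repeatedly on one thread, waiting `delayMillis` after each
     * run ends before starting the next. Returns true once `runs` runs have
     * finished, or false if `timeoutMillis` elapses first.
     */
    static boolean runWithFixedDelay(Runnable batch, int runs, long delayMillis, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            scheduler.scheduleWithFixedDelay(() -> {
                try {
                    batch.run();
                } catch (RuntimeException e) {
                    // An exception escaping the task would cancel all later runs,
                    // so it is caught here (a real job would log it).
                } finally {
                    remaining.countDown();
                }
            }, 0, delayMillis, TimeUnit.MILLISECONDS);
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();  // stop scheduling further runs
        }
    }
}
```

With a fixed rate instead, a batch that overruns its 15-minute slot would delay or bunch up the following runs, which is probably why a fixed delay is suggested here.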

If you want more parallelism, use org.springframework.batch.integration.async.AsyncItemProcessor (together with the matching AsyncItemWriter from the same package).
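
The async pair works roughly like this: the processor hands each item to a thread pool and returns a future, and the writer waits on those futures before writing. The following plain-Java sketch imitates that flow with CompletableFuture; it does not use the Spring Batch classes, and AsyncProcessingSketch and processAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class AsyncProcessingSketch {

    /**
     * Applies `processor` to every item on a pool of `threads` threads and
     * returns the results in the same order as the input items.
     */
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // "Processor" step: submit every item and keep a future for each.
            List<CompletableFuture<O>> futures = new ArrayList<>();
            for (I item : items) {
                futures.add(CompletableFuture.supplyAsync(() -> processor.apply(item), pool));
            }
            // "Writer" step: unwrap the futures; join() rethrows a processing
            // failure as an unchecked CompletionException.
            List<O> results = new ArrayList<>();
            for (CompletableFuture<O> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Here the processor function would wrap the HTTP calls for one staff ID, and the returned list is what the writer would log.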