Imagine a simple event-driven system consisting of:
Microservice A, which processes received documents:
- Receives a multi-page document
- Processes each page of the document
- Produces a "Page Processed" event for each page
Microservice B, which runs additional processing on pages:
- Consumes "Page Processed" events
- Runs some additional processing on the page if required
- Produces "Page Processed Some More" events
Microservice C needs to run additional processing after all events related to a document have been successfully produced, ideally by consuming some kind of "Document Processed" event.
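To make the setup concrete, the events might be modelled roughly as follows. This is only a sketch: the class names mirror the event names above, but the fields (`document_id`, `page_number`) are assumptions made for illustration, not an existing schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageProcessed:
    """Produced by Microservice A, once per page."""
    document_id: str  # hypothetical field
    page_number: int  # hypothetical field


@dataclass(frozen=True)
class PageProcessedSomeMore:
    """Produced by Microservice B after its additional processing."""
    document_id: str  # hypothetical field
    page_number: int  # hypothetical field


@dataclass(frozen=True)
class DocumentProcessed:
    """The event Microservice C would ideally consume; nothing produces it yet."""
    document_id: str  # hypothetical field
```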
The problem is that neither Microservice A nor Microservice B knows when all pages have been fully processed, and in future more microservices could be created to do additional processing on the pages.
I haven't tried anything in practice yet, as I am still working on the design of the system. I have thought about a few possible solutions, but I dislike all of them, so I'm looking for a better way, or at least for the recommended approach.
1. Microservice A could produce another event at the start, detailing the number of pages the document has. Microservice C could subscribe to that event and then consume all the "Page Processed" and "Page Processed Some More" events it expects, essentially counting them up.
2. Microservice D (a new service) could do the event counting as above and produce the "Document Processed" event.
3. Process the pages serially, so that each page triggers processing of the next one, and raise a "Document Processed" event once no pages are left.
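The counting idea behind the first two options could look roughly like the sketch below, whether it lives in Microservice C or in a separate Microservice D. Everything here is hypothetical: the class name, the method names and the stage strings are made up for illustration, and it makes the simplifying assumption that every page produces exactly one event per stage, which would not hold if Microservice B skips pages that need no extra processing.

```python
from collections import defaultdict


class DocumentCompletionTracker:
    """Counts per-page events and reports when a document is complete."""

    def __init__(self, required_stages):
        # The tracker has to be told about every processing stage up front.
        self.required_stages = frozenset(required_stages)
        self.page_counts = {}         # document_id -> announced number of pages
        self.seen = defaultdict(set)  # document_id -> {(page_number, stage)}
        self.completed = set()        # documents already reported as complete

    def on_document_received(self, document_id, page_count):
        """Handle the extra event from Microservice A announcing the page count."""
        self.page_counts[document_id] = page_count
        return self._check(document_id)

    def on_page_event(self, document_id, page_number, stage):
        """Handle one per-page event; unknown stages are ignored."""
        if stage not in self.required_stages:
            return False
        # Storing pairs in a set makes duplicate deliveries harmless.
        self.seen[document_id].add((page_number, stage))
        return self._check(document_id)

    def _check(self, document_id):
        """Return True exactly once, at the moment the document becomes complete."""
        if document_id in self.completed or document_id not in self.page_counts:
            return False
        expected = self.page_counts[document_id] * len(self.required_stages)
        if len(self.seen[document_id]) < expected:
            return False
        self.completed.add(document_id)
        return True
```

A caller would publish "Document Processed" whenever either handler returns `True`. Page events that arrive before the page-count event are still recorded, so ordering between the two does not matter in this sketch.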
Solution 1 I dislike because it means Microservice C now needs to know about events it shouldn't care about.
Solution 2 I dislike because it adds the overhead of an extra service, when this feels like a common enough scenario that there must be a better solution.
Solutions 1 and 2 also mean that if other services are introduced to do additional processing on pages, the service doing the counting has to be modified as well, which creates an undesired dependency.
Solution 3 is the simplest, but because the pages can no longer be processed in parallel, it is very slow for documents with many pages.