We need to set up 4 Event Hubs and 3 Azure Functions. What is the best way to achieve high throughput, and which scalability parameters can we set so the system can handle 75k messages/sec?
- local.settings.json
- host.json
- prefetchCount
- maxBatchSize
This article is definitely worth a read and is something I based some of my work on; I needed to achieve 50k messages/sec. https://azure.microsoft.com/en-gb/blog/processing-100-000-events-per-second-on-azure-functions/
An important consideration is how many partitions you will have, as this will directly impact your total throughput. As you scale out instances of your application, the Event Processor Host (EPH) will try to take ownership of processing particular partitions, and each partition supports 1 MB/sec ingress and 2 MB/sec egress (or 1,000 events/sec).
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-faq
You need to consider both message size and message counts. If possible, cram as many data points as possible into an event hub message. In my scenario, I'm processing 500 data points in each event hub message - it's much more efficient to extract lots of data from a single message rather than a small amount of data from lots of messages.
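For illustration, here's a minimal sketch of that packing, assuming JSON payloads and the azure-eventhub Python SDK (the names and the 500-point chunk size are just placeholders):

```python
import json
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection details - substitute your own.
CONNECTION_STR = "<event-hub-namespace-connection-string>"
EVENT_HUB_NAME = "<event-hub-name>"

def send_packed(data_points, points_per_message=500):
    """Pack many data points into each event instead of one event per point."""
    producer = EventHubProducerClient.from_connection_string(
        CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
    )
    with producer:
        for i in range(0, len(data_points), points_per_message):
            chunk = data_points[i:i + points_per_message]
            # One event carrying ~500 data points as a JSON array.
            # Keep an eye on the per-event size limit when sizing the chunk.
            producer.send_batch([EventData(json.dumps(chunk))])
```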
For your throughput requirements, this is something you need to consider. Even at 32 partitions (roughly 32k events/sec), that's not going to give you 75k messages/sec - you can ask Microsoft to increase the partition count, as they did in the original article I linked, where they have 100 partitions.
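As a back-of-envelope check (a sketch only - the per-partition figures are the nominal ones above, and the average message size is an assumption you'd replace with a measurement):

```python
# Rough partition sizing for the 75k messages/sec target.
TARGET_MSG_PER_SEC = 75_000
EVENTS_PER_SEC_PER_PARTITION = 1_000   # nominal per-partition ingress, events/sec
MB_PER_SEC_PER_PARTITION = 1           # nominal per-partition ingress, MB/sec
AVG_MESSAGE_SIZE_KB = 1                # assumption - measure your real payloads

by_count = TARGET_MSG_PER_SEC / EVENTS_PER_SEC_PER_PARTITION                                  # 75
by_bandwidth = (TARGET_MSG_PER_SEC * AVG_MESSAGE_SIZE_KB / 1024) / MB_PER_SEC_PER_PARTITION   # ~73

print(f"partitions needed by event count: {by_count:.0f}")
print(f"partitions needed by bandwidth:   {by_bandwidth:.0f}")
# Whichever is larger is the constraint - either way, well beyond 32 partitions.
```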
As for configuration settings: I'm running with the maxBatchSize, prefetchCount and batchCheckpointFrequency settings in host.json tuned for throughput. This means there are up to approximately 1.3 million data points that could be processed again, in an event that causes the functions to have to begin processing from the last known checkpoint. This is also important - are your updates idempotent, or does it not matter if they are reprocessed?
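To make that concrete, here's a sketch of where those knobs live in host.json - the values below are illustrative, not necessarily what I run, and the property names move around a little between Event Hubs extension versions. With something like 256 messages per batch, checkpointing every 10 batches and 500 data points per message, a worst-case replay from the last checkpoint is in the region of 256 × 10 × 500 ≈ 1.3 million data points.

```json
{
  "version": "2.0",
  "extensions": {
    "eventHubs": {
      "batchCheckpointFrequency": 10,
      "eventProcessorOptions": {
        "maxBatchSize": 256,
        "prefetchCount": 512
      }
    }
  }
}
```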
You are going to need to put the data from the messages into some sort of data store, and you're going to be inserting into it at a high rate - can your target data store cope with inserts at that frequency? What happens to your processing pipeline if your target store has an outage? I went with an approach similar to the one described in this article, which can be summarised as: in the event of any failure when processing a batch of messages, move the entire batch onto an 'errors' hub and let another function try to process them (see the sketch after the link below). You can't stop processing at this volume or you will fall behind!
https://blog.pragmatists.com/retrying-consumer-architecture-in-the-apache-kafka-939ac4cb851a
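A minimal sketch of that pattern, assuming the azure-eventhub Python SDK (the 'errors' hub name and process_points are placeholders; the Functions trigger binding exposes a slightly different event type, but the idea is the same):

```python
import json
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection details.
CONNECTION_STR = "<event-hub-namespace-connection-string>"
ERRORS_HUB = "errors"

errors_producer = EventHubProducerClient.from_connection_string(
    CONNECTION_STR, eventhub_name=ERRORS_HUB
)

def process_points(points):
    """Placeholder for the real work, e.g. inserting into the target data store."""
    ...

def handle_batch(events):
    """Process a batch of received EventData; on any failure, forward the whole batch."""
    try:
        for event in events:
            process_points(json.loads(event.body_as_str()))
    except Exception:
        # Don't block the hot path: hand the entire batch to the 'errors' hub
        # and let a separate function retry it later.
        errors_producer.send_batch(
            [EventData(event.body_as_str()) for event in events]
        )
```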
That's also an important point. How real-time does your processing need to be? If you start falling behind, would you need to scale out to try and catch up? How would you know if this was happening? I created a metric that tracks how far behind the latest event each partition is, which I can visualise and set up alerts on - I also scale out my functions based on this number (a sketch of the idea follows the link below).
https://medium.com/@dylanm_asos/azure-functions-event-hub-processing-8a3f39d2cd0f
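One way to get that number, sketched here under my own assumptions (compare each batch's newest enqueue time with the clock; emit_metric stands in for whatever telemetry sink you use):

```python
from datetime import datetime, timezone

def record_lag(events, emit_metric):
    """Emit how far behind the newest event in this batch we are, in seconds."""
    enqueue_times = [e.enqueued_time for e in events if e.enqueued_time]
    if not enqueue_times:
        return
    lag_seconds = (datetime.now(timezone.utc) - max(enqueue_times)).total_seconds()
    # Alert and/or scale out when this grows - it means processing is falling behind.
    emit_metric("eventhub_partition_lag_seconds", lag_seconds)
```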
At the volumes you've mentioned, it's not just a matter of finding the right configuration settings - there are a number of considerations to work through.