I am designing an API which can be hit only a defined number of times based on the subscription plan. Below are the plans per account:
10M hits per year - $100
100M hits per year - $300
1G hits per year - $600
I have this service running in multiple regions (say 5) and the system is distributed. I need to send a notification if the user exhausts their quota.
What would be the optimal system design to achieve this? In particular, what kind of DB should I use, and how should I replicate this data across multiple zones while handling heavy concurrency?
The very first question to ask - how hard is the limit from a business point of view? For example, if a customer with a 10M quota goes over by 1% - is that a problem?
The second thing to look at is TPS - what is the traffic pattern? For example, 1G requests evenly distributed over a year comes to ~32 requests per second. TPS is important since it may be your bottleneck - especially when you make cross-region calls.
The third thing to look at is how available your system should be.
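As a rough check on that figure, the average rate for each plan can be computed as below. This is only a sketch: it assumes a 365-day year and perfectly even traffic, and real traffic will have peaks well above the average. The names `PLANS` and `average_rps` are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

# Yearly quotas taken from the plans in the question.
PLANS = {"10M": 10_000_000, "100M": 100_000_000, "1G": 1_000_000_000}


def average_rps(hits_per_year: int) -> float:
    """Average requests per second if traffic were perfectly uniform."""
    return hits_per_year / SECONDS_PER_YEAR


for name, quota in PLANS.items():
    print(f"{name}: {average_rps(quota):.2f} req/s")
# 10M: 0.32 req/s
# 100M: 3.17 req/s
# 1G: 31.71 req/s
```

Even the largest plan averages only a few dozen requests per second, so the peak rate, not the average, is what decides whether a cross-region call per request is affordable.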
Either way, you need a counter - on every request you decrement the counter, and when it reaches zero, you stop serving requests for that account.
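A minimal single-process sketch of such a counter is shown here. It assumes the count lives in memory behind a lock; in a distributed setup the same decrement-and-check would have to happen atomically in a shared store. `QuotaCounter` and the `on_exhausted` hook are hypothetical names, with the hook standing in for the "quota exhausted" notification from the question.

```python
import threading


class QuotaCounter:
    """In-memory quota counter (illustration only, not distributed)."""

    def __init__(self, quota: int, on_exhausted=None):
        self._remaining = quota
        self._lock = threading.Lock()
        self._on_exhausted = on_exhausted  # hypothetical notification hook

    def try_consume(self) -> bool:
        """Return True if the request is allowed, False once the quota is gone."""
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            exhausted = self._remaining == 0
        # Notify outside the lock so a slow notifier cannot block other requests.
        if exhausted and self._on_exhausted:
            self._on_exhausted()
        return True


counter = QuotaCounter(2, on_exhausted=lambda: print("quota exhausted"))
print(counter.try_consume())  # True
print(counter.try_consume())  # prints "quota exhausted", then True
print(counter.try_consume())  # False
```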
These counters could be implemented in several ways.
For example, create a queue with a given number of tokens; to process a request, a server has to read a token from the queue. No tokens left - no service.
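The token-queue idea could look roughly like this. The sketch uses Python's in-process `queue.Queue` as a stand-in for a real distributed queue, and the function names are invented for illustration. Pre-filling a queue with millions of tokens is unlikely to be practical as written; it only shows the control flow.

```python
import queue


def make_token_queue(quota: int) -> queue.Queue:
    """Pre-fill a queue with one token per allowed request."""
    tokens = queue.Queue()
    for token_id in range(quota):
        tokens.put(token_id)
    return tokens


def handle_request(tokens: queue.Queue) -> bool:
    """Serve the request only if a token can be taken from the queue."""
    try:
        tokens.get_nowait()
    except queue.Empty:
        return False  # no tokens left - no service
    return True


tokens = make_token_queue(3)
print([handle_request(tokens) for _ in range(4)])
# [True, True, True, False]
```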
Another option is to have a service which issues allowances to every resource server in batches - in this case your resource servers ask for quota and then report back usage.
Either way, it is quite challenging to get "exactly once" processing. There are many different failure modes, and they may lead to some tokens being either lost or double spent.
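A rough sketch of the batch-allowance idea follows, assuming an in-memory quota service and resource servers that each handle one request at a time. All class and method names are hypothetical, and "reporting back" is reduced here to returning unused tokens on shutdown. A server that crashes before returning its batch loses those tokens, which is one of the failure modes this design has to accept or handle.

```python
import threading


class QuotaService:
    """Central owner of an account's remaining quota (in memory for illustration)."""

    def __init__(self, quota: int):
        self._remaining = quota
        self._lock = threading.Lock()

    def request_allowance(self, batch_size: int) -> int:
        """Grant up to batch_size tokens; returns 0 once the quota is exhausted."""
        with self._lock:
            granted = min(batch_size, self._remaining)
            self._remaining -= granted
            return granted

    def return_unused(self, count: int) -> None:
        """Take back tokens that a resource server did not spend."""
        with self._lock:
            self._remaining += count


class ResourceServer:
    """Spends a local batch of tokens and refills it from the quota service."""

    def __init__(self, service: QuotaService, batch_size: int):
        self._service = service
        self._batch_size = batch_size
        self._local = 0  # tokens granted but not yet spent

    def handle_request(self) -> bool:
        if self._local == 0:
            self._local = self._service.request_allowance(self._batch_size)
        if self._local == 0:
            return False  # quota exhausted everywhere
        self._local -= 1
        return True

    def shutdown(self) -> None:
        self._service.return_unused(self._local)
        self._local = 0


service = QuotaService(quota=5)
server = ResourceServer(service, batch_size=3)
print(sum(server.handle_request() for _ in range(7)))  # 5 requests served
```

The batch size is the trade-off knob: larger batches mean fewer calls to the quota service, but more tokens stranded on a server that is idle or has failed.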
The last part I would like to dedicate to some logical steps:
And so on, always picking the simplest option.
Personally, I would go with a quota service, deploy it in every region, and add a synchronization flow between the regions to make sure no tokens are wasted.