Distributed system design for quota on API

I am designing an API which can be hit only a defined number of times based on the subscription plan. Below are the plans per account:

10M hits per year - $100
100M hits per year - $300
1G hits per year - $600

I have this service running in multiple regions (say 5) and the system is distributed. I need to send a notification if the user exhausts their quota.

What would be a good system design to achieve this? In particular, what kind of DB should I use, and how do I replicate this data across multiple zones under heavy concurrency?

There is 1 best solution below

The very first question to ask: how hard is the limit from a business point of view? For example, if a customer with a 10M quota goes over by 1%, is that a problem?

The second thing to look at is TPS: what is the traffic pattern? For example, 1G requests per year, evenly distributed, is only ~32 requests per second. TPS matters because it may be your bottleneck, especially when you make cross-region calls.
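
To make that arithmetic concrete, here is a small sketch that computes the average rate for each plan in the question. It assumes a 365-day year and perfectly even traffic, which real traffic will not be, so treat the numbers as a floor for capacity planning rather than a peak estimate.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, assuming a 365-day year

# Plan sizes from the question, in hits per year.
PLANS = {"10M": 10_000_000, "100M": 100_000_000, "1G": 1_000_000_000}


def average_rps(hits_per_year: int) -> float:
    """Average requests per second if the traffic were perfectly even."""
    return hits_per_year / SECONDS_PER_YEAR


for name, hits in PLANS.items():
    print(f"{name}: {average_rps(hits):.2f} req/s on average")
```

Even the largest plan averages only a few dozen requests per second per account, so the hard part is more likely to be bursts and cross-region latency than raw throughput.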

The third thing to look at is how available your system needs to be.

Either way, you need a counter: on every request you decrement it, and when it reaches zero you stop serving requests for that account.
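
A minimal single-process sketch of such a counter is below. The `QuotaCounter` class and the `on_exhausted` callback are names I made up for illustration; the callback stands in for the "notify the user when the quota is exhausted" requirement from the question. In a real deployment the counter would live in a shared store rather than in one process's memory.

```python
import threading
from typing import Callable, Optional


class QuotaCounter:
    """Hypothetical in-memory quota counter for a single account."""

    def __init__(self, limit: int,
                 on_exhausted: Optional[Callable[[], None]] = None) -> None:
        self._remaining = limit
        self._on_exhausted = on_exhausted
        self._lock = threading.Lock()

    def try_consume(self) -> bool:
        """Spend one unit of quota; return False once the quota is gone."""
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            just_exhausted = self._remaining == 0
        # Call the notification hook outside the lock, exactly once,
        # right after the last unit has been spent.
        if just_exhausted and self._on_exhausted is not None:
            self._on_exhausted()
        return True
```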

These counters could be implemented in several ways.

For example, create a queue holding the given number of tokens; to process a request, a server has to read a token from the queue. No tokens left, no service.
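
As a toy illustration of the token-queue idea, the sketch below uses Python's in-process `queue.Queue`. That is only a stand-in: in a distributed setup the queue would be some shared message queue, and you would not materialize millions of tokens one by one. The function names are hypothetical.

```python
import queue


def make_token_queue(tokens: int) -> "queue.Queue[int]":
    """Pre-fill a queue with one entry per allowed request."""
    q: "queue.Queue[int]" = queue.Queue()
    for i in range(tokens):
        q.put(i)
    return q


def handle_request(tokens: "queue.Queue[int]") -> bool:
    """Serve the request only if a token can be taken from the queue."""
    try:
        tokens.get_nowait()
    except queue.Empty:
        return False  # no tokens left, no service
    return True


tokens = make_token_queue(3)
results = [handle_request(tokens) for _ in range(4)]
```

With three tokens, the first three calls are served and the fourth is rejected.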

Another option is a service that issues allowance to the resource servers in batches: each server asks for a chunk of quota and later reports back its usage.
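
Here is a rough sketch of the batch-allowance idea, with both sides in one process so it can run as-is. `QuotaService`, `ResourceServer`, `lease`, `report` and `flush` are all hypothetical names; in practice the calls between them would be network calls, and each resource server would also need its own locking if it handles requests on several threads.

```python
import threading


class QuotaService:
    """Hypothetical central service that hands out quota in batches."""

    def __init__(self, total: int) -> None:
        self.unallocated = total  # tokens not yet leased to any server
        self.used = 0             # tokens reported as actually spent
        self._lock = threading.Lock()

    def lease(self, batch_size: int) -> int:
        """Grant up to batch_size tokens; may grant fewer, or zero."""
        with self._lock:
            granted = min(batch_size, self.unallocated)
            self.unallocated -= granted
            return granted

    def report(self, used: int, unused: int) -> None:
        """Record spent tokens and return unspent ones to the pool."""
        with self._lock:
            self.used += used
            self.unallocated += unused


class ResourceServer:
    """Hypothetical API server that spends a locally held batch."""

    def __init__(self, service: QuotaService, batch_size: int) -> None:
        self._service = service
        self._batch_size = batch_size
        self._local = 0  # tokens leased but not yet spent
        self._used = 0   # tokens spent since the last report

    def try_consume(self) -> bool:
        if self._local == 0:
            # Local batch is empty: ask the quota service for another one.
            self._local = self._service.lease(self._batch_size)
        if self._local == 0:
            return False
        self._local -= 1
        self._used += 1
        return True

    def flush(self) -> None:
        """Report usage and hand back whatever is left of the batch."""
        self._service.report(self._used, self._local)
        self._local = 0
        self._used = 0
```

The batch size is the trade-off knob: larger batches mean fewer calls to the quota service, but more tokens stranded if a server crashes before it reports back.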

Either way, it is quite challenging to get "exactly once" processing. There are many different failure modes, and they may lead to some tokens being either lost or double spent.

Finally, here are some logical steps to work through:

  1. A request from a customer arrives at a server for processing.
  2. Can the server make a local decision on quota? If yes, the server holds some part of the quota, and that part needs to be kept up to date somehow. Otherwise, the server has to ask another service for quota.
  3. The server asks a service: may I process another request? This call may stay in the same region or go to another region. Are you OK with cross-region requests (latency and availability risk)? If yes, go for it. If no:
  4. The quota service has to be regionalized. How will it shard quota across regions? Maybe split the quota and exchange updates periodically (e.g. every second).
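
The steps above can be sketched as a chain of quota tiers, tried from cheapest to most expensive. This is only an illustration: `Budget` and `admit` are hypothetical names, and each tier is a plain in-memory counter standing in for the server's local allowance, the same-region quota service, and (optionally) a quota service in another region.

```python
from typing import Optional


class Budget:
    """Hypothetical quota tier backed by a simple counter."""

    def __init__(self, name: str, tokens: int) -> None:
        self.name = name
        self.tokens = tokens

    def try_consume(self) -> bool:
        if self.tokens <= 0:
            return False
        self.tokens -= 1
        return True


def admit(local: Budget, regional: Budget,
          remote: Optional[Budget] = None) -> Optional[str]:
    """Return the name of the tier that paid for the request, or None."""
    tiers = [local, regional]
    if remote is not None:
        # Only consult another region if cross-region latency and
        # availability risk are acceptable for this API.
        tiers.append(remote)
    for tier in tiers:
        if tier.try_consume():
            return tier.name
    return None  # quota exhausted everywhere we are allowed to ask
```

Passing `remote=None` corresponds to answering "no" in step 3: the server then depends only on quota held in its own region.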

And so on, always picking the simplest option that works.

Personally, I would go with a quota service deployed in every region, plus a synchronization flow between regions to make sure no tokens are wasted.
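
One possible shape for that synchronization flow, as a sketch only: periodically collect the unspent tokens from every regional quota service and redistribute them, so that a quiet region does not sit on quota that a busy region needs. The `rebalance` function and the region names are made up for illustration, and the even split is just the simplest policy; weighting by recent usage would be a natural refinement.

```python
from typing import Dict


def rebalance(remaining: Dict[str, int]) -> Dict[str, int]:
    """Split all unspent tokens evenly across regions.

    `remaining` maps a region name to the tokens it has left.
    The total number of tokens is preserved; any remainder goes
    to the first regions in alphabetical order.
    """
    total = sum(remaining.values())
    regions = sorted(remaining)
    share, extra = divmod(total, len(regions))
    return {
        region: share + (1 if i < extra else 0)
        for i, region in enumerate(regions)
    }


# Example: one region has run dry while the others still hold tokens.
new_shares = rebalance({"us": 0, "eu": 7, "ap": 3})
```

Note that this assumes a consistent snapshot of every region's remaining tokens. If regions keep spending while the rebalance runs, tokens can be double spent, so a real flow would need to exchange deltas or briefly fence the counters.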