Design large scale system for async request monitoring


I have a service that accepts millions of customer requests and processes them asynchronously. All requests are added to a DynamoDB (DDB) table for processing and removed from the table on completion. The system can accumulate a few billion requests. Periodically, I need to monitor the total number of requests in the table, how many of them are older than 2 hours, and the age of the oldest request. Other than periodically scanning the whole DDB table, which is very expensive and does not scale, what other technologies or solutions can I use to answer these questions?

Also, is there a name for the above async processing pattern?

I have tried scanning the DDB table, but as the number of requests increases, the system does not scale well.

Answer by tax evader

I think that if the service accumulates billions of requests (per day?), it can be very computationally expensive to keep adding records to and removing them from a table. If your only goal is to keep track of the number of asynchronous requests that have not yet completed, I would recommend using a distributed message queue such as AWS SQS or Apache Kafka to hold the requests; server nodes then receive them as messages and handle them. With AWS SQS, you can use AWS CloudWatch to get relevant metrics for the queue. For example:

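  • To see the number of open requests waiting in the queue, you can read the ApproximateNumberOfMessagesVisible metric. Requests that a consumer has received but not yet deleted are reported separately as ApproximateNumberOfMessagesNotVisible, so add the two to count everything still open.
  • To see the age of the oldest request, you can use the ApproximateAgeOfOldestMessage metric.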
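A minimal sketch of reading those metrics, assuming boto3's CloudWatch client and its `get_metric_data` call. The queue name `customer-requests`, the 60-second period, and the 15-minute lookback are hypothetical choices; the client is passed in rather than created inside the function so the logic can be exercised without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical queue name; replace with your own.
QUEUE_NAME = "customer-requests"

# Query Id (must start with a lowercase letter) -> SQS CloudWatch metric name.
METRICS = {
    "visible": "ApproximateNumberOfMessagesVisible",
    "inflight": "ApproximateNumberOfMessagesNotVisible",
    "oldestage": "ApproximateAgeOfOldestMessage",
}


def build_queries(queue_name, period_seconds=60):
    """One GetMetricData query per metric, filtered to a single queue."""
    return [
        {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/SQS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
                },
                "Period": period_seconds,
                "Stat": "Maximum",
            },
            "ReturnData": True,
        }
        for query_id, metric_name in METRICS.items()
    ]


def queue_snapshot(cloudwatch, queue_name, lookback=timedelta(minutes=15)):
    """Return open-request count and oldest age from the latest datapoints.

    `cloudwatch` is assumed to be a boto3 CloudWatch client
    (boto3.client("cloudwatch")); passing it in keeps this testable offline.
    Values are None when CloudWatch has no datapoint in the lookback window.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_queries(queue_name),
        StartTime=end - lookback,
        EndTime=end,
        ScanBy="TimestampDescending",  # newest datapoint first in each result
    )
    latest = {
        result["Id"]: (result["Values"][0] if result["Values"] else None)
        for result in response["MetricDataResults"]
    }
    visible, inflight = latest.get("visible"), latest.get("inflight")
    open_requests = None if visible is None or inflight is None else visible + inflight
    return {"open_requests": open_requests, "oldest_age_seconds": latest.get("oldestage")}
```

Usage would be something like `queue_snapshot(boto3.client("cloudwatch"), QUEUE_NAME)`. Note that the oldest-age metric only tells you whether *some* request is older than 2 hours, not how many, which is why a separate approach is needed for that count.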

To see how many of the requests are older than 2 hours, I think you can have the server nodes that forward requests to the queue publish a CloudWatch log entry with the request id and the time it was received, and have the server nodes that handle requests from the queue publish another CloudWatch log entry with the request id once the request is handled. That way, you can get the number of open requests older than 2 hours by taking the request ids logged more than 2 hours ago and subtracting the request ids logged by the request handlers.
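The counting step of that idea is a set difference with a time filter. The sketch below assumes the two logs have already been exported as hypothetical `(request_id, enqueued_at)` pairs and a list of completed ids; at billions of ids you would run the same logic inside a log-query or batch engine rather than in memory, and the completion log must cover at least the same time range as the producer log, or finished requests will be miscounted as open.

```python
from datetime import datetime, timedelta, timezone


def count_stale_open_requests(enqueued, completed_ids, now, max_age=timedelta(hours=2)):
    """Count requests enqueued before `now - max_age` with no completion record.

    enqueued:      iterable of (request_id, enqueued_at) from the producer log
    completed_ids: iterable of request ids from the consumer log
    """
    cutoff = now - max_age
    done = set(completed_ids)  # O(1) membership checks
    return sum(
        1
        for request_id, enqueued_at in enqueued
        if enqueued_at < cutoff and request_id not in done
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    producer_log = [
        ("req-1", now - timedelta(hours=3)),
        ("req-2", now - timedelta(hours=5)),
        ("req-3", now - timedelta(minutes=30)),
    ]
    consumer_log = ["req-1"]
    print(count_stale_open_requests(producer_log, consumer_log, now))  # prints 1
```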

I don't think there's a specific name for this async processing pattern, but when people talk about asynchronous processing, the message queue pattern or the publish–subscribe pattern usually comes to mind.

Answer by Leeroy Hannigan

You're correct that scanning is not going to be efficient. It sounds to me like you need a global sort order across all of your items, based on time.

That will allow you to retrieve the oldest record by consuming 0.5 RCU.

It will also let you efficiently target only the items in a given time window, such as those created in the last N hours or more than N hours ago.

Have a read of this blog post on the subject: https://aws.amazon.com/blogs/database/effective-data-sorting-with-amazon-dynamodb/
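One way to get such a time order, in the spirit of the blog post above, is a global secondary index whose sort key is the enqueue time. The sketch below uses boto3's low-level DynamoDB `query` call; the table, index, and attribute names are hypothetical, and `created_at` is assumed to hold integer epoch seconds. Because a single partition key value concentrates every index write on one partition (roughly 1,000 write units per second each), the sketch accepts a list of shard keys; whether you need sharding, and how many shards, is an assumption to size against your write rate. Finding the oldest item reads one item per shard, but `Select="COUNT"` still reads (and bills for) every matching item, so counting old requests costs in proportion to how many there are, not the table size.

```python
import time

# Hypothetical names (not from the question): a GSI "by_created_at" whose
# partition key "gsi_pk" holds a shard id ("0".."N-1", or one constant value)
# and whose sort key "created_at" is the enqueue time in integer epoch seconds.
TABLE = "requests"
INDEX = "by_created_at"


def oldest_created_at(ddb, shard_keys):
    """Smallest created_at across shards, or None if the index is empty.

    Each Query returns at most one item (ascending order, Limit=1), so this
    costs one small read per shard instead of a table scan.
    """
    oldest = None
    for shard in shard_keys:
        resp = ddb.query(
            TableName=TABLE,
            IndexName=INDEX,
            KeyConditionExpression="#pk = :pk",
            ExpressionAttributeNames={"#pk": "gsi_pk"},
            ExpressionAttributeValues={":pk": {"S": shard}},
            ScanIndexForward=True,  # ascending by created_at: oldest first
            Limit=1,
        )
        for item in resp.get("Items", []):
            ts = int(item["created_at"]["N"])
            oldest = ts if oldest is None else min(oldest, ts)
    return oldest


def count_older_than(ddb, shard_keys, max_age_seconds=2 * 3600, now=None):
    """Count items whose created_at is more than max_age_seconds in the past.

    Results are paged (1 MB of data read per page), so follow LastEvaluatedKey.
    """
    now = int(time.time()) if now is None else int(now)
    cutoff = now - max_age_seconds
    total = 0
    for shard in shard_keys:
        request = {
            "TableName": TABLE,
            "IndexName": INDEX,
            "KeyConditionExpression": "#pk = :pk AND #ts < :cutoff",
            "ExpressionAttributeNames": {"#pk": "gsi_pk", "#ts": "created_at"},
            "ExpressionAttributeValues": {
                ":pk": {"S": shard},
                ":cutoff": {"N": str(cutoff)},
            },
            "Select": "COUNT",
        }
        while True:
            resp = ddb.query(**request)
            total += resp["Count"]
            if "LastEvaluatedKey" not in resp:
                break
            request["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
    return total
```

Usage would look like `count_older_than(boto3.client("dynamodb"), [str(i) for i in range(10)])` for a hypothetical 10-shard layout; the age of the oldest request is then `time.time() - oldest_created_at(...)`.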