Lambda not achieving expected concurrency with MSK triggers

I've configured two services on AWS Lambda to trigger off an MSK (Amazon Managed Streaming for Apache Kafka) event source. Both services are supposed to handle messages from 100 partitions. However, while one service consistently achieves 100 concurrent executions, the other service struggles and remains capped at around 78-80 concurrent executions.

I've checked the configuration thoroughly, ensured that the Lambda concurrency limits are appropriately set, and confirmed that the MSK cluster is healthy with no bottlenecks observed. Despite these efforts, the concurrency for one service doesn't exceed 80, whereas the other service operates smoothly at the maximum expected concurrency.

What could potentially cause this discrepancy in concurrency for services triggered by the same MSK setup? Are there specific Lambda or MSK configurations that might need fine-tuning to ensure consistent and higher concurrency for both services?

Any insights, suggestions, or experiences related to Lambda concurrency limits and MSK triggers would be greatly appreciated. Thank you!

There is 1 best solution below

What could potentially cause this discrepancy in concurrency for services triggered by the same MSK setup?

The parallelism level is decided by Lambda, not by your configuration. From the AWS documentation:

When you initially create an Amazon MSK event source, Lambda allocates one consumer to process all partitions in the Kafka topic. Each consumer has multiple processors running in parallel to handle increased workloads. Additionally, Lambda automatically scales up or down the number of consumers, based on workload. To preserve message ordering in each partition, the maximum number of consumers is one consumer per partition in the topic.

In one-minute intervals, Lambda evaluates the consumer offset lag of all the partitions in the topic. If the lag is too high, the partition is receiving messages faster than Lambda can process them. If necessary, Lambda adds or removes consumers from the topic. The scaling process of adding or removing consumers occurs within three minutes of evaluation.

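In other words, Lambda only increases parallelism when it sees that your processors are falling behind. A service that keeps up with its topic at around 80 concurrent executions most likely gives Lambda no reason to scale it to 100.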
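
To make the behavior in the quoted documentation concrete, here is a minimal Python sketch of a lag-driven scaling decision. It is not Lambda's actual algorithm, which AWS does not describe in this detail. The function name `next_consumer_count`, the `lag_threshold` value, and the one-step scale-up and scale-down policy are all assumptions made for illustration. Only the lag-based trigger and the cap of one consumer per partition come from the documentation.

```python
def next_consumer_count(current, partition_lags, lag_threshold=1000):
    """Hypothetical model of one scaling evaluation (not AWS's real logic).

    current        -- number of consumers currently assigned
    partition_lags -- consumer offset lag per partition, one entry each
    lag_threshold  -- assumed lag above which a partition counts as behind
    """
    partitions = len(partition_lags)
    worst_lag = max(partition_lags, default=0)

    if worst_lag > lag_threshold:
        target = current + 1   # falling behind: add a consumer
    elif worst_lag == 0:
        target = current - 1   # fully caught up: a consumer can be removed
    else:
        target = current       # keeping up: no change

    # Never fewer than one consumer, never more than one per partition.
    return max(1, min(target, partitions))


# A service that keeps up at 80 consumers stays at 80.
print(next_consumer_count(80, [200] * 100))
# A service that is behind scales up, but never past the partition count.
print(next_consumer_count(80, [5000] * 100))
print(next_consumer_count(100, [5000] * 100))
```

Under this model, the useful comparison between the two services is their consumer offset lag: if the service stuck near 80 shows low lag, it is simply keeping up with fewer consumers rather than being throttled.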