AWS Lambda supports a parallelization factor for Kinesis and DynamoDB event sources, but it's not supported for MSK. Can we set reserved concurrency on the Lambda function, and would that help it consume concurrently from an MSK topic?
AWS MSK lambda concurrent consumers
2.3k Views. Asked by dvlpr. There are 4 best solutions below.
On
You can set ParallelApplyThreads to a value greater than 1 in TargetMetadata in the task settings.
Check this document.
On
Lambda has auto scaling to control concurrency. You usually do not need to set the concurrency unless you have a specific need. https://aws.amazon.com/about-aws/whats-new/2022/01/aws-lambda-auto-scaling-msk-apache-kafka/
On
TL;DR: Connecting Lambda to a Kafka cluster using aws::event-source-mapping limits concurrency to the number of partitions in the topic.
I set up a PoC of
Custom Kafka Cluster Topic (1 Partition) > EventSourceMapping > Lambda
and after opening a discussion with AWS, it looks like this is a limitation.
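To make the limitation concrete, here is a minimal sketch of the upper bound on concurrent invocations, assuming the event source mapping runs at most one consumer per partition and that reserved concurrency only caps the function. The function name max_concurrent_consumers is hypothetical and this is a simplified model, not an AWS API.

```python
from typing import Optional


def max_concurrent_consumers(partitions: int,
                             reserved_concurrency: Optional[int] = None) -> int:
    """Simplified model (an assumption, not an AWS API): the event source
    mapping runs at most one consumer per partition, and reserved
    concurrency can only lower that ceiling, never raise it."""
    if partitions < 1:
        raise ValueError("a topic has at least one partition")
    if reserved_concurrency is None:
        return partitions
    if reserved_concurrency < 0:
        raise ValueError("reserved concurrency cannot be negative")
    return min(partitions, reserved_concurrency)


# The PoC above: 1 partition, so reserving 50 executions changes nothing.
print(max_concurrent_consumers(1, reserved_concurrency=50))   # 1
# 12 partitions but only 5 reserved executions: the reservation is the cap.
print(max_concurrent_consumers(12, reserved_concurrency=5))   # 5
```

Under this model, reserved concurrency does not add consumers; adding partitions to the topic does.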
Another approach, which I didn't try, is to set up a Lambda sink (Kafka Connect) and set tasks.max, which looks like it could solve this issue:
https://docs.confluent.io/kafka-connectors/aws-lambda/current/overview.html#lambda-sink-multiple-tasks
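As a rough, untested sketch of what such a connector definition could look like, the snippet below builds the JSON payload for a Kafka Connect sink. tasks.max and topics are standard Kafka Connect properties; the connector class and the aws.lambda.* property names are written from memory of the Confluent connector and should be checked against the page linked above. The topic name, function name, and connector name are placeholders.

```python
import json


def lambda_sink_config(topic: str, function_name: str, tasks_max: int) -> dict:
    """Build a Kafka Connect connector definition (hypothetical helper)."""
    return {
        "name": "lambda-sink-example",  # placeholder connector name
        "config": {
            # Assumed class name; verify against the Confluent documentation.
            "connector.class": "io.confluent.connect.aws.lambda.AwsLambdaSinkConnector",
            # Standard Kafka Connect setting: upper bound on parallel tasks.
            # Tasks beyond the topic's partition count would sit idle.
            "tasks.max": str(tasks_max),
            "topics": topic,
            # Assumed connector-specific property names; verify before use.
            "aws.lambda.function.name": function_name,
            "aws.lambda.invocation.type": "sync",
        },
    }


config = lambda_sink_config("orders", "process-orders", tasks_max=4)
print(json.dumps(config, indent=2))
```

The printed JSON is the kind of body that would be submitted to the Kafka Connect REST API when creating a connector.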
The documentation is pretty sparse on specifics. I was also looking for this, and the only thing I've found is this workshop: https://amazonmsk-labs.workshop.aws/en/msklambda/tpschemareg/overview.html
In it, they read from MSK and post to Kinesis so that Lambda can process in parallel. If that is accurate, the MSK event source seems to be there mainly for migration. Only one consumer is pretty limiting.
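For comparison, here is a small sketch of why routing through Kinesis helps, assuming the parallelization factor mentioned in the question (an integer from 1 to 10 per shard). The function name kinesis_max_concurrency is hypothetical and the arithmetic is a simplified upper bound.

```python
def kinesis_max_concurrency(shards: int, parallelization_factor: int = 1) -> int:
    """Upper bound on concurrent Lambda batches for a Kinesis stream,
    assuming each shard is processed by up to `parallelization_factor`
    concurrent invocations (allowed range 1 to 10)."""
    if shards < 1:
        raise ValueError("a stream has at least one shard")
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return shards * parallelization_factor


# 4 shards at the default factor versus the maximum factor.
print(kinesis_max_concurrency(4))       # 4
print(kinesis_max_concurrency(4, 10))   # 40
```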
Maybe someone who has experimented more can leave a better answer.
krishwin's comment at the bottom of this article also seems to indicate this: https://dev.to/danieljameskay/triggering-lambda-functions-from-amazon-msk-316o
A better option might be an AWS Lambda sink connector. It looks like it will run up to as many Lambda processes as there are partitions: