I have a SAM stack with one Lambda function, one SQS queue, and one DLQ. The SQS queue is the event source for the Lambda. The Lambda has ReservedConcurrentExecutions set to 1, and the batch size of the SQS event source is also 1. The Lambda timeout is 300 seconds.
The SQS queue is a FIFO queue with ContentBasedDeduplication set to true. Its VisibilityTimeout is 400 seconds and its ReceiveMessageWaitTimeSeconds is 10. The queue has a DLQ attached through a redrive policy with a maxReceiveCount of 5.
All messages sent to the FIFO queue have the same messageGroupId. The idea is to ensure that all messages are processed in FIFO order and that no message is processed more than once.
Also, since the Lambda's ReservedConcurrentExecutions is set to 1 and the messageGroupId is the same across all messages, I assumed that this Lambda would never be throttled. But it is still being throttled.
I can't find anything in my stack's configuration that would cause this. Does anyone have any insight into why and how this could happen? Or is there a way to find out which message was throttled?
One more point: the throttling did not occur while the number of messages in the queue was small. The throttle errors started appearing once the queue held more than 1000 messages. The throttle count at any given time was never more than 1, and the throttles occurred randomly, with no fixed pattern. Eventually all messages were processed, and none of them ended up in the DLQ.
I read the following in the AWS documentation:
To allow your function time to process each batch of records, set the source queue's visibility timeout to at least six times the timeout that you configure on your function. The extra time allows for Lambda to retry if your function is throttled while processing a previous batch.
In my case this does not hold, since my queue's visibility timeout (400 seconds) is only 100 seconds more than the Lambda's timeout (300 seconds). Could this be the cause of my issue?
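For reference, following that guidance literally with my current function timeout would mean a visibility timeout of 6 × 300 = 1800 seconds. Below is a sketch of what the queue definition would look like in that case. This is a hypothetical variant that I have not deployed or tested; only VisibilityTimeout differs from my actual queue, and I do not know whether it would make the throttles go away.

```yaml
# Hypothetical variant of my queue: only VisibilityTimeout is changed.
# 1800 = 6 x 300, i.e. six times the function's Timeout of 300 seconds.
CreateJobsQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: !Sub '${AWS::StackName}-CreateJobsQueue.fifo'
    ReceiveMessageWaitTimeSeconds: 10
    FifoQueue: True
    ContentBasedDeduplication: True
    VisibilityTimeout: 1800
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt CreateJobsDLQ.Arn
      maxReceiveCount: 5
```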
The following is the relevant part of my current SAM/CloudFormation template for the Lambda and the SQS queues:
CreateJobsQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: !Sub '${AWS::StackName}-CreateJobsQueue.fifo'
    ReceiveMessageWaitTimeSeconds: 10
    FifoQueue: True
    ContentBasedDeduplication: True
    VisibilityTimeout: 400
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt CreateJobsDLQ.Arn
      maxReceiveCount: 5
CreateJobsDLQ:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: !Sub '${AWS::StackName}-CreateJobsDLQ.fifo'
    FifoQueue: True
    ContentBasedDeduplication: True
    MessageRetentionPeriod: 604800
CreateJobsFn:
  Type: AWS::Serverless::Function
  Properties:
    FunctionName: !Sub '${AWS::StackName}-CreateJobsFn'
    CodeUri: functions/create-jobs/
    Handler: index.handler
    Runtime: nodejs16.x
    Description: Lambda function to pick up the message from CreateJobsQueue
    MemorySize: 512
    Timeout: 300
    KmsKeyArn: !Sub "arn:aws:kms:${AWS::Region}:${AWS::AccountId}:key/${AppsKMSKeyId}"
    ReservedConcurrentExecutions: 1
    Policies:
      - AWSLambdaBasicExecutionRole
      - AWSLambdaENIManagementAccess
    Environment:
      Variables:
        EMAIL_DOMAIN: ""
    Layers:
      - !Ref LambdaDependencies
    VpcConfig:
      !If
        - IsVPCRequired
        -
          SubnetIds: !Ref BFnSubnetIds
          SecurityGroupIds: !Ref BFnSecurityGroupIds
        - !Ref 'AWS::NoValue'
    Events:
      CreateJobsFnSQSEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt CreateJobsQueue.Arn
          BatchSize: 1
Please let me know if any other details are needed from my end.