I have a SAM stack with one Lambda function, one SQS queue, and one DLQ. The SQS queue is the event source for the Lambda function. The function has ReservedConcurrentExecutions set to 1, and the batch size of the event source mapping (SQS to Lambda) is also 1. The function timeout is 300 seconds.

The SQS queue is a FIFO queue with ContentBasedDeduplication set to true. Its VisibilityTimeout is 400 seconds and ReceiveMessageWaitTimeSeconds is 10. A DLQ is attached to it through a redrive policy with a maxReceiveCount of 5.

All messages sent to the FIFO queue have the same messageGroupId. The idea is to ensure that all messages are processed in FIFO order and that no message is processed more than once.
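To make the deduplication part concrete: as I understand it, with ContentBasedDeduplication enabled SQS derives the deduplication ID from a SHA-256 hash of the message body (not the attributes), so two sends with an identical body inside the deduplication interval count as one message. The snippet below is only a local illustration of that idea, not the SQS implementation; `dedup_id` and the sample bodies are hypothetical names I made up for this sketch.

```python
import hashlib


def dedup_id(body: str) -> str:
    """Hypothetical helper: hash the message body the way content-based
    deduplication is described (SHA-256 over the body only)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Sample bodies invented for illustration; they are not from my real queue.
first = dedup_id('{"jobId": 1}')
resend = dedup_id('{"jobId": 1}')   # identical body, so same ID
other = dedup_id('{"jobId": 2}')    # different body, so different ID

print("identical bodies share an ID:", first == resend)
print("different bodies share an ID:", first == other)
```

So a byte-for-byte identical body would be dropped as a duplicate, while any difference in the body produces a new message in the same message group.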

Since ReservedConcurrentExecutions is set to 1 and the messageGroupId is the same across all messages, I assumed this Lambda function would never be throttled. But it is still getting throttled.

I can't find anything in my stack configuration that would cause this. Does anyone have any insight into why and how this could happen? Also, is there a way to find out which message was throttled?

One more point: the throttling did not occur while the number of messages in the queue was small. Once the queue held more than 1000 messages, throttle errors started appearing. The throttle count at any given time was never more than 1, and the throttles occurred at random, with no fixed pattern. Eventually all messages were processed, and nothing ended up in the DLQ.

I read the following in the AWS documentation:

To allow your function time to process each batch of records, set the source queue's visibility timeout to at least six times the timeout that you configure on your function. The extra time allows for Lambda to retry if your function is throttled while processing a previous batch.

My setup does not follow this guidance, since my queue's visibility timeout (400 seconds) is only 100 seconds more than the Lambda timeout (300 seconds). Could this be the cause of my issue?
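To spell out the arithmetic, here is a minimal sketch that applies the "six times the function timeout" rule from the quote to the values in my template. The constant and function names are my own, chosen for this example only.

```python
# Values taken from my template.
FUNCTION_TIMEOUT_S = 300          # Timeout on CreateJobsFn
QUEUE_VISIBILITY_TIMEOUT_S = 400  # VisibilityTimeout on CreateJobsQueue

# Multiplier from the quoted AWS guidance ("at least six times").
RECOMMENDED_MULTIPLIER = 6


def recommended_visibility_timeout(function_timeout_s: int,
                                   multiplier: int = RECOMMENDED_MULTIPLIER) -> int:
    """Minimum visibility timeout implied by the quoted guidance."""
    return function_timeout_s * multiplier


recommended = recommended_visibility_timeout(FUNCTION_TIMEOUT_S)
shortfall = recommended - QUEUE_VISIBILITY_TIMEOUT_S

print(f"recommended minimum: {recommended}s")
print(f"configured:          {QUEUE_VISIBILITY_TIMEOUT_S}s")
print(f"shortfall:           {shortfall}s")
```

By that rule the queue would need a visibility timeout of at least 1800 seconds, so my configured 400 seconds is 1400 seconds short. Whether that gap actually explains the throttles is what I am unsure about.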

Following is my CloudFormation template for the Lambda function and the SQS queues:

CreateJobsQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: !Sub '${AWS::StackName}-CreateJobsQueue.fifo'
    ReceiveMessageWaitTimeSeconds: 10
    FifoQueue: True
    ContentBasedDeduplication: True
    VisibilityTimeout: 400
    RedrivePolicy:
      deadLetterTargetArn: !GetAtt CreateJobsDLQ.Arn
      maxReceiveCount: 5

CreateJobsDLQ:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: !Sub '${AWS::StackName}-CreateJobsDLQ.fifo'
    FifoQueue: True
    ContentBasedDeduplication: True
    MessageRetentionPeriod: 604800

CreateJobsFn:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub '${AWS::StackName}-CreateJobsFn'
      CodeUri: functions/create-jobs/
      Handler: index.handler
      Runtime: nodejs16.x
      Description: Lambda function to pick up the message from CreateJobsQueue
      MemorySize: 512
      Timeout: 300
      KmsKeyArn: !Sub "arn:aws:kms:${AWS::Region}:${AWS::AccountId}:key/${AppsKMSKeyId}"
      ReservedConcurrentExecutions: 1
      Policies:
        - AWSLambdaBasicExecutionRole
        - AWSLambdaENIManagementAccess
      Environment:
        Variables:
          EMAIL_DOMAIN: ""
      Layers:
        - !Ref LambdaDependencies
      VpcConfig:
        !If
          - IsVPCRequired
          -
            SubnetIds: !Ref BFnSubnetIds
            SecurityGroupIds: !Ref BFnSecurityGroupIds
          - !Ref 'AWS::NoValue'
      Events:
        CreateJobsFnSQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt CreateJobsQueue.Arn
            BatchSize: 1

Please let me know if any other details are needed from my end.
