AWS Canary Deployments of Lambda Functions using API GW Stage

I've spent days trying various things I found on the web, including the AWS docs, but I'm still struggling with canary deployments. Ultimately we'd like to orchestrate the deployments using Terraform and the AWS CLI as needed, but for now I'm just trying to get a canary working manually via the AWS Console (with the AWS CLI as needed).

First step

  • I have an API GW set up.
  • A Lambda function is hooked into the stage.
  • Canary is not enabled on the stage.
  • The Lambda Function in the method's Integration Request has no version or alias (just the name of the Lambda function). *** Lambda returns the latest response ***

Second step

  • I enable canary via Console in API GW Stage.
  • I update the Integration Request -> Lambda Function by adding the current lambda version.
  • I "Deploy API" in API GW. I see that the stage and canary deploymentIds are the same. I believe that at this point the latest deployment (which includes the latest Lambda version and everything else in the API GW settings) is saved as both the latest deployed version and the "non-canary"/"previous"/"prod" version. *** Lambda returns the latest response ***, which I believe is OK since I haven't deployed a change to the Lambda since enabling canary.
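
The deploymentId comparison in the last bullet can also be checked outside the Console. A minimal Python sketch, assuming the stage description has the shape returned by boto3's `apigateway` client `get_stage` call (a top-level `deploymentId`, plus `deploymentId` and `percentTraffic` under `canarySettings`). `summarize_stage` is a hypothetical helper and the IDs are placeholders:

```python
from typing import Any, Dict


def summarize_stage(stage: Dict[str, Any]) -> Dict[str, Any]:
    """Report which deployment backs the base stage and which backs the canary."""
    canary = stage.get("canarySettings") or {}
    base_id = stage.get("deploymentId")
    canary_id = canary.get("deploymentId")
    return {
        "base_deployment": base_id,
        "canary_deployment": canary_id,
        "canary_percent": canary.get("percentTraffic"),
        # True only when a canary exists and points at the same snapshot as the base.
        "same_deployment": canary_id is not None and canary_id == base_id,
    }


# Placeholder stage description (assumed shape, not real IDs). With boto3 it
# would come from:
#   boto3.client("apigateway").get_stage(restApiId="<api-id>", stageName="<stage>")
example_stage = {
    "stageName": "prod",
    "deploymentId": "abc123",
    "canarySettings": {"percentTraffic": 10.0, "deploymentId": "abc123"},
}

print(summarize_stage(example_stage))
```

Running the same check again after the next "Deploy API" would show whether the base and canary deploymentIds have diverged.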

Third step

  • I change the lambda function to return a different value
  • I hit "Deploy" to push the changes to the lambda function
  • I then "Publish new version"
  • At this point I assume I need to go back to API GW -> Integration Request and update to the latest version in the "Lambda Function" field
  • And now I "Deploy API". I'm assuming the previous deployment is still the prod/previous/non-canary version, and that the deploy that just happened is the canary version with all the latest code, versions, API GW settings, etc.
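
For reference, putting a version in the "Lambda Function" field amounts to adding a qualifier to the function ARN inside the integration URI. A minimal sketch of that URI format, where `lambda_integration_uri` is a hypothetical helper and the region, account ID, function name and version are placeholders:

```python
from typing import Optional


def lambda_integration_uri(
    region: str,
    account_id: str,
    function_name: str,
    qualifier: Optional[str] = None,
) -> str:
    """Build the API Gateway integration URI for a Lambda function.

    With no qualifier the URI names the unqualified function; with a
    qualifier it names one published version (or alias).
    """
    function_arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    if qualifier:
        function_arn += f":{qualifier}"
    return (
        f"arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/"
        f"{function_arn}/invocations"
    )


# Placeholder values: unqualified (first step) versus pinned to version 2 (third step).
print(lambda_integration_uri("us-east-1", "123456789012", "my-function"))
print(lambda_integration_uri("us-east-1", "123456789012", "my-function", "2"))
```

A changed integration only reaches callers once the API is deployed again, so each deployment snapshot carries whichever URI was set at the time of that deploy.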

Now here's where things get tricky: if I set the Canary percentage to 10%, I expect 10% of the API's responses to be the latest response and 90% to be the previous version's response. Instead it is flipped, and I'm getting 90% canary responses. If I change the Canary percentage to 90% in API GW -> Stage -> Canary, then when testing I average 10% with the latest response.
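
One way to put numbers on the observed split is to call the endpoint many times and tally the distinct response bodies. A rough Python sketch using only the standard library; `tally_responses` and `http_fetch` are hypothetical helpers, and the demo uses a stand-in fetch function so it runs without network access:

```python
from collections import Counter
from typing import Callable, Dict
import urllib.request


def tally_responses(fetch: Callable[[], str], requests: int = 200) -> Dict[str, float]:
    """Call fetch() `requests` times and return each distinct body's share of responses."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    counts = Counter(fetch() for _ in range(requests))
    return {body: count / requests for body, count in counts.items()}


def http_fetch(url: str) -> Callable[[], str]:
    """Build a fetch function that GETs `url` and returns the decoded body."""
    def fetch() -> str:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8")
    return fetch


# Offline demo with a stand-in fetch: one "new" body followed by nine "old" ones.
bodies = iter(["new"] + ["old"] * 9)
print(tally_responses(lambda: next(bodies), requests=10))
```

Against a real stage the call would look like `tally_responses(http_fetch("https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/<resource>"))`, with the placeholders filled in. Routing is per request, so a few hundred calls give a steadier estimate than a handful.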

I'd rather avoid using Lambda aliases or stage variables if possible (to keep things simple). I'm assuming that canary is flipping between 2 deployment snapshots and that the latest deployment is the canary. If that is correct, I don't need to use a stageVariable in the "Integration Request -> Lambda Function" to map to the version. And since I'm not trying to do canary via Lambda traffic splitting, I don't need to make Lambda aliases.

I've also tried most of the above steps with AWS CLI commands (except modifying the Lambda function code and creating the new lambda version). This yielded similar results.
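
As an illustration of the non-Console route, the canary percentage can be changed with a patch operation on the stage. A minimal sketch that builds the patch list in the form boto3's `update_stage` (and the CLI's `update-stage --patch-operations`) accepts; `canary_percent_patch` is a hypothetical helper, and the API ID and stage name in the comment are placeholders:

```python
from typing import Dict, List


def canary_percent_patch(percent: float) -> List[Dict[str, str]]:
    """Build the patch operation that sets the stage's canary traffic percentage."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return [
        {
            "op": "replace",
            "path": "/canarySettings/percentTraffic",
            # Patch operation values are passed as strings.
            "value": f"{percent:.1f}",
        }
    ]


ops = canary_percent_patch(10)
print(ops)

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("apigateway").update_stage(
#       restApiId="<api-id>", stageName="<stage>", patchOperations=ops
#   )
```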

I'm not trying to get fancy here yet. No CloudFormation, CodeDeploy, Terraform, etc. Automation is the next step once I can get my head around how the API GW-level canary works. I'm not trying to do a blue/green canary either.

Thanks! Looking forward to seeing that it is something obvious/simple to fix. ;)
