Elastic Beanstalk FastAPI application is intermittently not responding to HTTPS requests


I have an Elastic Beanstalk application that is intermittently not responding, and I'm unable to find out why. What Happens:

  1. The app responds with 200s to my health checks for a while, and then it just stops responding. After some time it comes back on its own.
  2. Subsequent API calls return 200 while the app is in a good mood. Then suddenly all calls fail (until they don't anymore).
  3. In the logs, I don't see any indication of crashing, but I'm new to this. I do see the following peculiarity, which shows up many times and lines up with the API calls I make to the app:
Mar 31 05:15:47 ip-172-31-28-174 systemd[1]: Starting refresh-policy-routes@ens5.service - Refresh policy routes for ens5...
Mar 31 05:15:47 ip-172-31-28-174 ec2net[2485]: Starting configuration for ens5
Mar 31 05:15:48 ip-172-31-28-174 systemd[1]: refresh-policy-routes@ens5.service: Deactivated successfully.
Mar 31 05:15:48 ip-172-31-28-174 systemd[1]: Finished refresh-policy-routes@ens5.service - Refresh policy routes for ens5.
Mar 31 05:15:48 ip-172-31-28-174 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=refresh-policy-routes@ens5 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 31 05:15:48 ip-172-31-28-174 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=refresh-policy-routes@ens5 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

Also, here's the setup:

  1. FastAPI Python app, originally deployed through the EB CLI with a Classic Load Balancer. The load balancer was later migrated.
  2. 2 min instances, 4 max instances (all t3.micro)
  3. All instances are healthy.
  4. EB environment is healthy
  5. https:// listener is set up on EB configuration, using cert from AWS.
  6. CNAME configuration for the SSL on subdomain.
  7. Default VPC with two subnets in two separate zones.
  8. Subnets are mapped to a route table that routes to an IGW.
  9. Procfile: web: gunicorn main:app --workers=4 --worker-class=uvicorn.workers.UvicornWorker

What could it be? Is it a networking configuration issue? The load balancer? Or something with the application environment? I was also able to deploy my code to a single-instance Elastic Beanstalk environment and saw no downtime there. I was not able to easily get HTTPS working on that instance, so I can't tell whether the issue is at the load balancer level or not.
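One way to narrow this down is to poll the health endpoint repeatedly and tally the outcomes. The sketch below is only an illustration: the URL in the usage comment is a placeholder, and the function names `probe`, `summarize`, and `run` are made up for this example. With two instances behind the load balancer, a failure ratio near one half would hint that one backend is unreachable rather than that the application is crashing.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code, or the exception class name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 502 or 504).
        return exc.code
    except OSError as exc:
        # Covers URLError, connection resets, and timeouts.
        return type(exc).__name__


def summarize(outcomes):
    """Tally how many probes returned 200 and how many did not."""
    total = len(outcomes)
    ok = sum(1 for outcome in outcomes if outcome == 200)
    failed = total - ok
    ratio = failed / total if total else 0.0
    return {"total": total, "ok": ok, "failed": failed, "failure_ratio": ratio}


def run(url, count=50, interval=1.0):
    """Probe `url` `count` times, pausing `interval` seconds between probes."""
    outcomes = []
    for _ in range(count):
        outcomes.append(probe(url))
        time.sleep(interval)
    return summarize(outcomes)


# Usage (placeholder URL, replace with the real health-check path):
# print(run("https://api.example.com/health"))
```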


There are 1 best solutions below

tandy On

I was able to figure out what was going on here.

Essentially, some traffic was being routed to a subnet that was effectively private: its route table sent traffic to a NAT gateway instead of an internet gateway. Because there were two instances running, requests only sometimes landed on the instance in the troubled subnet, which is why the failures were intermittent. To solve the problem, I updated the route table for the default subnet so that its default route points to the internet gateway (0.0.0.0/0 -> IGW). I did this because I was not able to easily change how Elastic Beanstalk picks the VPC and default subnets when launching from the command line.
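For anyone who wants to check for the same situation, here is a rough sketch that classifies each subnet by where its default route points. It assumes the route tables come in the shape returned by the EC2 `describe_route_tables` call; the function names and the subnet IDs you pass in are placeholders for illustration. Subnets without an explicit association fall back to the VPC's main route table, which is how a subnet can quietly end up behind a NAT gateway.

```python
def default_route_target(route_table):
    """Return the target ID of the 0.0.0.0/0 route, or None if there is none."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            # Internet gateways appear under GatewayId ("igw-..."),
            # NAT gateways under NatGatewayId ("nat-...").
            return route.get("GatewayId") or route.get("NatGatewayId")
    return None


def classify_subnets(route_tables, subnet_ids):
    """Map each subnet ID to 'public', 'private (NAT)', or 'other/none'."""
    explicit = {}
    main_table = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            if "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table

    result = {}
    for subnet_id in subnet_ids:
        # No explicit association means the main route table applies.
        table = explicit.get(subnet_id, main_table)
        target = default_route_target(table) if table else None
        if target and target.startswith("igw-"):
            result[subnet_id] = "public"
        elif target and target.startswith("nat-"):
            result[subnet_id] = "private (NAT)"
        else:
            result[subnet_id] = "other/none"
    return result


def fetch_route_tables(vpc_id):
    """Fetch route tables for a VPC with boto3 (first page only, for brevity)."""
    import boto3  # imported lazily so the functions above work without it

    ec2 = boto3.client("ec2")
    resp = ec2.describe_route_tables(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return resp["RouteTables"]
```

Any subnet that hosts internet-facing instances or the load balancer and comes back as "private (NAT)" is a candidate for the kind of intermittent failure described above.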

There were a lot of things that led to this problem, which made it hard to troubleshoot. To be clear:

  1. If you create an Elastic Beanstalk environment from the command line, it will select the default VPC and thus the default subnets for that VPC. (Yes, there are actually defaults that can be set.)
  2. Route tables can also be set as defaults (a subnet with no explicit association uses the VPC's main route table), which can ruin your life if you are not careful with how things are being created.
  3. My original Elastic Beanstalk environment was set up with a Classic Load Balancer. I later tried to migrate it to an Application Load Balancer. That migration process had no effect on the Elastic Beanstalk environment, which continued to use the old load balancer and its configuration settings.
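To avoid relying on those defaults, the VPC, subnets, and load balancer type can be stated explicitly as Elastic Beanstalk option settings. The sketch below only builds the settings list; the helper name and all IDs are placeholders, and I am assuming the environment is created through boto3's `create_environment`, which accepts such a list as `OptionSettings`. As far as I know, the load balancer type is chosen when the environment is created, which would explain why migrating the load balancer separately did not change what the environment used.

```python
def vpc_option_settings(vpc_id, instance_subnets, elb_subnets,
                        load_balancer_type="application"):
    """Build Elastic Beanstalk option settings that pin the VPC and subnets."""

    def opt(namespace, name, value):
        return {"Namespace": namespace, "OptionName": name, "Value": value}

    return [
        opt("aws:ec2:vpc", "VPCId", vpc_id),
        # Subnet lists are passed as comma-separated strings.
        opt("aws:ec2:vpc", "Subnets", ",".join(instance_subnets)),
        opt("aws:ec2:vpc", "ELBSubnets", ",".join(elb_subnets)),
        opt("aws:elasticbeanstalk:environment", "LoadBalancerType",
            load_balancer_type),
    ]


# Usage (all IDs are placeholders):
# settings = vpc_option_settings(
#     "vpc-0123", ["subnet-a", "subnet-b"], ["subnet-a", "subnet-b"]
# )
# Pass `settings` as OptionSettings when creating the environment.
```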