Performance issue with a NestJS application at scale: global guards taking too long to process

I am running a NestJS application hosted on AWS EC2 (with Elastic Beanstalk). It ran fine until a couple of days ago, but now the application intermittently crashes, and my Nginx error log shows numerous "Connection timed out" errors like this one:

1892#1892: *884 upstream timed out (110: Connection timed out) while reading response header from upstream, client: {client_ip}, server: localhost, request: "GET {api_endpoint} HTTP/1.1", upstream: "http://127.0.0.1:3000/{api_endpoint}", host: "{server_host}", referrer: "{server_url}"

On further diagnosis, I noticed that the delay occurs between two global guards: AuthGuard and PermissionsGuard. AuthGuard (which runs first) receives the request and finishes processing it within milliseconds, but PermissionsGuard only receives the request 50 to 60 seconds after AuthGuard completes. As a result, more than 60 seconds have already passed by the time my controller receives the request.
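
One way to check where those 50 to 60 seconds go is to record, for each guard, both when canActivate is entered and when its returned value (or promise) actually settles, instead of relying on a log line printed inside the method body. The TypeScript sketch below is an illustration under stated assumptions: GuardLike is a hypothetical, simplified stand-in for NestJS's CanActivate (the real interface receives an ExecutionContext and may also return an Observable), and timeGuard and runGuards are hypothetical helpers; runGuards approximates the sequential, stop-at-first-denial order in which NestJS evaluates a guard chain.

```typescript
// Hypothetical, simplified stand-in for NestJS's CanActivate interface.
// The real interface receives an ExecutionContext and may also return an Observable<boolean>.
interface GuardLike<Ctx> {
  canActivate(context: Ctx): boolean | Promise<boolean>;
}

// One timing record per guard invocation (milliseconds from performance.now()).
interface GuardTiming {
  name: string;
  startedAt: number; // when canActivate was entered
  settledAt: number; // when its result (or returned promise) settled
}

// Wraps a guard so the log records when its result actually settles,
// not just when the synchronous part of canActivate returns.
function timeGuard<Ctx>(name: string, guard: GuardLike<Ctx>, log: GuardTiming[]): GuardLike<Ctx> {
  return {
    async canActivate(context: Ctx): Promise<boolean> {
      const startedAt = performance.now();
      try {
        return await guard.canActivate(context);
      } finally {
        log.push({ name, startedAt, settledAt: performance.now() });
      }
    },
  };
}

// Evaluates guards one at a time, in order, stopping at the first one that
// denies access (an approximation of how NestJS applies its guard chain).
async function runGuards<Ctx>(guards: GuardLike<Ctx>[], context: Ctx): Promise<boolean> {
  for (const guard of guards) {
    if (!(await guard.canActivate(context))) {
      return false;
    }
  }
  return true;
}
```

If AuthGuard's settledAt minus startedAt is itself around 50 to 60 seconds, the time is being spent inside AuthGuard's own asynchronous work (for example an awaited database or token lookup), even if a log statement inside it printed within milliseconds.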

This application has an HTTP listener (for my APIs) and a couple of socket listeners (using Node's net module) for my IoT service.
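
Because the HTTP server and the net socket listeners run in the same Node.js process, they share a single event loop; if a socket handler were doing long synchronous work, every pending HTTP request, including one part-way through its guards, would have to wait for it. A minimal sketch of a timer-based lag probe follows, assuming a hypothetical helper startLagMonitor and an arbitrary interval; Node's built-in perf_hooks.monitorEventLoopDelay() is a histogram-based alternative.

```typescript
// Hypothetical helper: starts a probe that reschedules itself every `intervalMs`
// and reports how late each timer fired. Large lag means something kept the
// event loop busy (typically synchronous work) and delayed every other callback.
// Returns a function that stops the probe.
function startLagMonitor(intervalMs: number, onLag: (lagMs: number) => void): () => void {
  let expected = performance.now() + intervalMs;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = (): void => {
    const now = performance.now();
    // How much later than scheduled this callback ran (never negative).
    onLag(Math.max(0, now - expected));
    expected = now + intervalMs;
    timer = setTimeout(tick, intervalMs);
  };

  timer = setTimeout(tick, intervalMs);
  return () => clearTimeout(timer);
}
```

In the application, onLag could log whenever the lag exceeds some threshold; lag spikes that line up with the 50 to 60 second gaps would point at blocking work in the process rather than at the guards themselves.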

As the issue is intermittent, I have been unable to reproduce or diagnose it. None of the following actions has had any impact on the timeouts:

  1. Restarting the application
  2. Deleting the EC2 instance and deploying the application to a new EC2 instance (Elastic Beanstalk immutable deployments)
  3. Changing the Nginx timeout (client_header_timeout, client_body_timeout, keepalive_timeout) from 60 to 120

I am unable to figure out what is causing this delay between the two global guards in NestJS.
