Are 502 errors due to health check setup or resource exhaustion?


My setup is a Bitnami WordPress instance hosted on a GCP N2-standard-2 VM. I'm using an HTTPS load balancer and CDN.

I have encountered 502 errors a few times since I configured the load balancer. I was running quite a few SEO and page-scanning tests when this happened.

I've checked that the VM is using only 8-12% of its disk capacity, and the logs show a maximum CPU usage of 9.62%. I have to restart the VM to resolve the error.

What is the cause of the 502 errors?

  • Could it be due to a traffic spike from third-party scanning sites?
  • Is it because of my health check configuration?
  • Do I have to change the machine type and increase the memory?

What should I look into to troubleshoot it?
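Disk and CPU are covered above, but memory has not been checked yet. A minimal sketch for reading it on the VM, assuming a Linux guest where `/proc/meminfo` is readable. The function names and the 10% threshold are illustrative choices, not part of any GCP or Bitnami tool:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_fraction(info):
    """Fraction of total memory the kernel reports as available."""
    return info["MemAvailable"] / info["MemTotal"]


def swap_used_kb(info):
    """Swap currently in use, in kB (0 if no swap is configured)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def report(path="/proc/meminfo", low_threshold=0.10):
    """Print a one-line memory summary; low_threshold is an assumed cut-off."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    frac = available_fraction(info)
    print(f"available: {frac:.1%}, swap used: {swap_used_kb(info)} kB")
    if frac < low_threshold:
        print("memory is nearly exhausted")
```

Calling `report()` during or just after a scan would show whether available memory collapses around the time the 502s begin.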

This is my health check setup:
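To reason about whether the health check itself could be taking the VM out of rotation, here is a rough model of how consecutive probe results flip a backend between healthy and unhealthy. The threshold values are assumptions (the real ones are in the health check configuration), and `backend_states` is a hypothetical helper, not a GCP API. When the only backend is marked unhealthy, the load balancer has nothing to route to and answers with a 502.

```python
def backend_states(probe_results, healthy_threshold=2, unhealthy_threshold=2,
                   start_healthy=True):
    """Replay probe outcomes (True = probe succeeded) and return the
    backend's health state after each probe.

    The state only flips after enough consecutive probes contradict it:
    unhealthy_threshold failures to go unhealthy, healthy_threshold
    successes to come back. Threshold defaults here are assumed values.
    """
    healthy = start_healthy
    streak = 0  # consecutive probes that contradict the current state
    states = []
    for ok in probe_results:
        if ok == healthy:
            streak = 0
        else:
            streak += 1
            needed = unhealthy_threshold if healthy else healthy_threshold
            if streak >= needed:
                healthy = ok
                streak = 0
        states.append(healthy)
    return states


# Three slow or failed probes in a row during a scan, then recovery.
print(backend_states([True, False, False, False, True, True]))
```

With a low unhealthy threshold and a short probe timeout, a brief slowdown is enough to mark the VM unhealthy; raising either makes the check more tolerant at the cost of slower failure detection.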


The server went down again, and this time I managed to find the information you suggested.

  1. The error is not from the load balancer
  2. The error is from the VM, and the error message is: "Error watching metadata: Get http://169.254.169.254/computeMetadata/v1//?recursive=true&alt=json&wait_for_change=true&timeout_sec=60&last_etag=ag92d16ff423b06: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
  3. VM disk size is 100 GB. Machine type is N2-standard-2
  4. It is a WordPress instance
  5. Everything is within quota
  6. Incidents happen on a few occasions:
    • When I use a third-party site to scan the website for dead links. After the scan completes, the server goes down shortly afterwards, and I have to reboot the instance to make it functional again.
    • At random, in which case it recovers by itself after a while

Thanks everyone for your help. I just managed to figure out how to retrieve the other required info.

I was wrong that the load balancer didn't report any errors.

Below is from Logging

  1. From the load balancer: Client disconnected before any response
  2. From the load balancer: 502 - failed_to_pick_backend
  3. From the unmanaged instance group: Timeout waiting for data, and HTTP response Internal server error
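To see how often each of these appears and when, the load balancer log entries can be tallied by status and detail. This is a sketch that assumes the entries were exported from Logging as a JSON array, with the HTTP status under `httpRequest.status` and the detail string under `jsonPayload.statusDetails`. The file name `lb_logs.json` and both function names are hypothetical.

```python
import json
from collections import Counter


def tally_status_details(entries):
    """Count log entries by (HTTP status, statusDetails) pair.

    Entries missing either field are counted under None / "unknown"
    instead of being dropped, so gaps in the export stay visible.
    """
    counts = Counter()
    for entry in entries:
        status = entry.get("httpRequest", {}).get("status")
        detail = entry.get("jsonPayload", {}).get("statusDetails", "unknown")
        counts[(status, detail)] += 1
    return counts


def load_entries(path):
    """Load a JSON array of log entries from a file (assumed export format)."""
    with open(path) as f:
        return json.load(f)


def print_tally(path="lb_logs.json"):
    """Print the most frequent (status, detail) pairs first."""
    for (status, detail), n in tally_status_details(load_entries(path)).most_common():
        print(f"{n:6d}  {status}  {detail}")
```

If `failed_to_pick_backend` dominates, the load balancer had no healthy backend at that moment, which points at the VM or the health check and not at the load balancer timeout.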

I tried increasing the load balancer timeout, but the VM still shuts down and reboots on its own. Sometimes it takes a few minutes to recover, and sometimes it takes over an hour.

I have provided some screenshots that recorded the most recent incident, from 8:47 to 8:54.

Below is from Monitoring

[Three Monitoring screenshots; images not available]
