Is a catch-all handler pointing to "auto" a bad idea?


My instance gets little to no traffic, but I have min-idle instances set to 1. I notice that whenever a bot requests a random URL that doesn't exist, the request is treated as dynamic because my catch-all handler is "auto". That is fine, except that these 404 errors (404 because no HTTP handlers are associated with these URL patterns, even though the YAML defines a catch-all pattern) result in instance restarts. Why should the instance restart when it runs into 404 errors?

All but a few of my dynamic handlers follow the "/api" pattern, so I can explicitly list all valid patterns and map them to the auto handler. Would these random links then be treated as static but missing, and return a 404 error (which I am fine with)? I want to make sure the instance doesn't keep running just because of some rogue requests.


There are 2 best solutions below

5
On

I just did a local experiment (I don't presently have any quickly deployable play app) and it looks like your quite interesting idea could work.

I replaced the .* pattern previously catching all stragglers and routing them to my default service script (I'm using the python runtime) with specific patterns, then added this handler after all others:

- url: /(.*)$
  static_files: images/\1
  upload: images/.*

My images directory is real, holding static images (but for which I already have another handler with a more specific pattern).

With this in place I made a request to /crap and got, as expected (there is no images/crap file):

INFO 2019-11-08 03:06:02,463 module.py:861] default: "GET /crap HTTP/1.1" 404 -
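
The matching order behind this 404 can be sketched as a toy model in Go. This is only an illustration under assumptions: the handler type, the route function, the /api/ pattern and the logo.png file are hypothetical, and the real matching is done by App Engine from app.yaml, not by application code.

```go
package main

import (
	"fmt"
	"regexp"
)

// handler is a simplified stand-in for one app.yaml handler entry.
type handler struct {
	pattern *regexp.Regexp
	static  bool // true: static_files handler; false: routed to the app (script: auto)
}

// staticFiles is a hypothetical set of uploaded static files.
var staticFiles = map[string]bool{"images/logo.png": true}

// handlers are checked in order; the first matching pattern wins.
var handlers = []handler{
	{regexp.MustCompile(`^/api/.*$`), false}, // explicit dynamic pattern
	{regexp.MustCompile(`^/(.*)$`), true},    // static catch-all, listed last
}

// route returns the status and whether an instance would have to serve the request.
// The 200 for dynamic requests is an assumption; the app decides the real status.
func route(path string) (status int, hitsInstance bool) {
	for _, h := range handlers {
		m := h.pattern.FindStringSubmatch(path)
		if m == nil {
			continue
		}
		if !h.static {
			return 200, true
		}
		if staticFiles["images/"+m[1]] {
			return 200, false
		}
		return 404, false // static pattern matched, but the file does not exist
	}
	return 404, false
}

func main() {
	for _, p := range []string{"/api/users", "/logo.png", "/crap"} {
		status, hit := route(p)
		fmt.Printf("%s -> status=%d instance=%v\n", p, status, hit)
	}
}
```

In this model /crap falls through to the static catch-all and gets a 404 without any instance being involved, which matches what the log line above suggests.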

I added logging calls in my script handler's get() and dispatch() calls to confirm they're not actually getting invoked (the development server request logging casts a bit of doubt).

I also checked on an already deployed GAE app: requesting an image that matches a static handler pattern but doesn't actually exist returns a 404 without causing an instance of the service to be started (no instance was running at the time), i.e. the response comes directly from GAE's static content CDN.

So I think it's well worth a try with the Go runtime; this could save significant instance time for a low-activity app faced with random bot traffic.
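
For the Go runtime, a check similar to the logging I described could look roughly like the sketch below. It is not taken from the asker's app: newMux, statusFor and the /api/ping path are hypothetical names, and a deployed service would pass the mux to http.ListenAndServe instead of driving it with httptest as done here.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// newMux registers the real dynamic routes plus a logging fallback.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	// Fallback: any log line from here means a stray URL still reached the instance.
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		log.Printf("unmatched path reached the instance: %s", r.URL.Path)
		http.NotFound(w, r)
	})
	return mux
}

// statusFor sends an in-memory GET request to h and returns the response status.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux()
	for _, p := range []string{"/api/ping", "/crap"} {
		fmt.Println(p, statusFor(mux, p))
	}
}
```

If the app.yaml change works as hoped, the fallback's log line should stop appearing in the deployed service's logs for bot URLs, because those requests would be answered before they reach the Go code.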

As for the instance restarts, I suspect what you see is just a symptom of your min-idle instances being set to 1. Unlike a dynamic instance, the idle (aka resident) instance is not normally meant to handle traffic; it is just ready to do so if and when needed. Only when a new request comes in and no dynamic instance is running (and able to handle incoming traffic efficiently) is that request immediately routed to the idle instance. At that moment:

  • the idle instance becomes a dynamic one and will continue to serve traffic until it shuts down due to inactivity or dies
  • a fresh idle instance is started to meet the min-idle configuration; it will remain idle until another similar event occurs
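
The promotion described in the two points above can be illustrated with a toy model in Go. This is a simplification of my explanation, not App Engine's actual scheduler; the pool type and its methods are hypothetical.

```go
package main

import "fmt"

// pool is a toy model of one service's instances.
type pool struct {
	minIdle int // configured min-idle instances
	idle    int // idle (resident) instances currently waiting
	dynamic int // instances currently serving traffic
	starts  int // total number of instance starts observed
}

// newPool starts enough idle instances to satisfy minIdle.
func newPool(minIdle int) *pool {
	p := &pool{minIdle: minIdle}
	p.refill()
	return p
}

// refill starts fresh idle instances until the min-idle setting is met.
func (p *pool) refill() {
	for p.idle < p.minIdle {
		p.idle++
		p.starts++
	}
}

// handleRequest models one incoming request.
func (p *pool) handleRequest() {
	if p.dynamic > 0 {
		return // a dynamic instance is already running and takes the request
	}
	if p.idle > 0 {
		p.idle-- // the idle instance is promoted to a dynamic one
		p.dynamic++
	} else {
		p.dynamic++ // nothing idle: a new dynamic instance has to start
		p.starts++
	}
	p.refill() // a fresh idle instance replaces the promoted one
}

// idleTimeout models the dynamic instances shutting down due to inactivity.
func (p *pool) idleTimeout() { p.dynamic = 0 }

func main() {
	p := newPool(1)
	p.handleRequest() // first bot request after a quiet period
	p.idleTimeout()
	p.handleRequest() // second bot request after another quiet period
	fmt.Printf("idle=%d dynamic=%d starts=%d\n", p.idle, p.dynamic, p.starts)
}
```

In this model every isolated request that arrives after a quiet period triggers one new instance start, which would look like a "restart" in the logs even though nothing crashed.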

Note: your idea will help with the instance hours portion used by the dynamic instances, but not with the idle instance portion.

0
On

The documentation states the following:

"When an instance responds to the request /_ah/start with an HTTP status code of 200–299 or 404, it is considered to have started correctly and that it can handle additional requests. Otherwise, App Engine cancels the instance. Instances with manual scale adjustment restart immediately, while instances with basic scale adjustment restart only when necessary to deliver traffic."

You can find more detail about how instances are managed in the App Engine standard environment for Go 1.12 at: https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed
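
Read literally, the quoted rule for /_ah/start amounts to the small check below. This is only my restatement of the quote in Go; startedOK is a hypothetical name and not part of any App Engine API.

```go
package main

import "fmt"

// startedOK restates the quoted rule: a response to /_ah/start with a status
// of 200-299 or 404 counts as a successful start.
func startedOK(status int) bool {
	return (status >= 200 && status <= 299) || status == 404
}

func main() {
	for _, s := range []int{200, 404, 500} {
		fmt.Println(s, startedOK(s))
	}
}
```

So, going by the quote, a 404 returned for the /_ah/start request is by itself not a reason for the instance to be cancelled.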

I also recommend reading the document "How instances are managed", which states the following:

"Secondary routing: If a request matches the part [YOUR_PROJECT_ID].appspot.com of the host name, but includes the name of a service, version, or instance that does not exist, the request is routed to the default service. Secondary routing does not apply to custom domains; requests sent to these domains will show an HTTP status code 404 if the hostname is not valid."

https://cloud.google.com/appengine/docs/standard/go112/how-instances-are-managed