Rails + Nginx - why should I use fail_timeout=0 for multiple nodes?


The example nginx config file at https://github.com/defunkt/unicorn/blob/master/examples/nginx.conf contains this comment:

# The only setting we feel strongly about is the fail_timeout=0
# directive in the "upstream" block.  max_fails=0 also has the same
# effect as fail_timeout=0 for current versions of nginx and may be
# used in its place.

As I understand it, their concern is that with a single server in the upstream block, all users would get a 504 Gateway Timeout error once one of the requests was killed by a timeout or returned something that is considered a bad response (http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream).

So in the upstream block they have:

  upstream app_server {
    # fail_timeout=0 means we always retry an upstream even if it failed
    # to return a good HTTP response (in case the unicorn master nukes a
    # single worker for timing out).

    # for UNIX domain socket setups:
    server unix:/path/to/.unicorn.sock fail_timeout=0;

    # for TCP setups, point these to your backend servers
    # server 192.168.0.7:8080 fail_timeout=0;
    # server 192.168.0.8:8080 fail_timeout=0;
    # server 192.168.0.9:8080 fail_timeout=0;
  }

I am using the least_conn directive in the upstream block. If one of the unicorns is down, it will answer very quickly with, for example, a 500 error. Because it finishes requests so fast, it almost always has the fewest active connections, so 99% of all requests will be sent to that node. In other words, if one node is down, the whole app is down.

I am thinking of trying something like that:

  upstream app_server {
    least_conn;
    server 192.168.0.7:8080 fail_timeout=10s max_fails=5;
    server 192.168.0.8:8080 fail_timeout=10s max_fails=5;
    server 192.168.0.9:8080 fail_timeout=10s max_fails=5;
  }

According to the nginx docs (http://nginx.org/en/docs/http/ngx_http_upstream_module.html#server), this means a server will be marked as unavailable for the next 10 seconds if it returns 5 bad answers within 10 seconds. I do not see any flaws. What do you think? I have barely found any examples where fail_timeout is not 0.
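
One detail worth checking against the proxy_next_upstream docs linked above: connection errors and timeouts always count toward max_fails, but HTTP responses such as 500 or 502 count only if they are listed in proxy_next_upstream. A minimal sketch of the idea, assuming the upstream sits behind a plain proxy_pass location (the listen port, the location, and the chosen status codes are illustrative assumptions, not part of the original config):

```nginx
upstream app_server {
    least_conn;
    # 5 failed attempts within 10s mark the server unavailable for 10s
    server 192.168.0.7:8080 max_fails=5 fail_timeout=10s;
    server 192.168.0.8:8080 max_fails=5 fail_timeout=10s;
    server 192.168.0.9:8080 max_fails=5 fail_timeout=10s;
}

server {
    listen 80;  # assumption: adjust to your setup

    location / {
        proxy_pass http://app_server;

        # "error" and "timeout" are the defaults and always count as failed
        # attempts. http_500/http_502 count as failures (and trigger a retry
        # on the next server) only because they are listed here.
        proxy_next_upstream error timeout http_500 http_502;
    }
}
```

With http_500 listed, a node that answers every request with a fast 500 should reach max_fails and be skipped for fail_timeout instead of attracting most of the least_conn traffic. The trade-off is that an occasional 500 from a healthy node also counts against it.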


There is 1 best solution below


This is old, but thought I'd answer since it became relevant to me recently.

It's pretty easy for bots to send malformed junk to a server and have it respond with an error code. When the bots hit it decently hard, this results in nginx marking all your servers as down.

fail_timeout=0 just has it re-check every time, which doesn't seem to come with much of a disadvantage.
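
For the multi-node setup from the question, that could look roughly like the sketch below. It keeps least_conn and sets fail_timeout=0 on every node, so nginx never takes a node out of rotation. The addresses are the ones from the question; treat the rest as an untested sketch.

```nginx
upstream app_server {
    least_conn;
    # fail_timeout=0: always retry a server, even right after it failed.
    # Per the unicorn example comment, max_fails=0 may be used in its place.
    server 192.168.0.7:8080 fail_timeout=0;
    server 192.168.0.8:8080 fail_timeout=0;
    server 192.168.0.9:8080 fail_timeout=0;
}
```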