HAProxy (v2.0.3) with Keepalived (v2.0.7) on CentOS 7.5 returns ERR_EMPTY_RESPONSE for selected apps


We are running HAProxy on two non-production servers, with keepalived managing failover between them.

We recently upgraded from HAProxy 1.5 to 2.0.3. Our non-production environment never had an HA solution, so we decided to run keepalived to detect HAProxy failure/stoppage and move the VIPs to the backup server.

When we applied these updates, everything worked well until we added new sites to the load balancer. After keepalived is restarted (not reloaded), the new sites behind the LB work for an indeterminate amount of time, then start returning ERR_EMPTY_RESPONSE. Nothing fixes this until keepalived is restarted again, after which they work for another indeterminate stretch before the errors come back.

The sites are still marked up in the stats page.

The painful part is that the calls stop making it into the haproxy.log file, which leads me to think that the problem is not (just) HAProxy.

What we have tried:

  • Splitting up each environment into its own virtual interface in keepalived.conf
  • Updating the binding of the API on the backend server to point at a known-working API (to rule out the API code as the cause)
  • Creating a new binding with a shortened url
  • Decreasing timeouts (client, server)
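Since the requests vanish without a trace in haproxy.log, another thing worth checking during an outage is which node actually holds the VIPs at that moment. A minimal sketch (the VIP value is a placeholder for one of the masked addresses below):

```shell
# List this node's addresses and check whether a given VIP is bound here.
# Under keepalived, each VIP should be bound on exactly one node at a time;
# run this on both load balancers while the errors are occurring.
VIP="192.0.2.10"   # placeholder for one of the real (masked) VIPs
ip -brief address show > /tmp/addrs.txt
if grep -q "$VIP" /tmp/addrs.txt; then
    echo "this node holds $VIP"
else
    echo "this node does not hold $VIP"
fi
```

If both nodes (or an unexpected third machine) report holding the address, the problem is address ownership rather than HAProxy itself.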

keepalived.conf:

`! Configuration File for keepalived

global_defs {
   notification_email {
     [email protected]
   }
   notification_email_from [email protected]
   smtp_server blah.mail.protection.outlook.com.
   smtp_connect_timeout 30
   router_id LVS_NONPROD
}

# Script used to check if HAProxy is running
vrrp_script check_haproxy {
  script "pidof haproxy"
  interval 2
  weight 2
}

vrrp_instance VI_DEV {
  state MASTER
  interface ens160
  virtual_router_id 52
  priority 101
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  virtual_ipaddress {
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
  }

  track_script {
    check_haproxy
  }

}

vrrp_instance VI_TEST {
  state MASTER
  interface ens160
  virtual_router_id 53
  priority 101
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  virtual_ipaddress {
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
  }

  track_script {
    check_haproxy
  }

}

vrrp_instance VI_UAT {
  state MASTER
  interface ens160
  virtual_router_id 54
  priority 101
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  virtual_ipaddress {
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
  }

  track_script {
    check_haproxy
  }

}

vrrp_instance VI_STAGING {
  state MASTER
  interface ens160
  virtual_router_id 55
  priority 101
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  virtual_ipaddress {
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
  }

  track_script {
    check_haproxy
  }

}

vrrp_instance VI_SS {
  state MASTER
  interface ens160
  virtual_router_id 56
  priority 101
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  virtual_ipaddress {
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
    xxx.xxx.xxx.xxx
  }

  track_script {
    check_haproxy
  }

}

vrrp_instance VI_NS {
  state MASTER
  interface ens160
  virtual_router_id 57
  priority 101
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 1111
  }
  virtual_ipaddress {
    xxx.xxx.xxx.xxx
  }

  track_script {
    check_haproxy
  }

}`

haproxy globals:

`global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events.  This is done
#    by adding the '-r' option to the SYSLOGD_OPTIONS in
#    /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
#   file. A line like the following can be added to
#   /etc/sysconfig/syslog
#
#    local2.*                       /var/log/haproxy.log
#
log         127.0.0.1 local2 debug

tune.chksize 32768 #don't get me started...dev requirement because of antiquated requirement not coded away
tune.bufsize 32768 #refer to previous statement
tune.ssl.default-dh-param 2048
max-spread-checks 20000
tune.maxpollevents 10000

chroot      /var/lib/haproxy
pidfile     /var/run/haproxy.pid
maxconn     40000
user        haproxy
group       haproxy
daemon

# turn on stats unix socket
stats socket /var/lib/haproxy/stats`

defaults:

`defaults
mode                    http
log                     global
option                  httplog
option                  log-health-checks
option                  dontlognull
option                  http-server-close
option                  redispatch
retries                 3
timeout http-request    10s
timeout queue           60000
timeout connect         10s
timeout client          60000
timeout server          60000
timeout http-keep-alive 30s
timeout check           30s
maxconn                 30000
errorfile 503 /etc/haproxy/errorfiles/503.http`

There is 1 answer below.


The answer was a bit silly. Internal DNS for the load balancer was incorrect, so I couldn't remote into it until I happened to ssh to the machine during a period when the website was throwing these errors. It turned out the old load balancer still had the IP addresses configured statically in its network scripts (ie /etc/sysconfig/network-scripts/ifcfg-eth0:0-20).

So the new instances would work after I restarted keepalived, because keepalived would claim the IP addresses; then the old instance would take them back, and requests failed because the old instance had no entry for the new sites.
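When two machines fight over the same address like this, the flapping can often be seen from a third box: the ARP-cache entry for the VIP alternates between the two hosts' MAC addresses. A rough check along these lines (the VIP is again a placeholder):

```shell
# Poll the neighbour (ARP) cache entry for the VIP from a client machine.
# If the lladdr (MAC) in the output flips between two values over time,
# two hosts are answering ARP for the same IP address.
VIP="192.0.2.10"   # placeholder for the real VIP
for i in 1 2 3; do
    ip neigh show to "$VIP"
    sleep 1
done
```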

I stopped haproxy on the old instance, removed the /etc/sysconfig/network-scripts/ifcfg-eth0:* files from the old server, restarted keepalived on the new cluster, and everything is working as it should.
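For anyone hitting the same thing: the glob `ifcfg-eth0:*` only matches the alias files, not `ifcfg-eth0` itself, so the primary interface config survives the removal. A dry run against a mock directory shows the effect (on the real box the directory is /etc/sysconfig/network-scripts, and haproxy/keepalived are stopped and restarted around the removal):

```shell
# Dry run of the cleanup against a mock directory.
mkdir -p /tmp/mock-net-scripts
touch /tmp/mock-net-scripts/ifcfg-eth0        # primary interface config: keep
touch /tmp/mock-net-scripts/ifcfg-eth0:0 /tmp/mock-net-scripts/ifcfg-eth0:1
rm -f /tmp/mock-net-scripts/ifcfg-eth0:*      # remove only the VIP alias files
ls /tmp/mock-net-scripts                      # prints: ifcfg-eth0
```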

Feeling a little stupid right now.