Icinga2 client host "cluster-zone" check command not going down (RED) when the connection is lost

I have set up a single master with 2 client endpoints in my Icinga2 monitoring system, using Director in Top-Down mode. Both client nodes are configured with accept configs and accept commands enabled (hopefully this means I'm running Top Down Command Endpoint mode).

The service checks (disk/mem/load) for the 3 hosts return correct results. My problem is with the host check: in the Top Down Command Endpoint example from the documentation, host icinga2-client1 uses "hostalive" as the host check_command, e.g.:

object Host "icinga2-client1.localdomain" {
  check_command = "hostalive" //check is executed on the master
  address = "192.168.56.111"

  vars.client_endpoint = name //follows the convention that host name == endpoint name
}

The issue is that if the Icinga process on client1 is not running, the host status stays GREEN and all of the service statuses (disk/mem/load) stay GREEN as well, because the master is not receiving any service check updates and the hostalive check command can still ping the node.

The Best Practice - Health Check section of the documentation mentions using the "cluster-zone" check command. I expected that with "cluster-zone" the host status would turn RED when the Icinga process on the client node is stopped, but this is not happening.
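
For reference, here is a minimal sketch of the kind of health check that section describes. It is not copied from my setup. It assumes each client zone is named after its host object and that client hosts carry a vars.client_endpoint marker, as in the hostalive example above. The service name "cluster-health" is only illustrative. As far as I understand it, the "cluster-zone" command reports on the connection to the zone named in vars.cluster_zone, which defaults to the host name when unset.

```
apply Service "cluster-health" {
  check_command = "cluster-zone"

  // cluster_zone names the zone whose connection is checked.
  // Assumption: each client zone has the same name as its host object.
  vars.cluster_zone = host.name

  // Assumption: client hosts are marked with vars.client_endpoint.
  assign where host.vars.client_endpoint
}
```

With this pattern the check is executed on the master and is applied once per client host, so each client gets its own connection status.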

Does anyone have any idea why?

My zone/host/endpoint configurations are as follows:

object Zone "icinga-master" {
    endpoints = [ "icinga-master" ]
}
object Host "icinga-master" {
    import "Master-Template"

    display_name = "icinga-master [192.168.100.71]"
    address = "192.168.100.71"
    groups = [ "Servers" ]
}
object Endpoint "icinga-master" {
    host = "192.168.100.71"
    port = "5665"
}


object Zone "rick-tftp" {
    parent = "icinga-master"
    endpoints = [ "rick-tftp" ]
}
object Endpoint "rick-tftp" {
    host = "172.16.181.216"
}
object Host "rick-tftp" {
    import "Host-Template"

    display_name = "rick-tftp [172.16.181.216]"
    address = "172.16.181.216"
    groups = [ "Servers" ]
    vars.cluster_zone = "icinga-master"
}


object Zone "tftp-server" {
    parent = "icinga-master"
    endpoints = [ "tftp-server" ]
}
object Endpoint "tftp-server" {
    host = "192.168.100.221"
}
object Host "tftp-server" {
    import "Host-Template"

    display_name = "tftp-server [192.168.100.221]"
    address = "192.168.100.221"
    groups = [ "Servers" ]
    vars.cluster_zone = "icinga-master"
}


template Host "Host-Template" {
    import "pnp4nagios-host"

    check_command = "cluster-zone"
    max_check_attempts = "5"
    check_interval = 1m
    retry_interval = 30s
    enable_notifications = true
    enable_active_checks = true
    enable_passive_checks = true
    enable_event_handler = true
    enable_perfdata = true
}

Thanks,

Rick
