I have setup a single master with 2 client endpoints in my icintga2 monitoring system using director with Top-Down mode. I have also setup 2 client nodes with both accept configs and accept commands. (hopefully this means I'm running Top Down Command Endpoint mode)
The service checks (disk/mem/load) for the 3 hosts are returning correct results. But my problem is: according to the example from Top Down Command Endpoint example, host icinga2-client1 is using "hostalive" as the host check_command. eg.
object Host "icinga2-client1.localdomain" {
check_command = "hostalive" //check is executed on the master
address = "192.168.56.111"
vars.client_endpoint = name //follows the convention that host name == endpoint name
}
But one issue I have is that if the client1 icinga process is not running, the host status stays GREEN and also all of service status (disk/mem/load) stay all GREEN as well because master is not getting any service check updates and hostalive check command is able to ping the node.
Under Best Practice - Health Check section, it mentioned to use "cluster-zone" check commands. I was expecting while using "cluster-zone", the host status would be RED when the client node icinga process is stopped, but somehow this is not happening.
Does anyone has any idea?
My zone/host/endpoint configurations are as follows:
object Zone "icinga-master" {
endpoints = [ "icinga-master" ]
}
object Host "icinga-master" {
import "Master-Template"
display_name = "icinga-master [192.168.100.71]"
address = "192.168.100.71"
groups = [ "Servers" ]
}
object Endpoint "icinga-master" {
host = "192.168.100.71"
port = "5665"
}
object Zone "rick-tftp" {
parent = "icinga-master"
endpoints = [ "rick-tftp" ]
}
object Endpoint "rick-tftp" {
host = "172.16.181.216"
}
object Host "rick-tftp" {
import "Host-Template"
display_name = "rick-tftp [172.16.181.216]"
address = "172.16.181.216"
groups = [ "Servers" ]
vars.cluster_zone = "icinga-master"
}
object Zone "tftp-server" {
parent = "icinga-master"
endpoints = [ "tftp-server" ]
}
object Endpoint "tftp-server" {
host = "192.168.100.221"
}
object Host "tftp-server" {
import "Host-Template"
display_name = "tftp-server [192.168.100.221]"
address = "192.168.100.221"
groups = [ "Servers" ]
vars.cluster_zone = "icinga-master"
}
template Host "Host-Template" {
import "pnp4nagios-host"
check_command = "cluster-zone"
max_check_attempts = "5"
check_interval = 1m
retry_interval = 30s
enable_notifications = true
enable_active_checks = true
enable_passive_checks = true
enable_event_handler = true
enable_perfdata = true
}
Thanks,
Rick