Using:
Kubernetes: 1.18.8
Helm: 3.3.4
Datadog DaemonSet agent: 7.23.0
Datadog cluster-agent: 1.9.0
Azure Database for PostgreSQL 11.x (i.e. an external Postgres service)
I am deploying Datadog as a DaemonSet and with the cluster-agent enabled to a Kubernetes cluster using the instructions provided here.
helm install my-kubernetes -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog --set targetSystem=linux
I'm configuring Datadog using the values.yaml file as specified.
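For context, the rest of my values.yaml is basically just the standard keys from the chart's template (sketched here, so take the exact key names with a grain of salt):

    targetSystem: linux            # also passed via --set in the install command above
    datadog:
      apiKey: <DATADOG_API_KEY>    # passed via --set in the install command above
    clusterAgent:
      enabled: true                # run the cluster-agent alongside the DaemonSet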
I want to collect some custom metrics, specifically via the Postgres integration (the one formerly configured through postgres.yaml). I have tried to do this as specified in the values.yaml template found here, like this (putting it under the cluster-agent, since these are cluster-wide metrics):
# clusterAgent.confd -- Provide additional cluster check configurations
## Each key will become a file in /conf.d
## ref: https://docs.datadoghq.com/agent/autodiscovery/
confd:
  postgres.yaml: |-
    init_config:
    instances:
      - host: my-postgres-host.com
        port: 5432
        username: my-user
        password: some-password
        dbname: some-database
        ssl: True
        tags:
          - some_tag
        custom_queries:
          - metric_prefix: some.prefix
            query: SELECT COUNT(*) FROM bla WHERE timestamp > NOW() - INTERVAL '1 hour';
            columns:
              - name: countLastHour
                type: count
As per the documentation, I can confirm that using the |- block indicator this indeed creates a file at the path /etc/datadog-agent/conf.d/postgres.yaml on the node, where I would expect it to be. The file correctly contains everything in the block, i.e. starting with init_config:...
Now, when the cluster-agent starts up I see this in the logs (DEBUG):
'/conf.d/postgres.yaml' -> '/etc/datadog-agent/conf.d/postgres.yaml'
/conf.d/..2020_10_22_10_22_27.239825358 -> /etc/datadog-agent/conf.d/..2020_10_22_10_22_27.239825358
'/conf.d/..2020_10_22_10_22_27.239825358/postgres.yaml' -> '/etc/datadog-agent/conf.d/..2020_10_22_10_22_27.239825358/postgres.yaml'
2020-10-22 10:22:29 UTC | CLUSTER | DEBUG | (pkg/autodiscovery/providers/file.go:196 in collectEntry) | Found valid configuration in file: /etc/datadog-agent/conf.d/postgres.yaml
2020-10-22 10:22:29 UTC | CLUSTER | DEBUG | (pkg/collector/scheduler.go:154 in getChecks) | Unable to load a check from instance of config 'postgres': Core Check Loader: Check postgres not found in Catalog
2020-10-22 10:22:29 UTC | CLUSTER | ERROR | (pkg/collector/scheduler.go:201 in GetChecksFromConfigs) | Unable to load the check: unable to load any check from config 'postgres'
The documentation here states that in agent v7.x the Postgres yaml contents should actually live in /etc/datadog-agent/conf.d/postgres.d/conf.yaml and not in /etc/datadog-agent/conf.d/postgres.yaml. It is not possible to create a subfolder or use forward slashes in the confd key (internally, the file is created via a ConfigMap).
I'm not even sure whether the problem is the yaml file path or a missing core integration. So my broad question is: how do I enable the Datadog Postgres integration correctly in my setup?
It looks like the question has been updated to say that the Postgres db you are trying to monitor is not actually running in the cluster. And you are not able to put an agent directly on the Postgres server, since it's a managed service in Azure and you don't have access to the underlying host.
In those situations it is common to have some arbitrary Datadog agent on another host run the Postgres integration anyway, but instead of host: localhost in the yaml config, you put the hostname you would use to reach the db externally. In your example that is host: my-postgres-host.com.
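If you went that route, the config on such a host-installed v7 agent would sit in the integration's own directory (path as per the docs you linked; host and credentials copied straight from your example, so adjust as needed):

    # /etc/datadog-agent/conf.d/postgres.d/conf.yaml on a host-installed v7 agent
    init_config:
    instances:
      - host: my-postgres-host.com   # externally reachable hostname instead of localhost
        port: 5432
        username: my-user
        password: some-password
        dbname: some-database
        ssl: True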
This approach gives you all the same benefits as the normal integration (except that you obviously won't get cpu/disk/resource metrics for the database host).

That is all fine and makes sense, but what if the only agents you have installed are the ones in the Kubernetes DaemonSet you created? You don't have any agents on plain VMs to run this check. And we definitely don't recommend configuring the DaemonSet to run this check directly: since every agent in the DaemonSet is a copy, each of them would run the same check against the same db, and you would be collecting duplicate metrics from that one Postgres db on every single node in your cluster.
Luckily, I notice that you are running the Datadog Cluster Agent. This is a separate Datadog component that is deployed once per cluster as a single service, instead of as a daemonset running once per node. The cluster agent can be configured to run 'cluster level' checks, which are perfect for things like databases, message queues, or http checks.
The basic idea is that (in addition to its other jobs) the cluster agent will also schedule checks. The DCA (Datadog Cluster Agent) will choose one agent from the DaemonSet to run the check, and if that node agent pod dies, the DCA will pick a new one to run the cluster check.
Here are the docs on how to set up the DCA to run cluster checks: https://docs.datadoghq.com/agent/cluster_agent/clusterchecks/#how-it-works
To configure it you enable a couple of flags and give the DCA the yaml file you created, either via a ConfigMap or by mounting the file directly. The DCA will pass that config along to whichever node agent it chooses to run the check.
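A minimal sketch of how that could look in your Helm values, reusing the confd block you already have but marking it as a cluster check (key names based on the datadog/datadog chart you are using, so double-check them against the chart's values.yaml):

    datadog:
      apiKey: <DATADOG_API_KEY>
      clusterChecks:
        enabled: true              # enable the cluster check feature
    clusterAgent:
      enabled: true
      confd:
        postgres.yaml: |-
          cluster_check: true      # tells the DCA to dispatch this check to one node agent
          init_config:
          instances:
            - host: my-postgres-host.com
              port: 5432
              username: my-user
              password: some-password
              dbname: some-database
              ssl: True

With something like that in place, the check should run on exactly one node agent chosen by the DCA instead of once per node.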