How do I include integration-metrics when deploying Datadog DaemonSet + cluster-agent using helm and values.yaml?


Using:

 Kubernetes: 1.18.8
 Helm: 3.3.4
 Datadog DaemonSet agent: 7.23.0
 Datadog cluster-agent: 1.9.0
 Azure Database for PostgreSQL 11.x (i.e. external postgres-service)

I am deploying Datadog as a DaemonSet and with the cluster-agent enabled to a Kubernetes cluster using the instructions provided here.

helm install my-kubernetes -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog --set targetSystem=linux

I'm configuring Datadog using the values.yaml file as specified.

I want to do some custom metrics, specifically using the integration formerly known as postgres.yaml. I have tried to do this as specified in the values.yaml template found here, like this (putting it in the cluster-agent, since these are cluster-wide metrics):

# clusterAgent.confd -- Provide additional cluster check configurations
## Each key will become a file in /conf.d
## ref: https://docs.datadoghq.com/agent/autodiscovery/
confd:
  postgres.yaml: |-
    init_config:

    instances:
      - host: my-postgres-host.com
        port: 5432
        username: my-user
        password: some-password
        dbname: some-database
        ssl: True
        tags:
        - some_tag
        custom_queries:
        - metric_prefix: some.prefix
          query: SELECT COUNT(*) FROM bla WHERE timestamp > NOW() - INTERVAL '1 hour';
          columns:
          - name: countLastHour
            type: count

As per the documentation, I can confirm that using the |- block scalar this indeed creates a file at /etc/datadog-agent/conf.d/postgres.yaml on the node, where I would expect it. The file correctly contains everything in the block, i.e. starting with init_config:...

Now, when the cluster agent starts up I see this in the logs (DEBUG):

'/conf.d/postgres.yaml' -> '/etc/datadog-agent/conf.d/postgres.yaml'
/conf.d/..2020_10_22_10_22_27.239825358 -> /etc/datadog-agent/conf.d/..2020_10_22_10_22_27.239825358
'/conf.d/..2020_10_22_10_22_27.239825358/postgres.yaml' -> '/etc/datadog-agent/conf.d/..2020_10_22_10_22_27.239825358/postgres.yaml'

2020-10-22 10:22:29 UTC | CLUSTER | DEBUG | (pkg/autodiscovery/providers/file.go:196 in collectEntry) | Found valid configuration in file: /etc/datadog-agent/conf.d/postgres.yaml

2020-10-22 10:22:29 UTC | CLUSTER | DEBUG | (pkg/collector/scheduler.go:154 in getChecks) | Unable to load a check from instance of config 'postgres': Core Check Loader: Check postgres not found in Catalog

2020-10-22 10:22:29 UTC | CLUSTER | ERROR | (pkg/collector/scheduler.go:201 in GetChecksFromConfigs) | Unable to load the check: unable to load any check from config 'postgres'

The documentation here states that in agent v7.x the postgres yaml contents should actually live in /etc/datadog-agent/conf.d/postgres.d/conf.yaml and not in /etc/datadog-agent/conf.d/postgres.yaml. It is not possible to create a subfolder or use forward slashes in the config key (internally, the file is created from a ConfigMap).

I'm not even sure whether the problem is the yaml file path or a missing core integration. So my broader question is: how do I enable the Datadog postgres integration correctly in my setup?

There are 2 answers below.

Answer 1:

It looks like the question has been updated to say that this postgres db you are trying to monitor is not actually running in the cluster. And since it's a managed service in Azure, you are not able to put an agent directly on the postgres server, because you don't have access to the underlying host.

In those situations it is common to have any one Datadog agent on some other host run the postgres integration anyway, but instead of having host: localhost in the yaml config, you put the hostname you would use to access the db externally. In your example that was host: my-postgres-host.com. This gives you all the same benefits of the normal integration (except, obviously, you won't get cpu/disk/resource metrics for the database host).
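
For reference, a minimal sketch of what that config could look like in /etc/datadog-agent/conf.d/postgres.d/conf.yaml on a plain VM-hosted agent (the hostname and credentials are just the placeholders from your question):

init_config:

instances:
  - host: my-postgres-host.com   # the external hostname, instead of localhost
    port: 5432
    username: my-user
    password: some-password
    dbname: some-database
    ssl: True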

This is all fine and makes sense, but what if all of the agents you have installed are the agents in the kubernetes daemonset you created? You don't have any hosts directly on VMs to run this check. But we definitely don't recommend configuring the daemonset to run this check directly. If you did, you would be collecting duplicate metrics from that one postgres db on every single node in your cluster: since every agent is a copy, they'd each be running the same check against the same db you define.


Luckily I notice that you are running the Datadog Cluster Agent. This is a separate Datadog service that is deployed once per cluster, instead of as a daemonset running once per node. The cluster agent can be configured to run 'cluster level' checks, which is perfect for things like databases, message queues, or http checks.

The basic idea is that (in addition to its other jobs) the cluster agent will also schedule checks. The DCA (Datadog Cluster Agent) will choose one agent from the daemonset to run the check, and if that node agent pod dies, the DCA will find a new one to run the cluster check.

[Diagram: the cluster agent scheduling a node agent to query postgres]

Here are the docs on how to set up the DCA to run cluster checks: https://docs.datadoghq.com/agent/cluster_agent/clusterchecks/#how-it-works

To configure it you enable a few flags and give the DCA the yaml file you created, either via a ConfigMap or by mounting the file directly. The DCA will pass that config along to whichever node agent it chooses to run the check.
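
With the Helm chart you're already using, a rough sketch of what that could look like in values.yaml (assuming a chart version that supports datadog.clusterChecks.enabled; the cluster_check: true line is what tells the DCA to dispatch this config as a cluster check):

datadog:
  clusterChecks:
    enabled: true          # let the cluster agent dispatch cluster checks

clusterAgent:
  enabled: true
  confd:
    postgres.yaml: |-
      cluster_check: true  # run once per cluster, not once per node
      init_config:
      instances:
        - host: my-postgres-host.com
          port: 5432
          username: my-user
          password: some-password
          dbname: some-database
          ssl: True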

Answer 2:

This answer doesn't solve the problem, because postgres is not running in the cluster, it's running in Azure. I'll leave it up since it might be interesting, but I posted another answer for the actual environment setup.


For containerized setups, it's not usually recommended to set up a configmap or try giving the agent a yaml file. Instead the recommended configuration is to put annotations on the postgres pod: https://docs.datadoghq.com/integrations/postgres/?tab=containerized#containerized.

This concept of placing the config on the application pod, not with the datadog agent, is called autodiscovery. This blog post does a good job explaining the benefits of this solution: https://www.datadoghq.com/blog/monitoring-kubernetes-with-datadog/#autodiscovery

Here is a picture diagram showing how the agent goes out to the pods on the same node and would pull the configuration from them:

[Diagram: kubernetes and daemonset agent]

To configure this, you'd take each of the sections of the yaml config, convert them to json, and set them as annotations on the postgres manifest. An example of how to set up pod annotations is provided for redis, apache, and http here: https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes#examples

For your scenario I would do something like:

apiVersion: v1
kind: Pod
metadata:
  name: mypostgres
  annotations:
    ad.datadoghq.com/mypostgres.check_names: '["postgres"]'
    ad.datadoghq.com/mypostgres.init_configs: '[{}]'
    ad.datadoghq.com/mypostgres.instances: |
      [
        {
          "host":"%%host%%", 
          "port":5432,
          "username":"my-user",
          "password":"some-password"
        }
      ]
  labels:
    name: mypostgres
spec:
  containers:
    - name: mypostgres
      image: postgres:latest
      ports:
        - containerPort: 5432

Notice how the folder name in postgres.d/conf.yaml maps to the check_names annotation, the init_config section maps to the init_configs annotation, and so on.


For the section on custom metrics, since I personally am more familiar with the yaml config and it's easier to just fill out, I'll usually go to a yaml-to-json converter and copy the json from there.

[Screenshot: yaml to json converter]

metadata:
  name: mypostgres
  annotations:
    ad.datadoghq.com/mypostgres.instances: |
      [
        {
          "host": "%%host%%",
          "port": 5432,
          "username": "my-user",
          "password": "some-password",
          "dbname": "some-database",
          "ssl": true,
          "tags": [
            "some_tag"
          ],
          "custom_queries": [
            {
              "metric_prefix": "some.prefix",
              "query": "SELECT COUNT(*) FROM bla WHERE timestamp > NOW() - INTERVAL '1 hour';",
              "columns": [
                {
                  "name": "countLastHour",
                  "type": "count"
                }
              ]
            }
          ]
        }
      ]

A key thing to notice for all those configs is that I never set the hostname. That is automatically discovered by the agent as it scans through containers.

However, you may have set my-postgres-host.com because this postgres instance is not actually running in your kubernetes cluster, and is instead living on its own, not in a container. If that is the case, I would recommend trying to just put the agent on the postgres node directly; all the yaml config you've written would work just fine if that db and the agent are both directly on the VM.