We have a Kubernetes cluster and are using Prometheus/Grafana to collect and view metrics. I am currently looking into container metrics, which, as far as I can tell, are managed and provided by cAdvisor (https://github.com/google/cadvisor). I assume cAdvisor is installed and running in the cluster, although I haven't been able to verify this (any suggestions on how to determine that would be much appreciated).
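For what it's worth, my understanding is that cAdvisor is compiled into the kubelet and serves its metrics on the kubelet's /metrics/cadvisor endpoint, so I'm guessing something like the following (with a real node name from kubectl get nodes, and assuming the API server is allowed to proxy to the kubelet) would show whether those metrics are actually being exposed, but I'd welcome confirmation that this is the right check:

kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" | head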
The cAdvisor Prometheus documentation page (https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md) lists the metrics that are available. There are some differences between the set of metrics that Prometheus "knows" about in our cluster and the set listed at that link, but I'm guessing this could be due to differences in cAdvisor versions.
What matters more to me at this point is that a number of metrics are not being scraped by Prometheus at all. This extends beyond cAdvisor, but I'll focus on the cAdvisor-related metrics here.
The following metrics are "known" by our instance of Prometheus and are listed at the link above, yet none of them are being scraped by Prometheus: querying any of them returns an empty result. As far as I can tell, the remaining "cAdvisor metrics" do return results.
container_cpu_cfs_throttled_seconds_total
container_cpu_load_average_10s
container_cpu_system_seconds_total
container_cpu_user_seconds_total
container_file_descriptors
container_fs_io_current
container_fs_io_time_seconds_total
container_fs_io_time_weighted_seconds_total
container_fs_reads_merged_total
container_fs_sector_reads_total
container_fs_sector_writes_total
container_fs_writes_merged_total
container_memory_mapped_file
container_memory_swap
container_spec_cpu_period
container_spec_cpu_quota
container_spec_cpu_shares
container_spec_memory_limit_bytes
container_spec_memory_reservation_limit_bytes
container_spec_memory_swap_limit_bytes
container_tasks_state
container_threads_max
Why are there so many metrics that are not being scraped? In many cases there are closely related metrics where one is scraped and the other never is. For example, container_threads is scraped, while container_threads_max is not.
I'm assuming that metrics starting with container_ are cAdvisor metrics, but maybe I'm mistaken. Could it be that the metrics I do see being scraped are defined somewhere else, and that the ones not being scraped are the "real" cAdvisor metrics, which aren't showing up because cAdvisor is either not installed or not enabled to expose metrics?
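One thing I considered (a sketch of how I'd check, not something I've fully verified) is asking Prometheus which scrape job the container_* series that do exist come from, e.g.:

count by (job, instance) (container_threads)

If that points at the kubelet job, I'd take it as a sign that the cAdvisor endpoint is being scraped and the missing metrics are being filtered out somewhere, rather than never exposed.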
UPDATE:
A comment from @DazWilkin pointed me to information that might explain some of the unscraped metrics.
I took a look at my Prometheus "kubelet" ServiceMonitor and found a set of "drop" rules that seems to match the unscraped metrics listed above. Below is the relevant section.
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  honorLabels: true
  metricRelabelings:
  - action: drop
    regex: container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)
    sourceLabels:
    - __name__
  - action: drop
    regex: container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)
    sourceLabels:
    - __name__
  - action: drop
    regex: container_memory_(mapped_file|swap)
    sourceLabels:
    - __name__
  - action: drop
    regex: container_(file_descriptors|tasks_state|threads_max)
    sourceLabels:
    - __name__
  - action: drop
    regex: container_spec.*
    sourceLabels:
    - __name__
  - action: drop
    regex: .+;
    sourceLabels:
    - id
    - pod
  path: /metrics/cadvisor
  port: https-metrics
  relabelings:
  - action: replace
    sourceLabels:
    - __metrics_path__
    targetLabel: metrics_path
  scheme: https
  tlsConfig:
    caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecureSkipVerify: true
I'm a little concerned about the last action block, which looks like it will drop everything. Maybe I'm interpreting the regex incorrectly. I'll also admit I don't know the significance of the sourceLabels sections.
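My tentative reading of the Prometheus relabeling docs (please correct me if this is wrong) is that when multiple sourceLabels are listed, their values are joined with the default ";" separator and the regex is then anchored against that joined string. For the last block the string being tested would be "<id>;<pod>", so ".+;" only matches when id is non-empty and pod is empty. With made-up label values:

id="/system.slice/docker.service", pod=""       ->  "/system.slice/docker.service;"       matches, series dropped
id="/kubepods/burstable/...",      pod="my-pod" ->  "/kubepods/burstable/...;my-pod"      no match, series kept

If that's right, the rule drops cgroup-level series that aren't attached to a pod rather than everything, but I'd appreciate confirmation.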