I moved from gvisor-containerd-shim (Shim V1) to containerd-shim-runsc-v1 (Shim V2). With gvisor-containerd-shim, the metrics server and the Horizontal Pod Autoscaler worked fine.
With containerd-shim-runsc-v1, I still get CPU and memory metrics for nodes and runc pods, but only memory metrics for runsc (gVisor) pods.
For example, I deployed a PHP server in a gVisor pod with containerd-shim-runsc-v1 and got the following metrics:
kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          68s
kubectl top nodes
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
snf-877559   549m         13%    2327Mi          39%
kubectl top pods
NAME                                 CPU(cores)   MEMORY(bytes)
php-apache-gvisor-6f7bb6cf84-28qdk   0m           52Mi
After sending some load to the php-apache-gvisor pod, I can see CPU and memory usage increase for the node and for the runc pod (load-generator). The memory of php-apache-gvisor also rises from 52Mi to 72Mi, but its CPU usage stays at 0m (0% in the HPA). Why does the CPU usage remain at zero?
I also tried different container images, but I keep getting the same results.
With load I get the following metrics:
kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          68s
kubectl top nodes
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
snf-877559   946m         23%    2413Mi          41%
kubectl top pods
NAME                                 CPU(cores)   MEMORY(bytes)
load-generator-7d549cd44-xmbqw       3m           1Mi
php-apache-gvisor-6f7bb6cf84-28qdk   0m           72Mi
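Since kubectl top rounds CPU to whole millicores, it can help to look at the raw metrics API response (for example from kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods) to confirm the value is really zero and not just very small. The following is a minimal sketch of that check; the helper names and the sample payload are illustrative assumptions modelled on the output above, not captured data.

```python
import json

# Decimal suffixes used by Kubernetes CPU quantities (nano, micro, milli).
_CPU_SUFFIX = {"n": 1e-9, "u": 1e-6, "m": 1e-3}


def cpu_cores(quantity):
    """Convert a CPU quantity such as '3m' or '2811918n' to cores."""
    if quantity and quantity[-1] in _CPU_SUFFIX:
        return float(quantity[:-1]) * _CPU_SUFFIX[quantity[-1]]
    return float(quantity)


def pods_with_zero_cpu(pod_metrics_list):
    """Return names of pods whose containers all report exactly zero CPU."""
    zero = []
    for item in pod_metrics_list["items"]:
        total = sum(cpu_cores(c["usage"]["cpu"]) for c in item["containers"])
        if total == 0:
            zero.append(item["metadata"]["name"])
    return zero


# Hypothetical PodMetricsList payload; the values are made up for illustration.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "load-generator-7d549cd44-xmbqw"},
     "containers": [{"name": "load-generator",
                     "usage": {"cpu": "2811918n", "memory": "1024Ki"}}]},
    {"metadata": {"name": "php-apache-gvisor-6f7bb6cf84-28qdk"},
     "containers": [{"name": "php-apache",
                     "usage": {"cpu": "0", "memory": "73728Ki"}}]}
  ]
}
""")

print(pods_with_zero_cpu(sample))
```

If the gVisor pod shows up in this list while runc pods do not, the zero originates below the metrics server rather than in kubectl's rounding.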
Further information:
kubeadm, Kubernetes 1.15.3, containerd 1.3.3, runsc nightly/2019-09-18, flannel
kubectl logs metrics-server-74657b4dc4-8nlzn -n kube-system
I0728 09:33:42.449921 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0728 09:33:44.153682 1 secure_serving.go:116] Serving securely on [::]:4443
E0728 09:35:24.579804 1 reststorage.go:160] unable to fetch pod metrics for pod default/php-apache-gvisor-6f7bb6cf84-28qdk: no metrics known for pod
E0728 09:35:39.940417 1 reststorage.go:160] unable to fetch pod metrics for pod default/php-apache-gvisor-6f7bb6cf84-28qdk: no metrics known for pod
/etc/containerd/config.toml (containerd-shim-runsc-v1)
subreaper = true
oom_score = -999
disabled_plugins = ["restart"]
[debug]
  level = "debug"
[metrics]
  address = "127.0.0.1:1338"
[plugins.linux]
  runtime = "runc"
  shim_debug = true
[plugins.cri.containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
/etc/containerd/config.toml (gvisor-containerd-shim)
subreaper = true
oom_score = -999
disabled_plugins = ["restart"]
[debug]
  level = "debug"
[metrics]
  address = "127.0.0.1:1338"
[plugins.linux]
  runtime = "runc"
  shim_debug = true
  shim = "/usr/local/bin/gvisor-containerd-shim"
[plugins.cri.containerd.runtimes.runsc]
  runtime_type = "io.containerd.runtime.v1.linux"
  runtime_engine = "/usr/local/bin/runsc"
  runtime_root = "/run/containerd/runsc"
The metrics server YAML is based on https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml with the following args:
....
containers:
- name: metrics-server
  image: k8s.gcr.io/metrics-server-amd64:v0.3.6
  imagePullPolicy: IfNotPresent
  args:
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls
  - --cert-dir=/tmp
  - --secure-port=4443
....
The current deployment has the following resources section:
resources:
  limits:
    cpu: 500m
  requests:
    cpu: 200m
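To show why the HPA stays at 0%/50% with one replica, here is a rough sketch of the scaling arithmetic as described in the Kubernetes HPA documentation: utilization is usage divided by the CPU request, and the desired replica count is ceil(currentReplicas * utilization / target), clamped to the min/max bounds. The function names are hypothetical and the 0.1 tolerance is assumed to be the default; this is a simplified model, not the controller's actual code.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Utilization across pods: total usage as a percentage of total requests."""
    total_request = sum(request_millicores)
    if total_request == 0:
        raise ValueError("pods have no CPU request; utilization is undefined")
    return 100 * sum(usage_millicores) / total_request


def desired_replicas(current, utilization, target, min_replicas, max_replicas,
                     tolerance=0.1):
    """Simplified HPA rule: scale by utilization/target, within min/max bounds."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target, no scaling
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# Values from this post: 0m reported usage, 200m request, 50% target, 1..10 pods.
util = cpu_utilization_percent([0], [200])
print(util, desired_replicas(1, util, 50, 1, 10))
```

With a reported usage of 0m the ratio is zero, so the result is always clamped to the minimum of 1 replica regardless of the real load. For comparison, a hypothetical usage of 150m against the same 200m request would give 75% and a desired count of 2.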
gVisor currently reports only memory and PID stats on a per-pod basis, which is why CPU usage shows up as zero. See: https://github.com/google/gvisor/blob/add40fd/runsc/boot/events.go#L62-L68
We are planning to export more stats; that work is tracked at https://gvisor.dev/issue/172