I'm trying to debug some performance issues on a RHEL8.3 server. The server is actually a Kubernetes worker nodes and hosts several Redis containers (PODs). These containers are doing a lot of network I/O (iptraf-ng reports about 500 kPPS and 1.5Gbps). The server is an high-end Dell server with 104 cpus and 10Gbps NICs.
The issue I'm trying to debug is related to soft IRQs. In short: despite my attempts to set IRQ affinity of the NIC on a specific range of dedicated CPUs, the utility "mpstat" is still reporting a lot of CPU spent in "soft%" on all the CPUs where the "redis-server" process is running (even if redis-server has been moved using taskset to a non-overlapping range of dedicated CPU cores).
For more details consider the attached screenshot redis_server_and_mpstat:
- the "redis-server" with PID 3592506 can run only on CPU 80 (taskset -pc 3592506 returns 80 only)
- as can be seen from the "mpstat" output, it's running close to 100%, with 25-28% of the time spent in "%soft" time
In the attempt to address this problem, I've been using the Mellanox IRQ affinity script (https://github.com/Mellanox/mlnx-tools/blob/master/ofed_scripts/set_irq_affinity.sh) to "move" all IRQs related to the NICs on a separate set of CPUs (namely CPUs 1,3,5,7,9,11,13,15,17 that belong to NUMA1) for both NICs (eno1np0, eno2np1) that compose the "bond0" bonded interface used by the server, see the screenshot set_irq_affinity. Moreover the "irqbalance" daemon has been stopped and disabled.
The result is that mpstat is now reporting a consistent CPU usage from CPUs 1,3,5,7,9,11,13,15,17 in "%soft" time, but at the same time the redis-server is still spending 25-28% of its time spent in "%soft" column (i.e. nothing has changed for redis-server). This pattern is repeated for all instances of "redis-server" running on that server (there's more than 1), while other CPUs having no redis-server scheduled, are 100% idle.
Finally in a different environment based on RHEL7.9 (kernel 3.10.0) and a non-containerized deployment of Redis, I see that, when running the "set_irq_affinity.sh" script to move IRQs away from Redis CPUs, Redis %soft column goes down to zero.
Can you help me to understand why running redis into a Kubernetes container (with kernel 4.18.0), the redis-server process will continue to spend a consistent amount of time in %soft handling, despite NIC IRQs having affinity on different CPUs ? Is it possible that the time the redis-server process spends in "soft IRQ" handling is due to the veth virtual ethernet device created by the containerization technology (in this case the Kubernetes CNI is Flannel, using all default settings) ?
Thanks