We have a Go program (Go 1.19) running in a Kubernetes pod. The RSS for the process shown on the node (via ps / top) is much lower than the value reported in the cgroup's memory.stat.
Here is the cgroup's memory.stat; rss is 7860129792 bytes (~7.3 GiB) and rss_huge is 5066719232 bytes (~4.7 GiB).
$ cat memory.stat
cache 547885056
rss 7860129792 <-- the rss in cgroup is much higher than the value in "ps -aux"
rss_huge 5066719232 <-- notice that there is also high rss_huge
shmem 0
mapped_file 0
dirty 20480
writeback 0
swap 0
pgpgin 450943252
pgpgout 450125090
pgfault 1097413913
pgmajfault 0
inactive_anon 0
active_anon 7859318784
inactive_file 546922496
active_file 962560
unevictable 0
hierarchical_memory_limit 12884901888
hierarchical_memsw_limit 12884901888
total_cache 547885056
total_rss 7860129792
total_rss_huge 5066719232
total_shmem 0
total_mapped_file 0
total_dirty 20480
total_writeback 0
total_swap 0
total_pgpgin 450943252
total_pgpgout 450125090
total_pgfault 1097413913
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 7859318784
total_inactive_file 546922496
total_active_file 962560
total_unevictable 0
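For reference, these counters can also be read programmatically; here is a minimal Go sketch, assuming the cgroup v1 memory controller is mounted at the usual /sys/fs/cgroup/memory path inside the container (memory.stat values are in bytes):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Assumed v1 mount point inside the container; adjust for your setup.
	f, err := os.Open("/sys/fs/cgroup/memory/memory.stat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	stats := map[string]uint64{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}

	// memory.stat reports bytes, unlike ps/smaps which report KiB.
	const gib = 1 << 30
	fmt.Printf("rss:      %.2f GiB\n", float64(stats["rss"])/gib)
	fmt.Printf("rss_huge: %.2f GiB\n", float64(stats["rss_huge"])/gib)
}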
docker stats shows almost the same value as the cgroup:
$docker stats c39bc01d525e
CONTAINER ID CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
c39bc01d525e 49.27% 7.88GiB / 12GiB 65.67% 0B / 0B 0B / 24.6kB 106
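As far as I know, docker stats derives the MEM USAGE column from the cgroup's counters; the exact post-processing (whether cache or inactive_file is subtracted) varies across Docker versions, so the sketch below only reads the raw upstream counter, not a reproduction of the exact 7.88GiB figure:

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Raw cgroup v1 usage counter; docker stats starts from this value
	// before any version-specific adjustments.
	data, err := os.ReadFile("/sys/fs/cgroup/memory/memory.usage_in_bytes")
	if err != nil {
		panic(err)
	}
	usage, err := strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
	if err != nil {
		panic(err)
	}
	fmt.Printf("usage_in_bytes: %d (%.2f GiB)\n", usage, float64(usage)/(1<<30))
}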
However, there are 3 processes managed by this cgroup. The main process is pid 496687, and its RSS is only 5205340 KiB (~5.0 GiB), still much lower than the figure from the cgroup and docker stats.
$ cat cgroup.procs
496644
496687
496688
$ ps -aux | grep -E "496644|496687|496688"
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 496644 0.0 0.0 1604 1464 ? Ss Oct28 0:00 sh ./bin/start.sh
root 496687 26.5 0.4 6466348 5205340 ? Sl Oct28 7271:55 /go/release/bin/golang-app
root 496688 0.0 0.0 1588 608 ? S Oct28 0:31 tail -f /dev/null
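The per-process numbers can be cross-checked in one pass; this sketch sums VmRSS (the same KiB figure that ps reports) from /proc/&lt;pid&gt;/status for every pid in cgroup.procs (the cgroup path is an assumption; adjust for your mount):

package main

import (
	"fmt"
	"os"
	"strings"
)

// vmRSSKiB returns the VmRSS field from /proc/<pid>/status; like the RSS
// column of ps, it is in KiB.
func vmRSSKiB(pid string) uint64 {
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		return 0
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			var kb uint64
			fmt.Sscanf(line, "VmRSS: %d kB", &kb)
			return kb
		}
	}
	return 0
}

func main() {
	procs, err := os.ReadFile("/sys/fs/cgroup/memory/cgroup.procs")
	if err != nil {
		panic(err)
	}
	var total uint64
	for _, pid := range strings.Fields(string(procs)) {
		kb := vmRSSKiB(pid)
		fmt.Printf("pid %s: VmRSS %d kB\n", pid, kb)
		total += kb
	}
	fmt.Printf("sum: %.2f GiB (vs cgroup rss, which is in bytes)\n",
		float64(total)/(1<<20))
}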
I also checked smaps for more detail. The sums of the Rss / AnonHugePages / Size fields (smaps values are in kB) are shown below; the Rss sum is close to the value shown by ps and still much lower than the one in cgroup memory.stat.
sum for Rss:
$ awk '/^Rss:/ {sum += $2} END {print sum}' /proc/496687/smaps
4645704
sum for AnonHugePages:
$ awk '/^AnonHugePages:/ {sum += $2} END {print sum}' /proc/496687/smaps
524288
sum for Size:
$ awk '/^Size:/ {sum += $2} END {print sum}' /proc/496687/smaps
6466352
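The same sums in Go, with the units made explicit (all smaps values are in kB, and the AnonHugePages total is a subset of the Rss total; on 4.14+ kernels /proc/&lt;pid&gt;/smaps_rollup also provides these totals pre-summed):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Pid from this report; every value in smaps is in kB.
	f, err := os.Open("/proc/496687/smaps")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sums := map[string]uint64{}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		for _, key := range []string{"Size:", "Rss:", "AnonHugePages:"} {
			if strings.HasPrefix(line, key) {
				var kb uint64
				fmt.Sscanf(line[len(key):], "%d kB", &kb)
				sums[key] += kb
			}
		}
	}
	fmt.Printf("Size %d kB, Rss %d kB (of which AnonHugePages %d kB)\n",
		sums["Size:"], sums["Rss:"], sums["AnonHugePages:"])
}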
Here are the THP settings:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ cat /sys/kernel/mm/transparent_hugepage/defrag
always defer defer+madvise [madvise] never
$ cat /sys/kernel/mm/transparent_hugepage/shmem_enabled
always within_size advise [never] deny force
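To see how THP behaves under these settings, here is a small experiment (a sketch, not part of the original report; it assumes golang.org/x/sys/unix is available): map an anonymous region, advise MADV_HUGEPAGE, touch every page, and check how much of the region the kernel backed with huge pages via the AnonHugePages field of /proc/self/smaps_rollup (present on 4.14+ kernels):

package main

import (
	"fmt"
	"os"
	"strings"

	"golang.org/x/sys/unix"
)

// anonHugePagesKB reads the process-wide AnonHugePages total (in kB) from
// the pre-summed smaps_rollup file (available since kernel 4.14).
func anonHugePagesKB() uint64 {
	data, err := os.ReadFile("/proc/self/smaps_rollup")
	if err != nil {
		return 0
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "AnonHugePages:") {
			var kb uint64
			fmt.Sscanf(line, "AnonHugePages: %d kB", &kb)
			return kb
		}
	}
	return 0
}

func main() {
	const size = 64 << 20 // 64 MiB anonymous mapping
	buf, err := unix.Mmap(-1, 0, size, unix.PROT_READ|unix.PROT_WRITE,
		unix.MAP_PRIVATE|unix.MAP_ANONYMOUS)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(buf)

	// With enabled=always this advice is redundant, but it makes the
	// experiment independent of the global THP setting.
	if err := unix.Madvise(buf, unix.MADV_HUGEPAGE); err != nil {
		panic(err)
	}

	before := anonHugePagesKB()
	for i := 0; i < size; i += 4096 {
		buf[i] = 1 // fault every 4 KiB page so the memory is resident
	}
	fmt.Printf("AnonHugePages grew by %d kB\n", anonHugePagesKB()-before)
}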
And the system info is below; we have also turned off swap, so there is no swap cache:
$uname -a
Linux 4.14.15-1.el7.elrepo.x86_64 #1 SMP Tue Jan 23 20:28:26 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
$cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
So what could cause the RSS in ps / top to be much lower than the one in cgroup memory.stat?
Per the documentation at https://kernel.org/doc/Documentation/cgroup-v1/memory.txt, the rss counter in cgroup memory.stat includes "transparent hugepages". Is THP also counted in the RSS shown by ps / top?
If this is caused by THP, what is the mechanism by which it affects the memory accounting?