Solaris: pmap reports a different virtual memory size than ps


I have a process running on Solaris (SunOS m1001 5.10 sun4v sparc) and was monitoring the total virtual memory used.

Periodically running ps showed that the VSZ was growing linearly over time, in jumps of 80 kB, and that it kept growing until it reached the 4 GB limit, at which point the process ran out of address space and things started to fall apart.

while true; do ps -ef -o pid,vsz,rss|grep 27435 ; sleep 5; done > ps.txt

I suspected a memory leak and decided to investigate further with pmap. But pmap showed that the VSZ was not growing at all; it stayed stable. All file mappings, shared memory mappings, and the heap also kept the same size.

while true; do pmap -x 27435 |grep total; sleep 5; done > pmap.txt
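
For what it's worth, the two commands can be combined into one loop (same pid 27435 as above; the output file name is just an example) so that the VSZ reported by ps and the total reported by pmap are sampled at roughly the same moment:

 while true; do
   ps -o pid,vsz,rss -p 27435      # VSZ as reported by ps
   pmap -x 27435 | grep total      # total as reported by pmap
   sleep 5
 done > ps-vs-pmap.txt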

My first question is: Why do ps and pmap produce a different VSZ for the same process?

I can imagine that heap sizes are calculated differently (e.g. actual heap usage vs. the highest heap pointer), so I started thinking in the direction of heap fragmentation. I then used libumem and mdb to produce detailed reports of allocated memory at different times and noticed that there was absolutely no difference in allocated memory.

 mdb 27435 < $umem_cmds
 ::walk thread |::findstack !tee>>umemc-findstack.log
 ::umalog !tee>>umem-umalog.log
 ::umastat !tee>>umem-umastat.log
 ::umausers !tee>umem-umausers.log
 ::umem_cache !tee>>umem-umem_cache.log
 ::umem_log !tee>>umem-umem_log.log
 ::umem_status !tee>>umem-umem_status.log
 ::umem_malloc_dist !tee>>umem-umem_malloc_dist.log
 ::umem_malloc_info !tee>>umem-umem_malloc_info.log
 ::umem_verify !tee>>umem-umem_verify.log
 ::findleaks -dv !tee>>umem-findleaks.log
 ::vmem !tee>>umem-vmem.log
 *umem_oversize_arena::walk vmem_alloc | ::vmem_seg -v !tee>umem-oversize.log
 *umem_default_arena::walk vmem_alloc | ::vmem_seg -v !tee>umem-default.log

So my second question is: what is the best way to figure out what is causing the growing VSZ reported by ps?

There are 2 answers below.

BEST ANSWER

I noticed that this question was still open and wanted to add how this story ended.

After a lot more digging I contacted Solaris customer support and sent them a way to reproduce the problem. They confirmed that there was a bug in the kernel which caused this behavior.

Unfortunately I cannot confirm that they rolled out a patch, because I have since left the company I was working for at the time.

Thx, Jef

ANSWER

If you run your suspect process with LD_PRELOAD=libumem.so, then at the point where "it all falls apart" you could gcore it, and then run mdb over the core with the umem dcmds such as ::findleaks -dv.
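
A rough sketch of that workflow (the binary name ./myapp is a placeholder; UMEM_DEBUG and UMEM_LOGGING are libumem's debugging environment variables, and as far as I recall ::findleaks needs the audit support that UMEM_DEBUG=default enables):

 LD_PRELOAD=libumem.so UMEM_DEBUG=default UMEM_LOGGING=transaction ./myapp &
 pid=$!
 # ... wait until the VSZ has grown and things start falling apart ...
 gcore $pid                                 # writes core.$pid by default
 echo '::findleaks -dv' | mdb ./myapp core.$pid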

If you look at all the mappings listed in the pmap(1) output, rather than just the totals for the process, you'll have a much better idea of where to look. The first things I look for are the heap, anon and stack segments.
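
For example, something like this (pid 27435 from the question; the segment labels may differ slightly between releases) pulls out just those segments plus the totals:

pmap -x 27435 | egrep 'heap|anon|stack|total'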