kstat, getting the number of lightweight processes running in a zone

SmartOS zones have a cap (maximum number) on lightweight processes, defined in the zone package. To check whether my current settings make sense, I would like to collect the total number of lightweight processes over time, from within the zone. I already have tools in place to collect kstat counters, so I would prefer to use kstat. I have noticed that prstat reports an lwps count, so any information on how prstat gets this value would be helpful.

In the meantime, I have been able to fetch the number of processes using 'caps:*:nprocs_zone_*:usage' or 'unix:0:system_misc:nproc', but I have found nothing for lightweight processes.

To sum up, I would like to know:

  • Is there a kstat counter representing the total number of LWPs, or per-process counters that I could sum to get this value?
  • If not, how does prstat get the total LWP count?
  • Is there any other way to get this value (ideally without parsing prstat output)?

There is 1 best solution below

prstat gets its information from the /proc filesystem. For example, running truss against a prstat command shows:

open("/proc/25841/psinfo", O_RDONLY)            = 158

The /proc file formats are well documented in the proc man page; search for "oracle man pages section 4 File Formats proc".
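Since prstat reads psinfo files, one way to get a total without parsing prstat output is to read the same files directly. The following is a minimal Python sketch, not prstat's actual implementation. It assumes the psinfo_t layout described in the proc man page, where the structure starts with two native ints, pr_flag followed by pr_nlwp (the number of active LWPs in the process). The function name total_lwps and its proc_root parameter are hypothetical names chosen for illustration.

```python
import os
import struct


def total_lwps(proc_root="/proc"):
    """Sum pr_nlwp over every readable psinfo file under proc_root.

    Assumption: psinfo_t begins with two native ints,
    pr_flag then pr_nlwp, as described in the proc man page.
    """
    total = 0
    for entry in os.listdir(proc_root):
        # Only numeric entries are process directories.
        if not entry.isdigit():
            continue
        path = os.path.join(proc_root, entry, "psinfo")
        try:
            with open(path, "rb") as f:
                header = f.read(8)
        except OSError:
            # The process exited or the file is missing or unreadable.
            continue
        if len(header) < 8:
            continue
        # "=ii": two ints in native byte order, no padding.
        _pr_flag, pr_nlwp = struct.unpack("=ii", header)
        total += pr_nlwp
    return total


if __name__ == "__main__":
    print(total_lwps())
```

Run from inside a zone, this should only see that zone's processes, so the result can be sampled periodically alongside the kstat counters you already collect. It is worth comparing the value against the prstat total before relying on it.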

Each /proc/[pid] directory contains a /proc/[pid]/lwp/ directory with one subdirectory per LWP. For example:

root@sol11:/proc/597/lwp# ls -l
total 35
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 1
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 10
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 11
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 12
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 13
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 14
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 15
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 16
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 17
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 18
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 19
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 2
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 20
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 21
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 22
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 23
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 24
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 26
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 27
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 28
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 29
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 3
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 30
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 31
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 33
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 34
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 35
dr-xr-xr-x   2 root     root         256 Jun  3 13:07 36
dr-xr-xr-x   2 root     root         256 Jun 16 11:16 37
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 4
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 5
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 6
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 7
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 8
dr-xr-xr-x   2 root     root         256 Jun  3 13:06 9
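Counting the entries in each lwp directory gives another estimate of the total. The sketch below is a rough illustration under the assumption that every numeric directory under /proc has an lwp subdirectory with one entry per LWP, as in the listing above. The name count_lwp_dirs is hypothetical, and the result may not match the prstat total exactly if the directory also lists LWPs that are no longer active.

```python
import os


def count_lwp_dirs(proc_root="/proc"):
    """Count entries in <proc_root>/<pid>/lwp across all processes."""
    total = 0
    for entry in os.listdir(proc_root):
        # Only numeric entries are process directories.
        if not entry.isdigit():
            continue
        lwp_dir = os.path.join(proc_root, entry, "lwp")
        try:
            total += len(os.listdir(lwp_dir))
        except OSError:
            # The process exited, or the directory is missing or unreadable.
            continue
    return total


if __name__ == "__main__":
    print(count_lwp_dirs())
```

Listing a directory per process is heavier than reading a few bytes from psinfo, so this approach is probably best kept as a cross-check.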

Running ps with the -L switch against the same process lists each LWP along with its name. This particular PID belongs to the fmd service.

root@sol11:/proc/597/lwp# ps -Lp 597
  PID   LWP LNAME            TTY        LTIME CMD
  597     1 -                ?           0:00 fmd
  597     2 fmd_timerq_exec  ?           0:04 fmd
  597     3 fmd-self-diagnosis ?           0:00 fmd
  597     4 sysevent-transport ?           0:00 fmd
  597     5 door_xcreate_startf ?           0:00 fmd
  597     6 door_xcreate_startf ?           0:00 fmd
  597     7 subscriber_event_handler ?           0:01 fmd
  597     8 fmd_door_server  ?           0:00 fmd
  597     9 fmd_thread_start ?           0:00 fmd
  597    10 cpumem-retire    ?           0:00 fmd
  597    11 ses-log-transport ?           0:04 fmd
  597    12 ext-event-transport ?           0:00 fmd
  597    13 door_xcreate_startf ?           0:00 fmd
  597    14 door_xcreate_startf ?           0:00 fmd
  597    15 sas-cabling      ?           0:01 fmd
  597    16 io-retire        ?           0:00 fmd
  597    17 eft              ?           0:00 fmd
  597    18 endurance-transport ?           0:00 fmd
  597    19 fdd-msg          ?           0:00 fmd
  597    20 disk-transport   ?           0:00 fmd
  597    21 sensor-transport ?           0:04 fmd
  597    22 -                ?           0:00 <defunct>
  597    23 syslog-msgs      ?           0:00 fmd
  597    24 disk-diagnosis   ?           0:00 fmd
  597    26 zfs-retire       ?           0:00 fmd
  597    27 fru-monitor      ?           0:02 fmd
  597    28 -                ?           0:00 fmd
  597    29 software-response ?           0:00 fmd
  597    30 enum-transport   ?           0:00 fmd
  597    31 non-serviceable  ?           0:00 fmd
  597    33 fabric-xlate     ?           0:00 fmd
  597    34 software-diagnosis ?           0:00 fmd
  597    35 zfs-diagnosis    ?           0:00 fmd
  597    36 umem_update_thread ?           0:06 fmd
  597    37 fmd_door_server  ?           0:00 fmd

The following command should extract the total LWP count for you by printing the fourth field of prstat's "Total" summary line:

prstat -n 1,1 1 1 | nawk '/Total/ { print $4 }'

Run it with zlogin -c to obtain details of LWPs in zones. You could probably obtain the same information with DTrace, but I do not know how. Post back with your findings if you work it out.

If you want to dig deeper, the ps -lL switches provide more detailed per-LWP information, for example:

root@sol11:/etc# ps -lLp 597
 F S    UID   PID  PPID   LWP LNAME              C PRI NI     ADDR     SZ    WCHAN TTY        LTIME CMD
 0 S      0   597     1     1 -                  0  40 20        ?  25470        ? ?           0:00 fmd
 0 S      0   597     1     2 fmd_timerq_exec    0  40 20        ?  25470        ? ?           0:04 fmd
 0 S      0   597     1     3 fmd-self-diagnosis   0  40 20        ?  25470        ? ?           0:00 fmd
 0 S      0   597     1     4 sysevent-transport   0  40 20        ?  25470        ? ?           0:00 fmd
 0 S      0   597     1     5 door_xcreate_startf   0  40 20        ?  25470        ? ?           0:00 fmd
 0 S      0   597     1     6 door_xcreate_startf   0  40 20        ?  25470        ? ?           0:00 fmd
 0 S      0   597     1     7 subscriber_event_handler   0  40 20        ?  25470        ? ?           0:01 fmd

HTH.