Running into a 500-process limit on RHEL 6.10


I'm seeing a very unusual problem on a pair of RHEL 6.10 servers. We are in the process of updating these systems to RHEL 8, but that's not the current problem. What happens right now is that, no matter what I try, the servers in question seem to be hitting a limit of roughly 500 processes. If I run ps -ef | wc -l as root, I always get 498, 499, or 500. When I try to su from root to the JbossAdm user account (or any other account, for that matter), I usually get the error:

su: cannot set user id: Resource temporarily unavailable

which, from what I've read, indicates that some resource limit is being hit. Sometimes the su does succeed, but not often. I've looked at /etc/security/limits.conf, and the only limits defined there are an nproc of 80000 for the JbossAdm user account and a system-wide core file size limit of 0. I've also checked /proc/sys/kernel/pid_max and confirmed it is 32768. In /etc/security/limits.d/90-nproc.conf, the soft nproc limit is set to 1024 for all users except root, which is set to unlimited.
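For anyone retracing these checks, here is a minimal sketch of a search that lists every active nproc entry in limits.conf and in all drop-in files under limits.d in one pass, so that no file gets overlooked. The function name list_nproc_limits and its optional directory argument are illustrative assumptions, not part of any standard tool.

```shell
#!/bin/sh
# Sketch: print every non-comment nproc entry from limits.conf and limits.d/*.conf.
# list_nproc_limits and its optional root-directory argument are hypothetical helpers.
list_nproc_limits() {
    root="${1:-/etc/security}"
    for f in "$root/limits.conf" "$root"/limits.d/*.conf; do
        # Skip paths that do not exist (including an unmatched glob).
        [ -f "$f" ] || continue
        # Entries have the form: <domain> <type> <item> <value>.
        # Ignore comment lines and keep those whose item (third field) is nproc.
        awk -v file="$f" '$1 !~ /^#/ && $3 == "nproc" { print file ": " $0 }' "$f"
    done
}

list_nproc_limits
```

Each output line is prefixed with the file it came from, which makes it easier to see where a given value is defined.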

So what I'm wondering is whether anyone has ideas as to why we're seeing a 500-process limit, and how we can increase that limit. I'm in the process of checking with our patching and server build teams to find out what else changed this past weekend, but I'm hoping to get inspiration for further troubleshooting from the community here while I gather this information.

There is 1 best solution below

RagManX On

Turns out the 500-process limit was a red herring. On closer inspection, we found an nproc setting of 4096 in /etc/security/limits.d/99-jboss.conf, which we had missed while scouring files for process-limit settings. On Linux the nproc limit counts threads, not just processes, so this setting was effectively a 4K thread limit, and that is what we were hitting. ps -ef prints one line per process and does not list individual threads, so the count of about 500 we had been watching was not measuring what the limit applies to. When we raised the value from 4096 to 32768 and rebooted, everything started running fine. We confirmed with ps -Lu JbossAdm | wc -l that the system now allows more than 4K threads, and the user application is running as expected without the errors we were seeing before.
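A rough sketch of how the process count and the thread count can be compared for one account, and how the limit actually in force for a running process can be read back from /proc. The helper names count_tasks and nproc_limit_from are made up for illustration, and the sketch assumes the procps version of ps (for the -L option) and a Linux /proc filesystem.

```shell
#!/bin/sh
# Sketch: compare process and thread counts for a user.
# count_tasks and nproc_limit_from are hypothetical helper names.
count_tasks() {
    user="$1"
    # One line per process.
    procs=$(ps -u "$user" -o pid= | wc -l | tr -d ' ')
    # One line per thread (LWP); this is the number the nproc limit is charged against.
    threads=$(ps -L -u "$user" -o lwp= | wc -l | tr -d ' ')
    echo "user=$user processes=$procs threads=$threads"
}

# Print the soft and hard "Max processes" values from a /proc/<pid>/limits file.
nproc_limit_from() {
    awk '/^Max processes/ { print "soft=" $3, "hard=" $4 }' "$1"
}

count_tasks "$(id -un)"
if [ -r "/proc/$$/limits" ]; then
    nproc_limit_from "/proc/$$/limits"
fi
```

Running count_tasks JbossAdm as root would give the same thread figure as the ps -Lu JbossAdm | wc -l check above, alongside the much smaller process count. Pointing nproc_limit_from at the limits file of the application's PID shows the value that process started with, which is useful because changes under /etc/security only apply to sessions started afterwards.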