We have a couple of SGE clusters running various versions of RHEL at my work, and we're testing a new one running a newer release. On the old cluster ("CentOS release 5.4"), I'm able to submit a job like the following and it runs fine:
echo "java -Xms8G -Xmx8G -jar blah.jar ..." |qsub ... -l h_vmem=10G,virtual_free=10G ...
On the new cluster ("CentOS release 6.2 (Final)"), a job with those parameters fails by running out of memory, and I have to raise the limit to h_vmem=17G for it to succeed. The new nodes have about 3x the RAM of the old nodes, and in testing I'm only submitting a couple of jobs at a time.
On the old cluster, if I set -Xms/-Xmx to N, I could use N+1 or so for h_vmem. On the new cluster, jobs crash unless I set h_vmem to 2N+1 or so.
I wrote a tiny Perl script that does nothing but progressively consume memory and periodically print the amount used, until it either crashes or reaches a limit. The h_vmem parameter makes it crash at the expected memory usage.
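For reference, a minimal sketch of that kind of test script (not my exact script; the chunk size and default cap are arbitrary):

#!/usr/bin/env perl
# Grow memory usage in fixed-size chunks and report progress,
# so you can see where a limit kicks in.
use strict;
use warnings;

my $chunk_mb = 100;                        # grow by 100 MB per step
my $limit_mb = @ARGV ? $ARGV[0] : 20_000;  # optional cap from the command line
my @blocks;
my $used = 0;

while ($used < $limit_mb) {
    push @blocks, 'x' x ($chunk_mb * 1024 * 1024);  # allocate and hold a 100 MB string
    $used += $chunk_mb;
    print "allocated ~$used MB\n";
    sleep 1;
}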
I've tried multiple versions of the JVM (1.6 and 1.7). If I omit h_vmem entirely, the job runs, but then there's nothing to stop a runaway job from exhausting a node's memory.
I've googled and found others reporting similar issues, but no resolutions.
The problem here appears to be a combination of the following factors:

- CentOS/RHEL 6 ships a glibc (2.12) with the per-thread malloc arenas introduced in glibc 2.10, which can greatly inflate a process's virtual memory footprint (see the first and third links below).
- The JVM starts many threads (parallel GC threads among them), and each thread can get its own arena.
- SGE enforces h_vmem against the virtual address size, not resident memory.

To fix the problem I've used a combination of the following:
export MALLOC_ARENA_MAX=1
java -XX:ParallelGCThreads=1 ...
qsub -pe pthreads 2
Note that it's not clear that 1 is the right value for MALLOC_ARENA_MAX, just that low values seem to work well in my testing.
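Putting those together, a submission on the new cluster looks roughly like this (the jar, resource values, and the pthreads PE name are placeholders carried over from my examples above; your parallel environment name may differ):

echo "export MALLOC_ARENA_MAX=1; java -XX:ParallelGCThreads=1 -Xms8G -Xmx8G -jar blah.jar ..." | qsub -pe pthreads 2 -l h_vmem=10G,virtual_free=10G ...

MALLOC_ARENA_MAX caps the number of per-thread glibc malloc arenas, -XX:ParallelGCThreads caps the GC threads that would otherwise each claim an arena, and requesting two slots via -pe pthreads 2 gives the job extra h_vmem headroom, since SGE typically multiplies the per-slot h_vmem by the number of slots requested.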
Here are the links that led me to these conclusions:
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
What would cause a java process to greatly exceed the Xmx or Xss limit?
http://siddhesh.in/journal/2012/10/24/malloc-per-thread-arenas-in-glibc/