How to estimate memory requirement for submitting a job to a cluster running SGE?


I am trying to submit a job to a cluster running Sun Grid Engine (SGE). The job keeps being terminated with the following report:

Job 780603 (temp_new) Aborted
 Exit Status      = 137
 Signal           = KILL
 User             = heaswara
 Queue            = [email protected]
 Host             = comp-0-8.local
 Start Time       = 08/24/2013 13:49:05
 End Time         = 08/24/2013 16:26:38
 CPU              = 02:46:38
 Max vmem         = 12.055G
failed assumedly after job because:
job 780603.1 died through signal KILL (9)

The resource requirements I had set were:

#$ -l mem_free=10G
#$ -l h_vmem=12G

mem_free is the amount of memory my job requires, and h_vmem is the upper bound on the amount of memory the job is allowed to use. I suspect my job is being terminated because it requires more than that threshold (12G). Is there a way to estimate how much memory my operation will require? I am trying to figure out what the upper bound should be. Thanks in advance.
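One observation from the report itself: Max vmem = 12.055G is right at the 12G h_vmem cap, which is exactly what you would see when the limit kills the job, so the immediate workaround is simply to request more. A minimal sketch of a revised job script (the limits and the job command are illustrative, not a recommendation for your specific program):

```shell
#!/bin/sh
# Hypothetical revised request: the abort report shows Max vmem = 12.055G,
# i.e. the job ran into the 12G h_vmem cap, so pad the limit well above it.
#$ -l mem_free=14G
#$ -l h_vmem=16G
./my_program   # placeholder for your actual command
```

If the job dies again at the new limit, it is genuinely using that much memory and the estimate needs to come from measurement, as the answer below describes.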

1 Answer

It depends on the nature of the job itself. If you know anything about the program being run (i.e., you wrote it), you should be able to estimate how much memory it will want. If not, your only recourse is to run it without the limit and see how much it actually uses.
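On SGE, "see how much it actually uses" usually means querying the accounting database after the unlimited run finishes: `qacct -j <jobid>` reports the peak as maxvmem. A small sketch of pulling that field out (the heredoc stands in for real `qacct -j 780603` output, and the value shown is the one from the abort report above):

```shell
#!/bin/sh
# In practice you would pipe real accounting output:
#   qacct -j 780603 | awk '/maxvmem/ {print $2}'
# Here a sample line stands in so the parsing is self-contained.
qacct_output='maxvmem      12.055G'
peak=$(printf '%s\n' "$qacct_output" | awk '/maxvmem/ {print $2}')
echo "$peak"   # prints 12.055G
```

Requesting an h_vmem comfortably above the observed maxvmem is then a reasonable starting point.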

I have a bunch of FPGA build and simulation jobs that I run. After each job, I record how much memory was actually used, and I use that history to estimate how much it might use in the future (I pad by 10% in case there are weird changes in the source). I still have to redo the calculations whenever the vendor delivers a new version of the tools, though, as quite often the memory footprint changes dramatically.
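The padding step above can be sketched in a few lines of shell and awk. The history values here are made up for illustration; in a real setup they would come from the recorded maxvmem of past runs:

```shell
#!/bin/sh
# Toy history of peak vmem (in MB) from previous runs of the same job.
history="10340
11020
10875"
# Request = historical maximum padded by 10%, as described above.
request_mb=$(printf '%s\n' "$history" |
  awk 'NR==1 || $1 > max {max = $1} END {printf "%d", max * 1.10}')
echo "${request_mb}M"   # prints 12122M (11020 * 1.10, truncated)
```

The same number could then be fed into the job script, e.g. as `-l h_vmem=${request_mb}M`.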