Ensuring one job per node on StarCluster / SunGridEngine (SGE)

When jobs are submitted with qsub on a StarCluster / SGE cluster, is there an easy way to ensure that each node receives no more than one job at a time? I am having problems where multiple jobs end up on the same node, resulting in out-of-memory (OOM) issues.

I tried using -l cpu=8, but I think that checks only the number of cores on the box itself, not the number of cores actually in use.

I also tried -l slots=8, but then I get:

Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly.
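
For reference, the parallel-environment route this error message points at looks roughly like the sketch below. The PE name smp and the job script name are illustrative; the key setting is allocation_rule $pe_slots, which forces all of a job's slots onto a single host:

qconf -ap smp                          # define the PE; set allocation_rule to $pe_slots
qconf -aattr queue pe_list smp all.q   # make the PE usable from all.q
qsub -pe smp 8 myjob.sh                # all 8 slots land on one 8-core node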

      



3 answers


In your StarCluster config file (~/.starcluster/config), add this section:



[plugin sge]
setup_class = starcluster.plugins.sge.SGEPlugin
slots_per_host = 1
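
Note that defining the [plugin sge] section alone is typically not enough; the plugin also has to be referenced from the cluster template you launch. A minimal sketch, assuming a template named smallcluster:

[cluster smallcluster]
# ... existing keypair, node type, etc. ...
plugins = sge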

      



This depends heavily on how the cluster resources are configured, i.e. memory limits etc. One option, however, is to request a large amount of memory for each job:

-l h_vmem=xxG

      



This has the side effect of excluding other jobs from the node, since most of that node's memory has already been requested by a previously started job.

Just make sure the requested memory does not exceed the allowed limit for the node. You can see whether a job is running up against this limit by checking the output of qstat -j <jobid> for errors.
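
As a concrete sketch, assuming nodes with 16 GB of RAM (the memory figure and job script name are illustrative):

# Request most of a node's memory so SGE cannot fit a second job
# alongside this one; stay below the node's configured limit.
qsub -l h_vmem=14G myjob.sh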



I accomplished this by setting the number of slots on each of my nodes to 1, using:

qconf -aattr queue slots "[nodeXXX=1]" all.q
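
A hedged follow-up, assuming the default all.q queue and hostnames like node001 (both illustrative); the override is applied once per node and can then be verified:

qconf -aattr queue slots "[node001=1]" all.q   # repeat for each node
qconf -sq all.q | grep slots                   # confirm the per-host overrides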







