Partitions/Queues and Time Limits
In most cases you do not need to specify a Slurm partition, because an automatic job submission mechanism assigns your job to a suitable partition for you.
For special cases (e.g. lectures and courses) you may need to specify an account, a reservation or a partition in your job scripts. For this, you will receive additional details separately.
Depending on the maximum runtime requested for a job (-t or --time), it is assigned to a suitable partition (short, deflt, long). Partitions for jobs with a longer runtime have fewer hardware resources assigned to them, so their queueing/pending times will likely be longer. A minimal example script follows the partition table below.
| Job runtime requirement (-t / --time=…) | Partition Name | Nodes assigned |
|---|---|---|
| ≤ 30 min | deflt_short | 1206 (all + 7 exclusive) |
| ≤ 30 min | acc_short | 17 (all accelerator nodes) |
| > 30 min, ≤ 24 h | deflt | 1199 (7 less than all) |
| > 30 min, ≤ 24 h | mem | 5 |
| > 30 min, ≤ 24 h | acc | 17 (all accelerator nodes) |
| > 24 h, ≤ 7 d | long | 607 |
| > 24 h, ≤ 7 d | mem_long | 3 |
| > 24 h, ≤ 7 d | acc_long | 11 (4 less than all accelerator nodes) |

Jobs with a continuous runtime longer than 7 d are only possible after coordination with the HPC team and with the use of a special reservation. However, there is a trick for using non-continuous runtimes of more than 7 d.
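As a minimal sketch (job name, task count and program are placeholders), a job script that only states a runtime of 25 minutes is routed to one of the *_short partitions automatically, without naming any partition:

```bash
#!/bin/bash
#SBATCH -J short_test        # job name (placeholder)
#SBATCH -n 4                 # number of tasks (placeholder)
#SBATCH -t 00:25:00          # runtime <= 30 min, so a *_short partition is selected
# note: no partition is specified – the automatic mechanism picks one

srun ./myProgram             # placeholder executable
```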
Configuration of batch jobs for certain hardware
By default, jobs will be dispatched to any compute node(s) of the cluster, i.e. to nodes of all phases (expansion stages) and types.
For special cases, such as programs requiring particular hardware or node types, you need to specify the corresponding resource requirements. The most common distinctions are by CPU architecture and by accelerator type, but you can also request a particular expansion stage or even a section, as listed in the following tables.
All other resource requirements, such as the projected runtime and memory consumption, are automatically matched to suitable node types and sections.
Expansion Stage / CPU Type

| Resource | Section | Node Hostnames | Details |
|---|---|---|---|
| i01 | all | mpsc, mpqc, gvqc, gaqc, gaoc | LB 2 phase I |
| i02 | all | mpsd, mpqd, mpzd, ghqd, gpqd, gaod | LB 2 phase II |
| avx512 | MPI | mpsc, mpsd | MPI section, LB 2 phase I+II |
| avx512 | ACC | gvqc, gaqc, ghqd, gaod | ACC section, LB 2 phase I+II |
| avx512 | MEM | mpqc, mpqd, mpzd | MEM section, LB 2 phase I+II |
| avx2 (or dgx) | ACC | gaoc | ACC section, LB 2 phase I, DGX A100 |
Accelerator Type (selected by “Generic Resources” (--gres) instead of by “-C” features)

| GRes | Accelerator | Node Hostnames | Details |
|---|---|---|---|
| --gres=gpu (not recommended, since it refers to all 3 different types) | Nvidia + Intel + AMD (all) | gvqc, gaqc, gaoc, gaod, ghqd, gpqd | ACC section (all) |
| --gres=gpu:v100 | Nvidia Volta 100 | gvqc | ACC section, LB 2 phase I |
| --gres=gpu:a100 | Nvidia Ampere 100 | gaqc, gaoc | ACC section, LB 2 phase I |
| --gres=gpu:h100 | Nvidia Hopper 100 | ghqd | ACC section, LB 2 phase II |
| --gres=gpu:pvc128g | Intel Data Center GPU Max 1550 “Ponte Vecchio” | gpqd | ACC section, LB 2 phase II |
| --gres=gpu:mi300x | AMD MI300X | gaod | ACC section, LB 2 phase II |
Sections

| Resource | Section Name | Node Hostnames | Details |
|---|---|---|---|
| mpi | MPI | mpsc, mpsd | MPI section (all) |
| mem1536g | MEM | mpqc | MEM section, LB 2 phase I |
| mem2048g | MEM | mpqd | MEM section, LB 2 phase II |
| mem6144g | MEM | mpzd | |
All of the above special “features” (except for acc/GPUs) can be requested with the parameter -C (“constraint”).
It can either be specified directly on the sbatch command line: “sbatch -C feature myJobScript” (not recommended), or in batch/job scripts as an additional pragma (recommended):
#SBATCH -C feature
Several features/constraints can be combined with either & (logical AND) or | (logical OR) – see the examples further down.
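As a minimal sketch (job name, task count, runtime and program are placeholders), a job script combining two of the features from the tables above to request a large-memory AVX-512 node could look like this:

```bash
#!/bin/bash
#SBATCH -J mem_test              # job name (placeholder)
#SBATCH -n 8                     # number of tasks (placeholder)
#SBATCH -t 02:00:00              # requested runtime (placeholder)
#SBATCH -C "avx512&mem1536g"     # AVX-512 CPU AND 1536 GB node (MEM section, LB 2 phase I)

srun ./myMemoryHungryProgram     # placeholder executable
```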
However, GPU accelerators are no longer requested just by feature, but by GRes:
- --gres=class:type:# – accelerator specification, e.g. GPUs (if not specified, the defaults are type=<any Nvidia> and #=1)
- --gres=gpu – requests 1 Nvidia GPU accelerator card
- --gres=gpu:3 – requests 3 Nvidia GPU accelerator cards
- --gres=gpu:v100 – requests 1 Nvidia “Volta 100” card
- --gres=gpu:a100:3 – requests 3 Nvidia “Ampere 100” cards
- --gres=gpu:h100:4 – requests 4 Nvidia “Hopper 100” cards
- --gres=gpu:mi300x:6 or --gres=gpu:amd:6 – requests 6 AMD MI300X GPUs (see the MI300X usage info)
- --gres=gpu:pvc128g:2 or --gres=gpu:pvc:2 – requests 2 Intel “Ponte Vecchio” GPUs (see the PVC usage info)
To have your job scripts (and programs) adapt automatically to the number of requested GPUs, you can use the variable $SLURM_GPUS_ON_NODE wherever your programs expect the number of GPUs to use, e.g.
“myCUDAprogram --num-devices=$SLURM_GPUS_ON_NODE”.
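Putting this together, a minimal sketch of a single-node GPU job script (job name, runtime and the program myCUDAprogram with its --num-devices option are placeholders) could look like this:

```bash
#!/bin/bash
#SBATCH -J gpu_test              # job name (placeholder)
#SBATCH -n 1                     # one task (placeholder)
#SBATCH -t 04:00:00              # requested runtime (placeholder)
#SBATCH --gres=gpu:a100:2        # request 2 Nvidia "Ampere 100" cards on the node

# SLURM_GPUS_ON_NODE holds the number of GPUs allocated on this node (here: 2)
srun ./myCUDAprogram --num-devices=$SLURM_GPUS_ON_NODE
```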
Since you are always allocated whole GPU cards, there is no way to ask for a certain number of tensor cores or shaders, or for a certain amount of G-RAM (GPU memory), within a GPU. Once allocated, you are free to use each and every resource inside that GPU card at will.
If you need more than one GPU compute node for distributed Machine/Deep Learning (e.g. using “horovod”), the job needs to request several GPU nodes explicitly using -N # (with # = 2-8). Consequently, the number of tasks requested with -n # needs to be equal to or higher than the number of nodes.
Since “GRes” are requested per node, you should not exceed --gres=gpu:4 (except when targeting the DGX with 8 GPUs or the AMD MI300X GPUs), even when using several 4-GPU nodes. A sketch of such a multi-node request follows below.
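As a sketch under these rules (job name, runtime and the launcher script are placeholders; how the training processes are started depends on your framework), a two-node job with four GPUs per node could be requested like this:

```bash
#!/bin/bash
#SBATCH -J ddp_train                 # job name (placeholder)
#SBATCH -N 2                         # 2 GPU nodes (allowed range: 2-8)
#SBATCH -n 2                         # number of tasks >= number of nodes
#SBATCH -t 12:00:00                  # requested runtime (placeholder)
#SBATCH --gres=gpu:4                 # GRes are per node: 4 GPUs on each of the 2 nodes

srun ./run_distributed_training.sh   # placeholder launcher (e.g. for horovod)
```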
Examples
- -C avx512 – requests nodes with AVX-512 CPUs (MPI, ACC or MEM section, LB 2 phase I+II)
- -C "avx512&mem1536g" – requests an AVX-512 node with 1536 GB of memory (MEM section, LB 2 phase I)
- -C avx512 together with --gres=gpu:v100:2 – requests an AVX-512 accelerator node with 2 Nvidia “Volta 100” cards