Partitions/Queues and Time Limits
In most cases, you do not need to specify a Slurm partition, because an automatic job submission mechanism is in place that assigns jobs to suitable partitions for you.
For special cases (e.g. lectures and courses) you may need to specify an account, a reservation or a partition in your job scripts. For this, you will get additional details separately.
Depending on the maximum runtime of a job (-t or --time), jobs are assigned to a suitable partition (short, deflt, long). Partitions for jobs with longer runtimes have fewer hardware resources assigned to them, so their queueing/pending time will likely be longer.
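For example, the requested maximum runtime in a job script directly determines the partition; the runtime value below is just an illustration:

```bash
# maximum runtime of this job; Slurm places it into a matching partition (short, deflt, long)
#SBATCH -t 02:00:00
```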
Configuration of batch jobs for certain hardware
By default, jobs will be dispatched to any compute node(s) of the cluster, i.e. to nodes of all phases (expansion stages) and types.
For special cases like programs requiring special hardware or node types, you need to specify the corresponding resource requirements. The most common distinction is by CPU architecture and by accelerator type, but you can also specify a particular expansion stage or even a section, as listed in the following tables.
All other resource requirements, like projected runtime and memory consumption, will automatically be mapped to suitable node types and sections.
Expansion Stage / CPU Type

Resource | Section | Node Hostnames | Details
---|---|---|---
i01 | all | mpsc mpqc gvqc gaqc gaoc | LB 2 phase I
i02 | all | mpsd mpqd mpzd ghqd gpqd gaod | LB 2 phase II
avx512 | MPI | mpsc mpsd | MPI section, LB 2 phase I+II
avx512 | ACC | gvqc gaqc ghqd | ACC section, LB 2 phase I+II
avx512 | MEM | mpqc mpqd mpzd | MEM section, LB 2 phase I+II
avx2 (or dgx) | ACC | gaoc | ACC section, LB 2 phase I, DGX A100
Accelerator Type (selected by “Generic Resources” instead of by feature/“-C”)

GRes | Accelerator | Node Hostnames | Details
---|---|---|---
--gres=gpu | Nvidia (all) | gvqc gaqc ghqd | ACC section (all)
--gres=gpu:v100 | Nvidia Volta 100 | gvqc | ACC section, LB 2 phase I
--gres=gpu:a100 | Nvidia Ampere 100 | gaqc | ACC section, LB 2 phase I
--gres=gpu:h100 | Nvidia Hopper 100 | ghqd | ACC section, LB 2 phase II
--gres=gpu:pvc128g | Intel Data Center GPU Max 1550 “Ponte Vecchio” | gpqd | ACC section, LB 2 phase II, still experimental
Sections

Resource | Section Name | Node Hostnames | Details
---|---|---|---
mpi | MPI | mpsc mpsd | MPI section (all)
mem1536g | MEM | mpqc | MEM section, LB 2 phase I
mem2048g | MEM | mpqd | MEM section, LB 2 phase II
mem6144g | MEM | mpzd | MEM section, LB 2 phase II

All of the above special “features” (except for acc/GPUs) can be requested with the parameter -C (“constraint”).
It can either be specified directly on the sbatch command line, “sbatch -C resource myJobScript” (not recommended), or in batch/job scripts as an additional pragma (recommended):
#SBATCH -C resource
Several features/constraints can be combined with either & (logical AND) or | (logical OR), see the examples down below.
However, GPU accelerators are no longer requested by feature, but by GRes:
--gres=class:type:# is the generic accelerator specification, e.g. for GPUs (if not specified, the defaults are type=any and #=1):
--gres=gpu requests 1 GPU accelerator card of any type (not recommended, since it can be either Nvidia or PVC)
--gres=gpu:v100 requests 1 Nvidia “Volta 100” card
--gres=gpu:a100:3 requests 3 Nvidia “Ampere 100” cards
--gres=gpu:pvc128g:2 requests 2 Intel “Ponte Vecchio” GPUs with 128 GByte G-RAM (see the PVC usage info)
To have your job scripts (and programs) adapt automatically to the number of (requested) GPUs, you can use the variable $SLURM_GPUS_ON_NODE wherever your programs expect the number of GPUs to use, i.e. “myCUDAprogram --num-devices=$SLURM_GPUS_ON_NODE”.
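Inside a job script, the relevant lines might look like the following sketch (the program name is a placeholder):

```bash
# request 3 "Ampere 100" cards on this node
#SBATCH --gres=gpu:a100:3

# Slurm sets SLURM_GPUS_ON_NODE to the number of GPUs granted on the node (here: 3),
# so the placeholder program uses exactly as many GPUs as were requested
myCUDAprogram --num-devices=$SLURM_GPUS_ON_NODE
```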
If you need more than one GPU node for distributed Machine/Deep Learning (e.g. using “horovod”), the job needs to request several GPU nodes explicitly using -N # (with # = 2-8). Consequently, the number of tasks requested with -n # needs to be equal to or higher than the number of nodes.
Since “GRes” are per node, you should not exceed --gres=gpu:4 (except when targeting the DGX with 8 GPUs), even when using several 4-GPU nodes.
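As a rough sketch, such a multi-node GPU job could be requested as follows (job name, runtime, and the launched program are placeholder assumptions):

```bash
#!/bin/bash
# placeholder job name and maximum runtime
#SBATCH -J multi-gpu-example
#SBATCH -t 08:00:00
# two GPU nodes, with at least one task per node
#SBATCH -N 2
#SBATCH -n 2
# GRes are per node: 4 GPUs on each of the 2 nodes
#SBATCH --gres=gpu:4

# placeholder launch of a distributed training program (e.g. via horovod)
srun ./myDistributedTraining --num-devices=$SLURM_GPUS_ON_NODE
```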
Examples
-C avx512 (any AVX-512 node)
-C "avx512&mem1536g" (an AVX-512 node with 1536 GByte memory)
-C avx512 together with --gres=gpu:v100:2 (an AVX-512 ACC node with two Volta 100 GPUs)
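Put into a complete job script, the second example could look like the following sketch (job name, runtime, and program are placeholders):

```bash
#!/bin/bash
# placeholder job name; the placeholder runtime determines the partition
#SBATCH -J bigmem-example
#SBATCH -t 04:00:00
#SBATCH -n 1
# constraint from the second example: an AVX-512 node in the MEM section with 1536 GByte RAM
#SBATCH -C "avx512&mem1536g"

# placeholder for your own program
srun ./myMemoryHungryProgram
```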