Because the login nodes face the public (and sometimes evil) internet, we have to install (security) updates from time to time, at short notice (30 minutes). Therefore, do not expect every login node to be available 24/7.
To use the cluster, it is not sufficient to simply start your program on a login node!
The login nodes are not for “productive” or long-running calculations!
The login nodes are shared by all users of the HPC cluster and are intended only for
- job preparation and submission
- copying data into and out of the cluster (e.g. with “scp”)
- short test runs of your program (≤ 30 minutes)
- debugging your software
- job status checking
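One way to keep a short test run within the 30-minute limit and light on shared cores is to wrap it in GNU `timeout`. The helper below is a minimal sketch under assumptions: the names `test_run`, `TEST_LIMIT` and `TEST_THREADS` are hypothetical, and the thread cap works through `OMP_NUM_THREADS`, so it only affects OpenMP programs.

```shell
#!/usr/bin/env bash
# Hypothetical helper for short test runs on a login node.
#   TEST_LIMIT   (hypothetical, default 30m): hard wall-clock cap passed to GNU timeout
#   TEST_THREADS (hypothetical, default 2):   passed on as OMP_NUM_THREADS;
#                                             only OpenMP programs honour it
test_run() {
    local status
    OMP_NUM_THREADS="${TEST_THREADS:-2}" timeout "${TEST_LIMIT:-30m}" "$@"
    status=$?
    # GNU timeout exits with status 124 when it had to stop the command
    if [ "$status" -eq 124 ]; then
        echo "test run stopped after ${TEST_LIMIT:-30m}; submit a batch job instead" >&2
    fi
    return "$status"
}

# Example (hypothetical program and input file):
# test_run ./my_program small_input.dat
```

If a test does not finish within the limit, that is usually a sign it belongs in a batch job rather than on the login node.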
While test-driving your software on a login node, check the node's current CPU load with “top” or “uptime”, and reduce your impact by using fewer cores/threads/processes, or at least by using “…”.
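“uptime” prints the load averages over the last 1, 5 and 15 minutes. Roughly speaking, a 1-minute load close to or above the number of cores means the node is already busy. The sketch below (the function name `load_per_core` is hypothetical) assumes a Linux node, because it reads `/proc/loadavg` directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: 1-minute load average divided by the number of online cores.
# /proc/loadavg is Linux-specific; its first field is the 1-minute load average.
load_per_core() {
    local load1 cores
    read -r load1 _ < /proc/loadavg
    cores=$(getconf _NPROCESSORS_ONLN)
    # The shell has no floating-point arithmetic, so awk does the division
    awk -v l="$load1" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

echo "1-minute load per core: $(load_per_core)"
```

A value well below 1 leaves room for a small test; a value near or above 1 suggests using even fewer threads or trying another login node.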
From a login node, your productive calculations need to be submitted as batch jobs into the queue (usually with “sbatch”). For that, you need to specify the resources each job requires (e.g. amount of main memory, number of nodes and tasks, maximum runtime).
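Resources can also be requested on the “sbatch” command line, where they override the corresponding “#SBATCH” pragmas in the script. The wrapper below is a sketch; `submit_job` and the script name `job.sh` are hypothetical. It relies on `sbatch --parsable`, which prints only the job ID (plus `;clustername` on multi-cluster setups) instead of the usual “Submitted batch job …” line.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: submit a batch script and print only its job ID.
# Any sbatch options (e.g. resource requests) may precede the script name.
submit_job() {
    local out
    # --parsable: output is "jobid" or "jobid;clustername"
    out=$(sbatch --parsable "$@") || return
    echo "${out%%;*}"   # drop an optional ";clustername" suffix
}

# Example (hypothetical script name and values):
# jobid=$(submit_job --mem=8G --nodes=1 --ntasks=4 --time=02:00:00 job.sh)
```

Keeping the job ID in a variable makes later status checks and cancellations easier.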
Batch system Slurm
The arbitration, dispatching and processing of all user jobs on the cluster are handled by the Slurm batch system. Slurm calculates when and where each job will be started, taking into account the resource requirements of all jobs, the current workload of the system, the job's waiting time and the priority of the associated project.
Once a job is eligible to run, Slurm dispatches it to one (or more) compute nodes and starts it there.
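To see where a job currently stands in this process, query its state with “squeue”. The helper below is a minimal sketch; the name `job_state` is hypothetical. It prints only the job state (for example PENDING or RUNNING).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the current Slurm state of one job.
#   -h     suppress the header line
#   -j ID  restrict the output to this job
#   -o %T  print only the job state in extended form (e.g. PENDING, RUNNING)
job_state() {
    squeue -h -j "$1" -o %T
}

# Example (hypothetical job ID): job_state 4711
```

Empty output, or an “Invalid job id” error, usually means the job has already left the queue. For pending jobs, `squeue --start` shows Slurm's current estimate of the start time, which may shift as other jobs finish early or higher-priority jobs arrive.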
The batch system expects a batch script for each job (array), which contains
- the resource requirements of the job in the form of “#SBATCH …” pragmas, and
- the actual commands and programs you want to be run in your job.
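A minimal batch script following this structure might look like the sketch below. All values, and the names `example_job` and `./my_program`, are hypothetical examples; your cluster may also require further options (e.g. a partition or account) that are not shown here. The “#SBATCH” pragmas must come before the first command, because Slurm stops reading them there.

```shell
#!/bin/bash
# Resource requirements as #SBATCH pragmas; the values are examples only.
#SBATCH --job-name=example_job
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
#SBATCH --output=example_job-%j.out

# Everything below the pragmas is the actual work of the job.
# SLURM_JOB_ID and SLURM_NTASKS are set by Slurm inside a job;
# the fallbacks only matter if the script is run outside Slurm.
JOB_ID="${SLURM_JOB_ID:-none}"
NTASKS="${SLURM_NTASKS:-1}"
echo "Job $JOB_ID on $(hostname): $NTASKS task(s), started $(date)"

# Hypothetical program: srun would start it once per allocated task.
# srun ./my_program input.dat
```

Since the pragmas are ordinary comments to the shell, the same file still runs as a plain bash script, which can help catch typos before submitting.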
The batch script is a plain text file (with UNIX line feeds!). You can either create it on your local PC and then transfer it to the login node (on Windows, use “Notepad++” and switch to UNIX (LF) in “Edit” – “Line feed format” before saving the script), or create it with a UNIX editor on the login node itself and avoid the fuss with improper line feeds.
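If a script was already saved with Windows line endings (CRLF), it can be repaired on the login node; “sbatch” usually refuses scripts that contain DOS line breaks. The sketch below assumes GNU sed (standard on Linux); the function name `fix_line_endings` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: detect Windows (CRLF) line endings in a batch script
# and convert the file in place to UNIX (LF) line endings.
fix_line_endings() {
    local script="$1" cr
    cr=$(printf '\r')            # a single carriage-return character
    if grep -q "$cr" "$script"; then
        # Remove the carriage return at the end of every line (GNU sed syntax)
        sed -i 's/\r$//' "$script"
        echo "converted $script to UNIX line endings"
    else
        echo "$script already uses UNIX line endings"
    fi
}

# Example (hypothetical file name): fix_line_endings job.sh
```

If the `dos2unix` tool is installed on the login node, `dos2unix job.sh` performs the same conversion.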