“module load” in the job script
To make job submission easier and more fault-tolerant for you, Slurm by default passes on all the environment (variables) and all loaded modules of the (login) session you submit the job from.
Thus, for better reproducibility, it is recommended to begin each job script with
module purge, followed by only those specific
module load … lines necessary for this job. Submitted that way, the job's main program will run with only the required and desired software (versions).
This is especially important if you use for example
module initadd to load certain modules from
~/.bashrc (because you need them time and again in each login session).
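A minimal sketch of such a job script, written to a file here so it can be inspected before submission. The file name job_clean_modules.sh, the #SBATCH values and ./my_program are hypothetical placeholders, and mX/versionX and mY/versionY stand for whatever modules your job actually needs:

```shell
#!/bin/bash
# Write a small job script that starts from a clean module environment.
cat > job_clean_modules.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=clean-modules-demo   # hypothetical job name
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

module purge               # drop all modules inherited from the submitting session
module load mX/versionX    # placeholder: load only what this job needs
module load mY/versionY

srun ./my_program          # hypothetical program
EOF

# On a system with Slurm, submit it with: sbatch job_clean_modules.sh
```

Pinning explicit versions in the module load lines keeps the job independent of whatever the default versions happen to be at submission time.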
Define your own module “collections”
When you have a set of modules optimized for a class of jobs, you can save it as a “collection” and restore it with a single line in your job scripts.
After loading your chosen set of modules with “
module load mX mY mZ …” (optionally with “
… mX/versionX mY/versionY mZ/versionZ …”), save it as a “collection” using “
module save <myCollectionName>”.
In your job scripts, you can then load and activate this “collection” simply with
module restore <myCollectionName>
Lmod stores each of your “collections” in a text file
$HOME/.lmod.d/<myCollectionName>, where you can also inspect its exact settings.
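The save and restore steps could be wrapped as follows. This is only a sketch, assuming Lmod's module command is available in the session; the function names save_collection and restore_collection and the collection name myJobSet are made up for illustration:

```shell
#!/bin/bash
# Run once in a login session: record a module set as a named collection.
save_collection() {
    local name=$1
    shift
    module purge          # start clean so only the listed modules end up in the collection
    module load "$@"      # e.g. mX/versionX mY/versionY
    module save "$name"
}

# Call at the top of a job script: bring back exactly that module set.
restore_collection() {
    module restore "$1"
}

# Login session:  save_collection myJobSet mX/versionX mY/versionY
# Job script:     restore_collection myJobSet
```

Purging before loading keeps modules from ~/.bashrc or earlier experiments out of the saved collection.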
A list of all your “collections” appears with “
Archive decompression in /work/scratch - Attention: automatic file cleanup
The extraction of archives (e.g.
*.tar) often keeps the modification timestamps of all files. If the modification time of the decompressed file is too old, e.g. older than 8 weeks, the freshly extracted files may be deleted by the automatic cleaning policy of the scratch area (run daily).
To avoid such cleaning, you can often use an additional tool parameter, e.g. for
tar you can use the parameter
-m. Alternatively you can use the
touch command to generate an updated modification time attribute.
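The effect of -m and touch can be tried out on a throwaway archive. The following is a minimal sketch, assuming a tar that supports -m (GNU tar and bsdtar do); all directory and file names are made up for the demonstration, and the technique only matters as long as the cleanup looks at the modification time:

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Demo archive: one file whose modification time lies far in the past.
mkdir "$workdir/src"
echo "data" > "$workdir/src/result.txt"
touch -t 201501010000 "$workdir/src/result.txt"
tar -cf "$workdir/demo.tar" -C "$workdir" src

# Reference file marking a point in time between the old timestamp and now.
touch -t 201601010000 "$workdir/reference"

# Default extraction restores the archived (old) modification time.
mkdir "$workdir/kept"
tar -xf "$workdir/demo.tar" -C "$workdir/kept"

# With -m, the extracted files get the time of extraction instead.
mkdir "$workdir/fresh"
tar -xmf "$workdir/demo.tar" -C "$workdir/fresh"

# Alternative: extract normally, then refresh all timestamps with touch.
mkdir "$workdir/touched"
tar -xf "$workdir/demo.tar" -C "$workdir/touched"
find "$workdir/touched" -exec touch {} +

ls -l "$workdir"/kept/src "$workdir"/fresh/src "$workdir"/touched/src
```

The listing should show the 2015 date only for the copy under kept.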
Attention: starting April 18th 2017, the scratch cleanup will be based on each file's 'creation time' instead of its 'modification time'. After this change, there is no need to update the modification time (via additional archive parameters or
touch) any more. In other words, after the change, updating the modification time of a file is pointless and will no longer prevent your file(s) from being deleted.
Missing Slurm support in MPI applications
Many applications have trouble using the correct number of cores within the batch system. This can be caused by missing
Slurm support. In general, such applications ship their own MPI version and must be given the number of cores and a hostfile explicitly.
First, generate a current hostfile. The following two lines replace the usual call:
srun hostname > hostfile.$SLURM_JOB_ID
mpirun -n 64 -hostfile hostfile.$SLURM_JOB_ID <MPI-Program>
The first line generates the hostfile; the second line passes MPI the number of planned cores (here 64) and the name of the hostfile.
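Inside a job script, the same idea might look like the sketch below. It deviates from the lines above in one respect, which is an assumption and not part of the original recipe: the task count is derived from the hostfile instead of being hard-coded. MPI_PROGRAM and count_hostfile_tasks are hypothetical names, and the application's mpirun is assumed to accept -n and -hostfile as shown above:

```shell
#!/bin/bash
# Count the non-empty lines of a hostfile (srun writes one line per task).
count_hostfile_tasks() {
    grep -c -v '^$' "$1"
}

# Only launch when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    hostfile="hostfile.$SLURM_JOB_ID"
    srun hostname > "$hostfile"
    ntasks=$(count_hostfile_tasks "$hostfile")
    # MPI_PROGRAM is a placeholder for the application's MPI binary.
    mpirun -n "$ntasks" -hostfile "$hostfile" "${MPI_PROGRAM:?set MPI_PROGRAM to your MPI program}"
fi
```

Deriving the count from the hostfile keeps the two values consistent when the job's task count changes.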
Job details at the end
After your job has finished, the following command reports the CPU and memory efficiency of the job:
tuda-seff <JobID>
Even more details are shown by the following command:
sacct -l -j <JobID>
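Since the long listing is very wide, it can be handy to select only a few columns. A small sketch, assuming Slurm's sacct is available; job_summary is a hypothetical helper name, and the chosen columns are just one possible selection of standard sacct fields:

```shell
#!/bin/bash
# Print a compact accounting summary for one job.
job_summary() {
    sacct -j "$1" --format=JobID,JobName,State,Elapsed,TotalCPU,MaxRSS,AllocCPUS
}

# Usage: job_summary <JobID>
```

Comparing TotalCPU with Elapsed multiplied by AllocCPUS gives a rough impression of how well the allocated cores were used.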
Expiry date of your user account
To see the expiry date of your own user account, use the script
Your user account's validity period is independent of the term of any project you might be associated with.
File transfer to and from the Lichtenberg HPC
We recommend the following tools:
As you can log in via
ssh to the login nodes, you can also use SSH's
scp tool to copy files and directories from or to the Lichtenberg.
In case of (large) text/ASCII files, you should use the optional compression (-C) built into the SSH protocol, in order to save network bandwidth and to possibly speed up your transfers.
Omit compression when copying already compressed data like JPG images or videos in modern container formats (mp4, OGG).
tuid@hla0003:~ $ scp -Cpr myResultDir mylocalworkstation.inst.tu-darmstadt.de:/path/to/my/local/resultdir
Fault tolerance: none (when interrupted,
scp will transfer everything afresh, regardless of what is already in the destination).
Some cases, i.e. repeated transfers, are less suitable for scp.
Examples: “I need my calculations' results also on my local workstation's hard disk for analysis with graphical tools” or “My local experiment's raw data need to hop to the HPC for analysis as soon as it is generated”.
As soon as you have to keep (one of) your Lichtenberg directories “in sync” with one on your institute's (local) infrastructure, running
scp repeatedly would be inefficient, as it is not aware of “changes” and would blindly copy the same files over and over again.
rsync can step in. Like
scp, it is a command line tool, transferring files from any (remote) “SRC” to any other (remote) “DEST”ination. In contrast to
scp, however, it has a notion of “changes” and can find out whether a file in “SRC” has been changed and needs to be transferred at all. New as well as small files are simply transmitted; for large files, however, rsync transfers only their changed blocks (safeguarded by checksums).
In essence: unchanged files are not transferred again, while new and changed files are; for large files, only their changed portions (delta) are transferred.
tuid@hla0003:~ $ rsync -aH myResultDir mylocalworkstation.inst.tu-darmstadt.de:/path/to/my/local/resultdir
Fault tolerance: partly (when interrupted,
rsync will transfer only what is missing or not complete in the destination).
scp and rsync are “one way” tools only! If a file is changed in “DEST” between transfers, the next transfer will overwrite it with the (older) version from “SRC”.
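Because a transfer can overwrite files in “DEST”, it may help to preview it first. The sketch below wraps a dry run and the real transfer in one function; sync_results is a hypothetical name, and the use of --partial (keep partly transferred files so a rerun can continue them) is a suggestion beyond the options shown above:

```shell
#!/bin/bash
# Preview a transfer, then perform it. Arguments: SRC DEST
sync_results() {
    local src=$1 dest=$2
    # -n (--dry-run) with -i (--itemize-changes) lists what would be transferred.
    rsync -aHni "$src" "$dest" || return
    # Real transfer; --partial keeps incomplete files for the next attempt.
    rsync -aH --partial "$src" "$dest"
}

# Usage:
# sync_results myResultDir mylocalworkstation.inst.tu-darmstadt.de:/path/to/my/local/resultdir
```

In interactive use, the dry run's output can be checked for files that were edited in “DEST” before the real transfer is started.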
Not available on the Lichtenberg:
FTP(S), sFTP, rcp and other older, clear-text protocols.