Our Mailing List
Our mailing list [HPC-Nutzer] is a convenient way of being notified about outages or failures.
The easiest way of subscribing to or unsubscribing from it is to send command mails to the list server.
Subscribe
From the mail address you want to receive our HPC news with, send a mail containing the command “subscribe” (in subject and/or mail body) to hpc-nutzer-request@lists.tu-….
The list server replies with an “opt-in” mail to make sure it was in fact you sending the subscription request. After confirming the request, you will receive upcoming HPC news per mail.
Unsubscribe
From the mail address you are currently receiving our HPC news with, send a mail containing the command “unsubscribe” (in subject and/or mail body) to hpc-nutzer-request@lists.tu-….
After receiving a final confirmation mail about the closure of your subscription, you will no longer receive HPC news via mail.
Expiry date of your user account
To see the expiry date of your own user account, use the script /shared/bin/account_expire.
Your user account's validity term is independent of the term or validity of any projects you might be associated with.
To list your current membership in (active) HPC projects, you can use
- the command “cat ~/.project” (static view, updated nightly)
- the command “member” without any parameters (dynamic / immediate view without update delay)
Expiration
As projects can have several users/members, and a given user can be a member of several projects, the validity terms of HPC user accounts and HPC projects are completely independent of each other. Both can expire (run out) at different dates, and extending one does not imply extending the other.
Automatic loading of your modules at login
If you always want certain modules to be loaded automatically at login time, edit your $HOME/.bash_profile, adding the lines
if [ -n "${PS1}" ] ; then
    # interactive shell, do output only after that check:
    module load <module1>/<v1> <module2>/<v2> …
fi
at the end.
After your next login, these modules will be loaded as soon as the shell prompt appears.
How best to “module load” in the job script
To make job submission easier and more fault-tolerant for you, Slurm by default passes on all the environment (variables) and all loaded modules of the (login) session you submit the job from.
Thus, for better reproducibility it is recommended to begin each job script with “module purge”, followed by only those “module load …” lines really necessary for this job. Submitted that way, the job's main program will run with only the required and desired software (versions).
This is especially important if you use, for example, “module initadd” to load certain modules from ~/.bashrc (because you need them time and again in each login session).
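As an illustration, a job script following this recommendation could start as sketched below; all job parameters, module names and the program call are placeholders, not actual recommendations:

#!/bin/bash
#SBATCH -J myJob              # hypothetical job name
#SBATCH -n 16                 # hypothetical number of tasks
#SBATCH -t 01:00:00           # hypothetical runtime limit

module purge                                  # start from a clean module environment
module load <module1>/<v1> <module2>/<v2>     # only what this job really needs

srun ./myProgram              # hypothetical program call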
Define your own module “collections”
When you have a set of modules optimized for a class of jobs, you can define them as a “collection”, easily restored with just one line in your job scripts.
After loading your elaborated set of modules with “module load mX mY mZ …” (optionally with versions: “… mX/versionX mY/versionY mZ/versionZ …”), save it as a “collection” using “module save <myCollectionName>”.
In your job scripts, you can then load and activate this “collection” simply with
module purge
module restore <myCollectionName>
Lmod puts each of your “collections” into a text file $HOME/.lmod.d/<myCollectionName>, where you can also inspect their exact settings.
A list of all your “collections” appears with “module savelist”.
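Put together, the workflow could look like the following sketch (the module names, versions and the collection name are merely examples):

# once, on a login node: assemble and save the set
module purge
module load gcc/8 openmpi/4.1 fftw/3.3        # hypothetical modules/versions
module save mySimulation                      # example collection name

# in each job script using this set:
module purge
module restore mySimulation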
Archive decompression in /work/scratch – Attention: automatic file cleanup
The extraction of archives (e.g. *.zip, *.tar) often keeps the “modification” timestamps of all files.
If you extract an archive to your scratch area /work/scratch/<TU-ID> and any of the decompressed files are older than 8 weeks, the freshly extracted files may be deleted by the nightly automatic cleaning policy of the scratch area.
To avoid your freshly extracted files being deleted the very next night, you can use additional parameters, e.g. the -m switch for tar. Alternatively, you can use the touch command on these files to update their modification timestamp.
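Both approaches could look like the following sketch (archive and directory names are placeholders):

# extract and set all timestamps to "now" (tar's -m switch):
tar -xmf myResults.tar -C /work/scratch/<TU-ID>/

# or refresh the timestamps of already extracted files afterwards:
find /work/scratch/<TU-ID>/myResults -exec touch {} +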
MPI applications missing Slurm support
Some MPI applications have problems using the correct number of cores when run as a batch job, possibly due to missing Slurm support. Such applications often use their own (built-in) MPI version and need additional guidance, by explicitly specifying the right number of cores and by providing a so-called hostfile.
First you have to generate a suitable hostfile by asking “srun” to report the compute nodes assigned to your job. The following two lines replace the usual call “mpirun <MPI program>”:
srun hostname > hostfile.$SLURM_JOB_ID
mpirun -n $SLURM_NPROCS -hostfile hostfile.$SLURM_JOB_ID <MPI program>
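Embedded in a job script, this could look like the following sketch (job parameters and the program name are placeholders):

#!/bin/bash
#SBATCH -n 32                 # hypothetical number of MPI ranks
#SBATCH -t 02:00:00           # hypothetical runtime limit

# ... your usual "module load" lines ...

srun hostname > hostfile.$SLURM_JOB_ID        # nodes assigned to this job
mpirun -n $SLURM_NPROCS -hostfile hostfile.$SLURM_JOB_ID ./myMPIprogram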
Separate, Enriched Python Environments
Python “conda” environments could interfere with, and do not harmonize well with, the python modules from our module system.
Thus we strongly recommend using python's “virtual environments” if you need specific python packages not available in the module system.
To prepare your own “vEnv” named “myenv”:
ml gcc/8 python/3.10 # load a suitable compiler & python version
mkdir test ; cd test
python -m venv myenv # create a new, empty vEnv named "myenv"
source myenv/bin/activate # and activate it
pip install --upgrade pip # Now, you can use "pip" (without "--user")...
pip install MyPyPkg1 MyPyPkg2 ... # ... to install your missing python packages
deactivate
You can of course name your vEnv freely. And you may also install it into a project (or group) directory, to make your vEnv available to your coworkers.
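For example, creating the vEnv under a shared project directory could look like this (the path below is purely hypothetical; use your project's actual directory):

ml gcc/8 python/3.10
python -m venv /work/projects/<project>/venvs/myenv        # hypothetical shared location
source /work/projects/<project>/venvs/myenv/bin/activate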
After this preparation (necessary only once, on a login node), your virtual environment “myenv” is ready to be used in jobs:
ml gcc/8 python/3.10
cd test
source myenv/bin/activate
python myScript ... # which requires MyPyPkg1+2+...
deactivate
Details of finished jobs
After your job has finished, the following command reports on the CPU and memory efficiency of the job:
seff <JobID>
Even more details will be shown by the following commands:
sacct -l -j <JobID>
tuda-seff <JobID>
File transfer to and from the Lichtenberg HPC
Before and after calculations, your data needs to get on, and your results to get off, the Lichtenberg filesystems.
Use the login nodes for your transfers, as these have high-bandwidth network ports also to the TU campus network (we do not have any other special input/output nodes).
We recommend the following tools:
One-off: scp (or sftp)
As you can log in via ssh to the login nodes, you can also use SSH's scp (and sftp) tools to copy files and directories from or to the Lichtenberg.
In case of (large) text/ASCII files, you should use the optional compression (-C) built into the SSH protocol, in order to save network bandwidth and to possibly speed up your transfers.
Omit compression when copying already compressed data like JPG images or videos in modern container formats (mp4, OGG).
Example:
tu-id@logc0004:~ $ scp -Cpr myResultDir mylocalworkstation.inst.tu-darmstadt.de:/path/to/my/local/resultdir
Details: man scp.
Fault tolerance: none (when interrupted, scp will transfer everything afresh, regardless of what's already in the destination).
Repeatedly: rsync
Some cases, i.e. repeated transfers, are less suitable for scp.
Examples: “I need my calculations' results also on my local workstation's hard disk for analysis with graphical tools” or “My local experiment's raw data need to hop to the HPC for analysis as soon as it is generated”.
As soon as you have to keep (one of) your Lichtenberg directories “in sync” with one on your institute's (local) infrastructure, running scp repeatedly would be inefficient, as it is not aware of “changes” and would blindly copy the same files over and over again.
That's where rsync can step in. Like scp, it is a command line tool, transferring files from any (remote) “SRC” to any other (remote) “DEST”ination. In contrast to scp, however, it has a notion of “changes” and can find out whether a file in “SRC” has been changed and needs to be transferred at all. New as well as small files will simply be transmitted; for large files, however, rsync will transfer only their changed blocks (safeguarded by checksums).
While the initial “rsync”hronisation of a larger directory tree won't be much faster than with “scp -Cr”, any subsequent synchronisation will be finished much more quickly, as only the deltas (the changes) are transferred.
In essence: unchanged files are never transferred again, new and changed files will, but for large files, only their changed portions (delta) will be transferred.
Example:
tu-id@logc0004:~ $ rsync -aH myResultDir mylocalworkstation.inst.tu-darmstadt.de:/path/to/my/local/resultdir
Details: man rsync.
Fault tolerance: partial (when interrupted, rsync will transfer only what is missing or incomplete in the destination).
Remember: both scp and rsync are “one way” tools only! If a file is changed in “DEST” between transfers, the next transfer will overwrite it with the (older) version from “SRC”.
If you want to go “bidirectional”, you may try syncthing or unison.
Problems
If you can log in to a shell (interactively), yet any file transfer fails, consult our FAQ “I can't upload files!”.
Not available on the Lichtenberg: HTTP(S), SMB (CIFS), rcp and other older, unencrypted protocols.