### Frequently Asked Questions – Access

Using the cluster is free of charge.

However, operating and maintaining the HPC is nonetheless quite an expensive business (power consumption, cooling, administration). The compute resources should thus be used responsibly, by designing your compute jobs as efficiently as possible.

#### User Account

While logged in on one of the login nodes, you can use the command account_expire to see your user account's end-of-validity date.

You will get an automatic email reminder 4 weeks before expiry of your account, and another one on the day of expiry.

To extend your account, simply fill in and sign the “Nutzungsantrag” again, and send it to us fully signed via office post (or via your local contact person).

Login problems can have two basic causes: the login node is not answering at all (perhaps due to being down or network problems), or it answers but denies you access.

• Try another login node (the one you tried may be down)
• In case of “Permission denied” (the login node answers, but denies you access), even from different login nodes: try changing your TUID password, as this is synchronized to the HLR a few seconds later.
• Read the (ssh) error message in its entirety. Sometimes it even explains how to fix the actual problem.
ssh -XC -4 <tu-id@>lcluster15.hrz.tu-darmstadt.de
ssh -XC -6 <tu-id@>lcluster17.hrz.tu-darmstadt.de
From time to time, we refine the list of allowed (i.e. considered safe) ssh key ciphers and negotiation algorithms, and some older programs (or versions) might be unable to work with these.
• Have you made any recent changes to your login startup scripts .bashrc or .bash_profile? Failing commands in these startup scripts can cause bash to end prematurely, which might look like you are being denied access.
If only interactive logins are affected, you can try accessing the likely culprit with your scp program (BitVise / WinSCP / FileZilla), and either undo your recent changes or delete the troublemaker (and then restore your last version from .snapshots/).

If nothing of the above works out: open a ticket, always mentioning your TUID (not your enrollment/matriculation number) and preferably from your professional TUDa email address.

Avoid sending “screenshots” of your terminal window as pictures (jpg/png or the like). A simple copy&paste in text form of your login attempts and the resulting error message is sufficient.

If there isn't any explanatory error message, please use the “verbose” mode by running “ssh -vv …” and append the output to the ticket mail (again please not as screenshot/picture, but as text).

#### Projects

Your currently active projects / memberships are recorded daily in a file called .project in your HOME directory:

cat $HOME/.project

(see also “Hints” about project application)

The roles in a project are:

• Director of the institute: Most departments are organized into institutes (Fachgebiete). If this does not apply to your organization, please insert the dean, or a person with staff responsibility for the main research group.
• PI: The principal investigator is responsible for the scientific aspects of the project. This can be the director as well as a junior professor or postdoc.
• Main researcher / project manager: In general, this is the person who does the main part of the work in this project. The PM is responsible for the administrative aspects of the scientific project. He or she is also the “technical contact” the HRZ communicates with.
• Additional researchers: All other researchers who can compute on this project account. This includes other PhD students as well as students working for the project.

The general project classes by amount of resources (Small, NHR-Normal and NHR-Large) are listed in our project class overview. Besides plain computational projects, you might also be a member of some rather technical billing accounts (e.g. courses/trainings). In general, we have the following naming pattern:

• project<5#> – a Small (or preparation) project
• p<7#> – an NHR project (regardless of whether -Normal or -Large)
• special<5#> – technical projects for select groups
• kurs<5#> – trainings, workshops and lectures with practical exercises on the Lichtenberg cluster (see Lectures and Workshops)

In general, one main researcher (PhD student or postdoc) owns a project, i.e. is project manager. This main researcher or PM can decide to add others to his or her project, for instance bachelor or master students, or a colleague he or she is collaborating with on this project. All these coworkers need to have their own user account on the HLR before being added to a project.

Beware: while sharing your project account is explicitly allowed, sharing your user account is strictly prohibited!
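The naming pattern above can be matched mechanically. The following is a small sketch (the classification strings are illustrative, not official HRZ terminology):

```shell
#!/bin/sh
# Classify a Lichtenberg account name by the naming pattern above.
classify() {
    case "$1" in
        project[0-9][0-9][0-9][0-9][0-9])          echo "Small (or preparation) project" ;;
        p[0-9][0-9][0-9][0-9][0-9][0-9][0-9])      echo "NHR project" ;;
        special[0-9][0-9][0-9][0-9][0-9])          echo "technical project" ;;
        kurs[0-9][0-9][0-9][0-9][0-9])             echo "training/workshop/lecture" ;;
        *)                                         echo "unknown" ;;
    esac
}

classify project01234    # Small (or preparation) project
classify p0123456        # NHR project
```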
In general, the Lichtenberg “local” or Small projects should be in the range and size of a PhD project, or serve to prepare NHR-Normal or -Large projects. For longer research terms and scientific endeavours, recurring follow-up projects are required. Nonetheless, the initial proposal should outline the whole scientific goal, not only your first year's targets. If the limit of a Small project is insufficient for your scientific challenge, apply for an NHR-Normal or NHR-Large project.

The project manager (PM, or PoC – person of contact) is responsible for applying and (after completion) for reporting on the project. He or she will be working with the HRZ for the (technical) reviews, and hand in the original of the signed proposal to the HRZ. Students can apply for a bachelor or master thesis project. The proposal has to be signed by the PM and by the PI, who is required to be a professor or postdoc.

For a “Small” (or preparation) project, only the web form needs to be completed, printed out, signed and sent to the HRZ. This form mainly asks for technical details and a short abstract (150–300 words) of the scientific goals. For the larger NHR projects, refer to the JARDS portal for details.

“Small”/preparation projects can be submitted at any time, and the proposals will be handled upon entry. For the larger NHR projects, deadlines (if any) can be found on the JARDS portal.

All projects are subjected to a technical review by the HRZ/HPC team. If a project proposal is not sufficiently clear (i.e. in terms of job sizes and runtimes), we will contact the PM and ask for clarification or modification. After the TR has been completed successfully, “Small” projects are started immediately. NHR-Normal and NHR-Large projects are then scrutinized scientifically by (external) HPC experts from the field of the project. Based on these reviews, the steering committee (Resource Allocation Board) approves (or denies) the proposal and assigns the resources, either as requested or in reduced form.
The maximum grant period for any given project is one year, regardless of the project's class. If you know that your project needs less than a year, we suggest writing your proposal accordingly. As the computational resources are allotted evenly over the granted time period, shorter projects get a greater resource share per month, resulting in higher priority per job. In well-reasoned cases, a project can be extended beyond one year, for one or two months. If your research project will take much longer than a year to complete, you will need to apply for follow-up projects every year.

Just as you reference your coworkers' substantial contributions in your research publications, you should also acknowledge the computational time grants from the Lichtenberg cluster. Properly communicating them improves public understanding of how research funds for HPC are spent and how we are working to support your research. We thus kindly ask for an acknowledgment in all your publications that arose out of, or used, calculations on the Lichtenberg:

This work is funded by the Federal Ministry of Education and Research (BMBF) and the state of Hesse as part of the NHR Program.

If you have also been supported by the HKHLR, you could add:

This work is funded by the Federal Ministry of Education and Research (BMBF) and the state of Hesse as part of the NHR Program. The authors would like to thank the Hessian Competence Center for High Performance Computing--funded by the Hessen State Ministry of Higher Education, Research and The Arts--for helpful advice.

For all TU Biblio publications, the category “Hochleistungsrechner” (as a subcategory of “Hochschulrechenzentrum” within the “Divisions” list) was added to TU Biblio. Please use this category for your research publications related to the Lichtenberg cluster, as your publication will then automatically be listed here accordingly.
### Frequently Asked Questions – batch scheduling system

#### Preparing Jobs

The batch scheduler needs to know some minimal properties of a job to decide which nodes it should be started on. If, for example, you did not specify --mem-per-cpu=, a task requiring very large main memory might be scheduled to a node with too little RAM and would thus crash.

To put it another way: with the resource requirements of all user jobs, the scheduler needs to play a kind of “multidimensional tetris”. At least along the dimensions runtime, memory size and number of CPU cores, the scheduler places your jobs as efficiently and as gap-free as possible into the cluster. (In the background, many more parameters are used.) These three properties of a job are thus the bare minimum to give the scheduler something to schedule with.

If sbatch <jobscript> complains about missing (mandatory) parameters, even though all of these seem to be defined using #SBATCH -… pragmas, this may be caused by Windows linefeeds, which are not valid on UNIX/Linux. If you wrote your job script on your Windows PC/laptop and transferred it with scp to a login node, simply transform it with

dos2unix jobscriptfile

into a valid UNIX/Linux text file.

If the above errors remain even after that, check all minus/hyphen characters. Here, you could have introduced dashes (long) or em-dashes (even longer), which do not work as a “begin of a parameter” sign.

Before submitting jobs, you need to determine how many CPUs (= cores) you can best use, how much main memory your scientific program will need and how long the calculation will take. If your scientific program is already used in your group for problems like yours, you can ask your colleagues about their lessons learned.
If you start afresh with a new scientific program package or a new class of problems: prepare a comparably small test case (no more than 30 minutes runtime), and run it on one of the login nodes (with the desired number of cores) under the control of the UNIX “time” command as follows:

/bin/time --format='MaxMem: %Mkb, WCT: %E' myProgram <testcase>

After the run, you get for example

MaxMem: 942080kb, WCT: 1:16.00

on your STDERR channel. After dividing “MaxMem” by 1024 (to get MBytes), you can determine your #SBATCH --mem-per-cpu= for that test case as

MaxMem in MByte / # of cores used (plus a safety margin)

Your #SBATCH -t d-hh:mm:ss is then the “WCT” from above (plus a safety margin). In our example and if you have used 4 cores:

942080 / 1024 / 4 = 230, i.e. --mem-per-cpu=230

When you have run your test case with 2, 4, 8 and 16 CPU cores, you can roughly guess the scalability of your problem, and you can size your real job runs accordingly.

In a short hierarchy: the HPC cluster consists of

• compute nodes – single, independent computers like your PC/laptop (just with more hardware and performance)

A node consists of

• two or more CPUs (central processing units, or processors), each placed in a socket. CPUs are the “program executing” part of a node.

A CPU consists of

• several cores, which can be understood as distinct execution units inside a single CPU. The more cores, the more independent processes or execution threads can be run concurrently.

Each core can be used by either

• a process = task (MPI), or
• a thread (“multi-threading”), e.g. POSIX threads or, most commonly, OpenMP (Open MultiProcessing)

A pure MPI application would start as many distinct processes (= tasks) as there are cores configured for it. All processes/tasks communicate with each other by means of MPI. Such applications can use one node, or can be distributed over several nodes, the MPI communication then being routed via InfiniBand.
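The --mem-per-cpu arithmetic above can be sketched as a small helper. This is only a sketch; the numbers come from the example above, and the safety-margin percentage is a parameter you choose yourself:

```shell
#!/bin/sh
# Derive an #SBATCH --mem-per-cpu= value (in MByte) from the MaxMem
# reported by /bin/time (in kByte) and the number of cores used.
# An optional safety margin (in percent) is added on top.
mem_per_cpu() {
    maxmem_kb=$1; cores=$2; margin_pct=${3:-0}
    mb=$(( maxmem_kb / 1024 / cores ))
    echo $(( mb + mb * margin_pct / 100 ))
}

# The example from above: 942080 kB over 4 cores -> 230
mem_per_cpu 942080 4
# The same with a 20% safety margin -> 276
mem_per_cpu 942080 4 20
```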
A pure multi-threaded application starts one single process, and from that, it can use several or all cores of a node with separate, (almost) independent execution threads. Each thread will optimally be allocated to one core. Most recent programs use OpenMP (see $OMP_NUM_THREADS in the documentation of your application).

Such applications cannot be distributed across nodes, but could make use of all cores on a given node.

Hybrid applications mix both parallelization models, by running eg. as many processes = tasks as there are nodes available, and spawning as many threads as there are cores on each node.


Important in this context:

For historical reasons from the pre-multicore era, SLURM has parameters referring to CPUs (e.g. --mem-per-cpu=).

Today, this means cores instead of CPUs! Even if that's confusing, the rule simply is to calculate “--mem-per-cpu” as if it was named “--mem-per-core”.

For running a lot of similar jobs, we strongly discourage fiddling with shell script loops around sbatch / squeue. For any number of jobs above 30–50, use Slurm's Job Array feature instead.

Using job arrays not only relieves the Slurm scheduler from unnecessary overhead, but also allows you to submit many more ArrayTasks than distinct jobs!

Example use cases are:

• the same program, the same parameters, but lots of different input files
• the same program, the same input file, but lots of different parameter sets
• a serial program (unable to utilize multiple cores [multi-threading] or even several nodes [MPI]), but a lot of input files to analyze, and none of the runs depends on results of any other, i.e. High-Throughput Computing

Rename the “numerous” inputs of your job with consecutive numbering, e.g. image1.png, image2.png or paramFile1.conf, paramFile2.conf etc.
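Such a consecutive renaming can be done with a short loop. This is a minimal sketch assuming all inputs match *.png in the current directory; try it on a copy of your data first:

```shell
#!/bin/sh
# Rename all *.png files in the current directory to image1.png, image2.png, …
# in alphabetical order.  CAUTION: sketch only -- run it on a copy first.
number_inputs() {
    i=1
    for f in *.png; do
        [ -e "$f" ] || continue   # no *.png files at all
        mv -- "$f" "image$i.png"
        i=$((i + 1))
    done
}

# Demonstration in a scratch directory:
cd "$(mktemp -d)"
touch measurement_a.png measurement_b.png
number_inputs
ls    # image1.png and image2.png
```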

Let's say you have 3124 sets, then set up a job script with

#SBATCH -a 1-3124
myProgram image$SLURM_ARRAY_TASK_ID.png > image$SLURM_ARRAY_TASK_ID.png.out

and submit it via sbatch. Slurm will now start one job with 3124 ArrayTasks, each one reading its own input image and writing to its own output file.

If you need to limit the number of parallel running ArrayTasks, use

#SBATCH -a 1-3124%10

Slurm will then run at most 10 tasks concurrently.

Further details can be found in 'man sbatch' under “--array=”.
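Putting the pieces above together, a complete array job script might look like the following sketch. Job name, memory and runtime values are illustrative, and myProgram is the placeholder program from the example above:

```shell
#!/bin/bash
#SBATCH -J image-analysis       # illustrative job name
#SBATCH -a 1-3124%10            # 3124 ArrayTasks, at most 10 running at once
#SBATCH -n 1                    # each ArrayTask is one (serial) task
#SBATCH --mem-per-cpu=500       # MByte per core, from a test run plus margin
#SBATCH -t 00:30:00             # runtime per ArrayTask

# Each ArrayTask derives its own input and output file from its task ID.
myProgram "image${SLURM_ARRAY_TASK_ID}.png" > "image${SLURM_ARRAY_TASK_ID}.png.out"
```

Submitted once with sbatch, this starts one job whose ArrayTasks each process their own input image.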

#### Pending Jobs

The priority values shown by slurm commands like “squeue” or “sprio” are always to be understood as relative to each other, and in relation to the current demand on the cluster. There is no absolute priority value or priority “threshold”, from which jobs will start to run unconditionally.

During light load (=demand) on cluster resources, a low priority value might be sufficient to get the jobs to run immediately (on free resources). On the other hand, even a very high priority value might not be sufficient, if cluster resources are scarce or completely occupied.

Since most cluster resources are dedicated to the default job runtime of 24 hours, you should always factor in a minimum pending time of one day…

With the command “squeue --start”, you can ask the scheduler for an estimate of when it deems your pending jobs startable.

Please be patient when getting back “N/A” for quite a while, as that is to be expected. Since the scheduler does not touch every job in every scheduling cycle, it might take its time to reach even this “educated guess” on your pending jobs.

In general, your jobs' time spent in PENDING depends not only on your jobs' priority value, but mainly on the total usage of the whole cluster. Hence, there is no 1:1 relationship between your jobs' priority and their prospective PENDING period.

On the Lichtenberg HPC, the scheduler dispatches the jobs in the so-called “Fair Share” mode: the more computing power you use (especially in excess of your monthly project budget), the lower will be your next jobs' priority.

However, this priority degradation has a half-life of roughly a fortnight, so your priority will recover over time.

Your best bet is thus to use your computing budget evenly over the project's total runtime (see 'csreport'). This renders your priority degradation to be quite moderate.

For a planned downtime, we tell the batch scheduler in advance when to end job execution. Based on your job's runtime statement (#SBATCH -t d-hh:mm:ss in the job script), the scheduler decides whether a given job will safely be finished before the downtime, and will start it.

Pending jobs not fitting in the time frame until the downtime will not be started, and simply remain pending.

All pending jobs in all queues will survive (planned) downtimes or outages, and will recommence being scheduled as usual, according to their priorities.

#### Running Jobs

Check whether all directories mentioned in your job script are in fact there and writable for you.

In particular, the directory specified with

#SBATCH -e /path/to/error/directory/%j.err

for the STDERR of your jobs needs to exist beforehand and must be writable for you.

SLURM ends the job immediately if it is unable to write e.g. the error file (due to a missing target directory).

Due to being a “chicken and egg” problem, a construct inside the job script like

#SBATCH -e /path/to/error/directory/%j.err
mkdir -p /path/to/error/directory/

cannot work either, since for Slurm, the “mkdir” command is already part of the job. Thus, any of “mkdir”s potential output (STDOUT or STDERR) would have to be written to a directory which at begin of the job does not yet exist.
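A way out is to create the directories before submitting, e.g. with a small wrapper. This is a sketch; it only understands the plain space-separated `#SBATCH -e <path>` / `#SBATCH -o <path>` forms, not the `--error=`/`--output=` variants:

```shell
#!/bin/sh
# Read a job script and create the parent directories of any
# "#SBATCH -e <path>" / "#SBATCH -o <path>" lines before submitting.
ensure_log_dirs() {
    awk '$1 == "#SBATCH" && ($2 == "-e" || $2 == "-o") { print $3 }' "$1" |
    while read -r path; do
        mkdir -p "$(dirname "$path")"
    done
}

# Usage on the cluster:  ensure_log_dirs myJobScript && sbatch myJobScript
# Demonstration with a scratch job script:
d=$(mktemp -d)
printf '#!/bin/bash\n#SBATCH -e %s/errors/%%j.err\n' "$d" > "$d/job.sh"
ensure_log_dirs "$d/job.sh"
ls -d "$d/errors"    # the directory now exists
```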

While you can load the needed modules right when logging in on the login node (since these are inherited by “sbatch myJobScript”), this is in fact not reliable: it renders your jobs dependent on whatever modules you happen to have loaded in your login session.

We thus recommend to begin each job script with

module purge
module load <each and every relevant module>
myScientificProgram …

to have exactly those modules loaded which are needed, and not more.

This also makes sure your job is reproducible later on, independently of what modules were loaded in your login session at submit time.

This usually is caused by nested calls to either srun or mpirun within the same job. The second or “inner” instance of srun/mpirun tries to allocate the same resources as the “outer” one already did, and thus cannot complete.

If you have

srun /path/to/myScientificProgram

in your job script, check whether “/path/to/myScientificProgram” in fact is an MPI-capable binary. Then, the above syntax is correct.

But if myScientificProgram turns out to be a script, calling srun or mpirun by itself, then remove the srun in front of myScientificProgram and run it directly.
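Whether a binary is MPI-linked can be checked heuristically: a dynamically linked MPI binary usually pulls in an MPI runtime library such as libmpi. A sketch (the wording of the two messages is ours, not Slurm's):

```shell
#!/bin/sh
# Heuristic check: does the dynamic loader pull in an MPI runtime?
is_mpi_binary() {
    if ldd "$1" 2>/dev/null | grep -q 'libmpi'; then
        echo "MPI-capable: srun in front is correct"
    else
        echo "no MPI runtime linked: run it directly (or it is a script)"
    fi
}

is_mpi_binary /bin/sh    # /bin/sh is not MPI-linked
```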

Example of such error:

srun: Job XXX step creation temporarily disabled, retrying
srun: error: Unable to create step for job XXX: Job/step already completing or completed
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP XXX.0 ON hpb0560 CANCELLED AT 2020-01-08T14:53:33 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB XXX ON hpb0560 CANCELLED AT 2020-01-08T14:53:33 DUE TO TIME LIMIT ***

There is no magic by which Slurm could know the really important part of your job script. The only way for Slurm to detect success or failure is the exit code of your job script, not the real success or failure of any program or command within it.

The exit code of well-written programs is zero in case everything went well, and >0 if an error has occurred.

Imagine the following job script:

#!/bin/bash
#SBATCH …
myScientificProgram …

Here, the last command executed is in fact your scientific program, so the whole job script exits with the exit code of “myScientificProgram” as desired. Thus, Slurm will assign COMPLETED if “myScientificProgram” has had an exit code of 0, and will assign FAILED if not.

If you issue just one simple command after “myScientificProgram”, this will overwrite the exit code of “myScientificProgram” with its own:

#!/bin/bash
#SBATCH …
myScientificProgram …
cp resultfile $HOME/jobresults/

Now, the “cp” command's exit code will be the whole job's exit code, since “cp” is the last command of the job script. If the “cp” command succeeds, Slurm will assign COMPLETED even though “myScientificProgram” might have failed – “cp”'s success covers the failure of “myScientificProgram”.

To avoid that, save the exit code of your important program before executing any additional commands:

#!/bin/bash
#SBATCH …
myScientificProgram …
EXITCODE=$?
cp resultfile $HOME/jobresults/
/any/other/job/closure/cleanup/commands …
exit $EXITCODE

Immediately after executing myScientificProgram, its exit code is saved to $EXITCODE, and as its last line, your job script re-sets this exit code (the one of the real payload). That way, Slurm gets the “real” exit code of “myScientificProgram”, not just the one of the command which happens to be the last one in your job script, and will set COMPLETED or FAILED appropriately.

Only during the runtime of your own job(s). In general, a direct login to the compute nodes is not possible. However, during execution of your job(s), you are entitled to log in to the executing compute nodes (from a login node, not from the internet), e.g. to run top or similar utilities, and in general to see the behaviour of your job(s) at first hand. To find out which nodes are running your jobs, see the squeue output's NODELIST.

In the case of multi-node MPI jobs, the node list from the squeue output needs to be decomposed to get distinct host names. For example, the node list mpsc0[301,307,412-413] is in fact

mpsc0301
mpsc0307
mpsc0412
mpsc0413

to all of which you are entitled to log in with “ssh mpsc0…” while they are executing your job. If your job ends while you are still logged in to one of its compute nodes, you will be logged out automatically, ending up back on the login node.

#### Miscellaneous

Similar to our compute nodes, the login nodes are not installed the usual way on hard disks. Instead, on each reboot (thus, also after downtimes) they fetch an OS image from the network and extract the OS “root” image into their main memory. That assures these nodes are in a clean, defined (and tested) condition after each reboot. Since “cron” and “at” entries are stored in the system area that is part of that OS image, these entries would not be permanent and are thus unreliable. To avoid knowledgeable users creating “cron” or “at” jobs nonetheless (and inherently trusting their function for e.g. backup purposes), we have switched off “cron” and “at”.
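For the squeue NODELIST decomposition described above, Slurm itself can do the work on the cluster: `scontrol show hostnames 'mpsc0[301,307,412-413]'` prints one host name per line. For illustration, the same expansion can be sketched in plain shell (this sketch handles only a single prefix[…] group with numeric items):

```shell
#!/bin/sh
# Expand a Slurm node list like mpsc0[301,307,412-413] into distinct
# host names, one per line.
expand_nodelist() {
    case "$1" in
    *\[*\]*)
        prefix=${1%%\[*}
        items=${1#*\[}; items=${items%\]}
        # split on commas, then unfold a-b ranges
        echo "$items" | tr ',' '\n' | while read -r item; do
            case "$item" in
            *-*)
                a=${item%-*}; b=${item#*-}
                i=$a
                while [ "$i" -le "$b" ]; do
                    echo "$prefix$i"
                    i=$((i + 1))
                done
                ;;
            *) echo "$prefix$item" ;;
            esac
        done
        ;;
    *) echo "$1" ;;   # a plain single host name
    esac
}

expand_nodelist 'mpsc0[301,307,412-413]'
```

On the cluster, prefer the real `scontrol show hostnames`, which understands all node-list forms.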
Since June 2020, we have established a new password synchronisation: as soon as you change your TUID password in the IDM system of the TU Darmstadt, the login password to the Lichtenberg HPC will be in sync with it.

For new users, this is in effect from the get-go, i.e. their first login to the Lichtenberg HPC will work with their central TUID password. Existing users will keep their current password (the one last set on the “HLR” tab of the former “ANDO” portal). As soon as they change their TUID password on the IDM portal, it will overwrite the last HLR password. The same holds true for guest TUIDs.

In these directories, quotas are managed using the UNIX groups da_p<ProjID> or da_<Institutskürzel> (in the following symbolized as da_XXX). Files (and directories) not belonging to the pertaining group will be accounted on the creating user's personal quota. As persons/TUIDs intentionally have only a small quota on those group/shared folders, such mis-assigned files will cause the dreaded “quota exceeded” errors.

Directories and files somewhere below /work/projects/ and /work/groups/:

• need to have the right group membership of da_XXX (and may not belong to your TUID group)
• directories need to have permissions as follows: drwxrws---

The “sticky” bit on group level causes new files to automatically be assigned the group of the parent directory (not the group of the creating user).

Wrong: drwx------ 35 tuid tuid 8192 Jun 17 23:19 /work/groups/…/myDir
Right: drwxrws--- 35 tuid da_XXX 8192 Jun 17 23:19 /work/groups/…/myDir

Solution: change into the parent directory of the problematic one, and check its permissions as described above, using

ls -ld myDir

In case these are not correct and you are the owner:

chgrp -R da_<XXX> myDir
chmod 3770 myDir

In case you are not the owner, ask the owner to execute the above commands.

From time to time, we will revise and edit this web page.
Please send us your question via email to , and if question & answer are of general interest, we will amend this FAQ accordingly.

### Frequently Asked Questions – Software

#### Installation

You can list all installed programs and versions with the command module avail. The module command loads and unloads paths and environment variables, as well as displaying available software. A detailed description is available here.

Our module system is built in the form of a hierarchical tree with respect to compiler(s) and MPI version(s). Many (open source) software packages thus don't appear at first, and become available only after you load a (suitable) compiler (and a (suitable) MPI module, respectively). In case you haven't loaded one or the other yet, many packages don't show up in the output of “module avail”. If you seem to be missing a required software, please try

module spider <mySW>
or
module spider | grep -i mySW
or
module --show-hidden avail <mySW>

before installing it yourself or opening a ticket.

Please send us an email . If the requested program is of interest to several users or groups, we will install it in our module tree to make it available for all. Otherwise we will support you (to a certain extent) in attempting a local installation, e.g. in your /home/ folder.

#### Licenses

No, you first have to check whether the software requires a license. In that case, you have to prove you have the rights to use (a sufficient amount of) it, for example if your institute/department contributes to the yearly costs of a TU license, or has purchased its own. Please also read the comments on licenses in this list. In general: not everything that is possible is allowed.

It depends. In general, our modules for commercial software fetch their licenses from license servers of the TU Darmstadt. These licenses are dedicated exclusively to members of TU Darmstadt contributing to the license costs. Please send us an email to if you have license questions.
We can support you in configuring your software to fetch license tokens from e.g. your institute's license servers. In general, not everything that is possible is allowed.

#### Runtime Issues

Remove all unnecessary modules. A job script should always start with

module purge
module load <only modules really required for this job>

to ensure a clean environment. Remember: whenever you load modules while you are on a login node, any job submitted from this modified environment will inherit these modules' settings! Therefore, it is strongly recommended to use the above purging/loading statements in all job scripts.

Next, scrutinize your program's runtime libraries (“shared objects”) with

ldd -v /path/to/binary

Your $LD_LIBRARY_PATH might contain an unwanted directory, causing your program to load wrong or outdated libraries, which should in fact rather be coming from the modules you have loaded.

In particular, the infamous “Bus error” can be caused by non-matching arguments or return values between the calling binary and a called library, leading to “unaligned” memory access and crashes.

A crashing program usually causes a memory dump of its process to be created (in Linux, a file called core.<PID> in the directory where the program was started).

Unfortunately, some user jobs repeatedly crashed in a loop, causing lots of coredumps being created on our cluster-wide GPFS filesystem. As this adversely affected its performance and availability, we had to switch off the creation of coredumps by default.

However, for you to debug your software, we didn't prohibit core dumps entirely, and thus writing them can be enabled again:

ulimit -Sc unlimited
<my crashing program call>
gdb /path/to/programBinary core.<PID>

If it is in fact the very same binary (and not only the same “program”), compare in both cases:

• the modules you have loaded before submitting the job:
module list
(because these are inherited by the job!)
• your environment settings:
env | sort > myEnv
• your respective $LD_LIBRARY_PATH setting and the libraries effectively loaded at runtime:
ldd /path/to/same/binary
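Two such sorted environment dumps (e.g. one from the login session at submit time, one written by the job itself) can then be compared with a plain diff. A sketch with synthetic dumps (the file names and variables are illustrative):

```shell
#!/bin/sh
# Compare two sorted environment dumps and show only the differing lines.
diff_envs() {
    diff "$1" "$2" | grep '^[<>]' || true
}

# Demonstration with two synthetic dumps in a scratch directory:
d=$(mktemp -d)
printf 'A=1\nB=2\n' > "$d/env_login.txt"
printf 'A=1\nB=3\n' > "$d/env_job.txt"
diff_envs "$d/env_login.txt" "$d/env_job.txt"   # prints the two differing lines
```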

Yes, that's possible, by using the so-called “collection” feature of our module system LMod.

More details can be found in our Tips and Tricks section, and inside “man module”.

#### Machine Architecture

The Lichtenberg HPC runs only Linux, and thus will not run Windows (or macOS) executables natively.

Ask your scientific software vendor or provider for a native Linux version: if it is in fact a scientific application, there's a very good chance they have one…

Due to the disproportionate administrative effort (and the missing Windows licenses), we are sorry to have to deny all requests like “virtual Windows machines on the cluster” or to install WINE just like that.

Since the Lichtenberg HPC runs CentOS (a RedHat compatible distribution), the native application packaging format is RPM, not .deb.

Though there are ways to convert .deb packages to .rpm or even to install .deb packages on RPM-based distributions (see the “alien” command's information on the web), we cannot support installing or even converting them. Check with the vendor/supplier to get .rpm packages, or try to compile the program yourself (if the source code is available).

#### Miscellaneous

From time to time, we will revise and edit this web page.

Please send us your question via email to , and if question & answer are of general interest, we will amend this FAQ accordingly.