Projects, accounts & accounting

The Lichtenberg HPC can only be used via projects, which define the approved amount of resources a project may allocate on the HPC. In other words, a project's allotted number of core*hours determines that project's “share” of the HPC's overall computing resources.

All core*hours used within the course of a project are accounted on that project (like money spent is accounted on a bank account).

Project Memberships

To use a project for your scientific calculations, you need to become a member of this project. Info about project membership

User vs. Project

A (personal) user account is associated with one or more projects, the first of which is that user's default project.
Unlike the strictly personal user account, projects can be (and are meant to be) shared among several colleagues and students working on the same scientific problem.

Do not share your user account (neither passwords nor SSH keys)! Collaboration is permitted only by way of membership in a common project.

Expiration

As projects can have several users/members, and a given user can be member of several projects, the validity terms of HPC user accounts and HPC projects are completely independent of each other. Both can expire (run out) at different dates, and extending one does not imply extending the other.

Jobs vs. Project

Submitting batch jobs is not possible without specifying a project, at least implicitly (the sbatch -A parameter). If a user does not explicitly specify sbatch -A <projectname>, the job will be accounted on that user's default project.
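For illustration (the project name and job script are placeholders), the two submission variants look like this:

```shell
# Explicitly account the job on a specific project:
sbatch -A project01234 jobscript.sh

# Without -A, the job is accounted on your default project:
sbatch jobscript.sh
```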

Rules of Accounting

The Lichtenberg cluster runs in “user-exclusive” mode: a given compute node will always execute only jobs of the same user at the same time.

This in turn means that even one single (small) job will block the assigned compute node for other users. Therefore, the accounting will book the equivalent of the full node's core*h (even if your job does not use all cores) on your project!

For small jobs (with no overly large memory footprint), we recommend requesting even divisors of the number of cores per node, so that several such jobs can share a given compute node without wasting resources. In our case of compute nodes with 96 cores:

96 / 24 = 4 of your jobs per node
96 / 32 = 3 of your jobs per node

For this to work, strictly avoid the

#SBATCH --exclusive

directive, as it would assign every (small) job its own, separate compute node!
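As a sketch (job name, memory, runtime and program name are placeholders, not site defaults), a batch script for such a node-sharing job could look like:

```shell
#!/bin/bash
#SBATCH -J small_job          # job name (placeholder)
#SBATCH -n 24                 # 24 cores: 96/24 = 4 such jobs can share one 96-core node
#SBATCH --mem-per-cpu=1750    # hypothetical value; keep the memory request proportional as well
#SBATCH -t 01:00:00           # requested runtime (placeholder)
# Note: deliberately NO "#SBATCH --exclusive" here,
# so the remaining cores of the node stay usable for your other jobs.

srun ./my_program             # "my_program" is a placeholder
```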

Resources used

With the commands csum and csreport, any user can get a list of their current overall resource consumption.
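Both commands are simply invoked on a login node (argument-free invocation is an assumption here; consult their built-in help for options):

```shell
csum        # overview of your current overall resource consumption
csreport    # report of your consumption across your projects
```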

Monthly Usage Report

At the end of a month, users get an automatic email with a usage overview on all projects they are associated with (“Lichtenberg User Report”).

Usage and efficiency plot

Since October 2019, the monthly mail with the resource usage features a graphical overview on the activities for a given project. The diagram provides the users with an easier insight into resource usage and efficiency of their projects.

This visualization is split into two parts: a combined accumulated CPU time plot and a per-job efficiency plot.

Part A)

The graph's upper panel shows the used core*hours over the validity term of the project, up to the current date. The gray line details all the accumulated core*hours allocated for the project. These correspond to the

  • core*hours accounted to the project and
  • core*hours blocked for exclusive use by the project.

The yellow line depicts the accumulated core*hours during which the allocated cores were actually busy performing computations, i.e. the core*hours actually utilized.

Example:

If a 10-hour job running on 16 CPU cores executes with a 50% CPU efficiency, the project will be accounted a total of 160 core*hours (gray line), even though only 80 core*hours were actually utilized (yellow line). Even if 12 hours of runtime were requested for this example in the job script, only the job's actual runtime of 10 hours will be accounted on that project.
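The accounting arithmetic from the example above can be sketched in a few lines of shell (values taken from the example; integer arithmetic only):

```shell
cores=16
runtime_h=10
efficiency_pct=50

accounted=$((cores * runtime_h))                  # the full allocation is booked
utilized=$((accounted * efficiency_pct / 100))    # the share actually computing
echo "accounted: ${accounted} core*h, utilized: ${utilized} core*h"
# prints: accounted: 160 core*h, utilized: 80 core*h
```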

The colored bars indicate the project's monthly quotas (starting at the accumulated core*hour mark at the beginning of a given month), and are color coded according to the actual relative usage.

Example:

If a monthly budget of 17000 core*hours was granted and a total of 19000 core*hours were accounted for that given month, the bar for the month will be yellow (110% – 150% usage). The height of the bar for the following month will start at the position of the gray line at the end of the month PLUS 17000 core*hours (monthly quota).
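Using the numbers from the example above, the relative monthly usage that determines the bar's color can be computed like this (a sketch; integer shell arithmetic truncates the percentage):

```shell
granted=17000      # monthly quota in core*h
accounted=19000    # core*h accounted in that month

pct=$((100 * accounted / granted))   # integer percent of quota used
echo "${pct}% of the monthly quota used"
# prints: 111% of the monthly quota used  (within the 110%-150% "yellow" band)
```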

If applicable, the remaining quota until end of the project is extrapolated and plotted using a thick black line.

Part B)

The graph's lower panel shows the CPU efficiency of each distinct job of the project. For that, the ratio of utilized core*hours to accounted core*hours is used (not the core*hours requested in the job script).
Each job is represented by a semi-transparent purple dot. Multiple jobs with the same efficiency will overlay and appear as darker dots.

The smaller red dots indicate the average CPU efficiency of all jobs of each day with active jobs. Since the purple dots do not differentiate with respect to job length and size, the red dots will help to assess the efficiency of computations for days with varying efficiencies and job sizes.

Currently, this visualization includes only CPU metrics; usage of accelerator cards (e.g., GPUs) is not included.

In case you need the graph for your project(s) in a resolution-independent vector format, please don't hesitate to contact us as described at the end of the page.

HKHLR Tools

To determine the efficiency of specific jobs, or to identify jobs within a given efficiency range, the HKHLR offers helper scripts. The necessary JobAnalysisTools module may be loaded by entering

module load hkhlr JobAnalysisTools/0.1

in your shell or in a job script.

It offers three utilities. Using the

HKHLR_RecentJobEfficiencyReport 

script, a list of efficiencies for recent jobs (default 7 days) can be generated.

If you want to scrutinize a certain job, you can use

HKHLR_JobReport $JOBID

to see that job's efficiency.

Lastly, the tool

HKHLR_GetJobIDsOfInterval $lowerBound $upperBound

returns the job IDs whose efficiency falls within the interval [$lowerBound, $upperBound].
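Put together, an interactive session using these tools might look like this (the job ID and efficiency bounds are placeholders, and the scale of the bounds is an assumption):

```shell
module load hkhlr JobAnalysisTools/0.1

HKHLR_RecentJobEfficiencyReport    # efficiencies of your jobs from the last 7 days
HKHLR_JobReport 1234567            # efficiency of one specific job (placeholder ID)
HKHLR_GetJobIDsOfInterval 0 50     # job IDs in the given efficiency window
```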

If you require further assistance, clarification or have feedback, please do not hesitate to contact us, either via the TU Darmstadt ticket system, or via mail to .