The Lichtenberg HPC can only be used via projects, each defining the approved amount of resources that the project can allocate on the HPC. In other words, a project's allotted number of core*hours determines the “share” of the overall computing resources of the HPC for this project.
All core*hours used in the course of a project are charged to that project (just as money spent is debited from a bank account).
User vs. Project
Do not share your user account (neither password nor ssh keys)! Collaboration is permitted only through membership in a common project.
Expiration
As projects can have several users/members, and a given user can be a member of several projects, the validity terms of HPC user accounts and HPC projects are completely independent of each other. Both can expire (run out) at different dates, and extending one does not imply extending the other.
Jobs vs. Project
Submitting batch jobs is not possible without (implicitly) specifying a project (sbatch -A parameter). If a user does not explicitly specify sbatch -A <projectname>, the job will be allocated on that user's default project.
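As a minimal sketch, a job script that names its project explicitly could look like the following. The project name `project01234`, the job name and the resource values are placeholders chosen for illustration (they are assumptions, not taken from this page); only the `-A` option itself is described above.

```shell
#!/bin/bash
#SBATCH -A project01234   # hypothetical project name; the job's core*hours are booked here
#SBATCH -J example_job    # hypothetical job name
#SBATCH -n 1              # one task (placeholder value)
#SBATCH -t 00:10:00       # ten minutes of wall-clock time (placeholder value)

# Placeholder job body: replace with the actual computation.
echo "job body goes here"
```

Leaving out the `-A` line would let the job fall back to the user's default project, as described above.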
Rules of Accounting
The Lichtenberg cluster runs in “user-exclusive” mode: a given compute node will always execute only jobs of the same user at the same time.
This in turn means that even one single (small) job will block the assigned compute node for other users. Therefore, the accounting will book the equivalent of the full node's core*h (even if your job does not use all cores) on your project!
For small jobs (with no overly large memory footprint), we recommend requesting a number of cores that divides the number of cores per node without remainder, so that these jobs can share a given compute node without “clipping” waste of resources. In our case of compute nodes with 96 cores, suitable sizes are the divisors of 96, for example 8, 12, 16, 24, 32 or 48 cores per job.
For this to work, strictly avoid the #SBATCH --exclusive pragma, as this would assign every (small) job its own, separate compute node!
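To illustrate the recommendation above, the following sketch lists every core count that divides a 96-core node without remainder, together with the number of such jobs that fit on one node. The function name `list_even_dividers` is made up for this illustration; only the figure of 96 cores per node comes from the text above.

```shell
#!/bin/sh
# Print every core count that divides the node's cores without remainder,
# followed by how many jobs of that size fit on one node.
list_even_dividers() {
    cores_per_node=$1
    cores=1
    while [ "$cores" -le "$cores_per_node" ]; do
        if [ $((cores_per_node % cores)) -eq 0 ]; then
            echo "$cores cores per job -> $((cores_per_node / cores)) jobs per node"
        fi
        cores=$((cores + 1))
    done
}

list_even_dividers 96
```

The sketch only considers cores; the combined memory of the jobs sharing a node must of course also fit on that node.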
Resources used
With the commands csum and csreport, any user can get a list of their current overall resource consumption.
Monthly Usage Report
At the end of a month, users get an automatic email with a usage overview on all projects they are associated with (“Lichtenberg User Report”).
Usage and efficiency plot
Since October 2019, the monthly mail with the resource usage features a graphical overview on the activities for a given project. The diagram provides the users with an easier insight into resource usage and efficiency of their projects.
This visualization is split into two parts: a combined accumulated CPU time plot and a per-job efficiency plot.
Part A)
The graph's upper panel shows the used core*hours over the validity term of the project, up to the current date. The gray line details all the accumulated core*hours allocated for the project. These correspond to the
- core*hours accounted to the project and
- core*hours blocked for exclusive use by the project.
The yellow line depicts the accumulated core*hours during which the allocated cores were actually busy performing computations, i.e. the core*hours your jobs really utilized.
Example:
If a 10-hour job running on 16 CPU cores executes with a 50% CPU efficiency, the project will be accounted a total of 160 core*hours (gray line), even though only 80 core*hours were actually utilized (yellow line). Even if 12 hours of runtime were requested for this example in the job script, only the job's actual runtime of 10 hours will be accounted on that project.
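The arithmetic of this example can be sketched as follows. The function names are invented for this illustration, and the calculation uses whole-number shell arithmetic, which is sufficient for the figures in the example.

```shell
#!/bin/sh
# Accounted core*hours: actual runtime in hours times the allocated cores.
accounted_core_hours() {
    echo $(( $1 * $2 ))
}

# Utilized core*hours: accounted core*hours scaled by the CPU efficiency in percent.
utilized_core_hours() {
    echo $(( $1 * $2 * $3 / 100 ))
}

runtime_hours=10        # actual runtime, not the 12 hours requested
cores=16
efficiency_percent=50

echo "accounted: $(accounted_core_hours "$runtime_hours" "$cores") core*hours"
echo "utilized:  $(utilized_core_hours "$runtime_hours" "$cores" "$efficiency_percent") core*hours"
```

With the values from the example, this prints 160 accounted and 80 utilized core*hours, matching the gray and yellow lines respectively.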
The colored bars indicate the project's monthly quotas (starting at the accumulated core*hour mark at the beginning of a given month), and are color coded according to the actual relative usage.
Example:
If a monthly budget of 17000 core*hours was granted and a total of 19000 core*hours were accounted for that month, the bar for the month will be yellow (110% – 150% usage). The bar for the following month will start at the position of the gray line at the end of the month and extend upward by 17000 core*hours (the monthly quota).
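As a rough check of the numbers in this example, the relative usage can be computed as below. The function name `usage_percent` is hypothetical, and only the yellow band (110% to 150%) is taken from the text; the thresholds of the other colors are not stated here, so the sketch does not guess them.

```shell
#!/bin/sh
# Relative usage of a monthly quota in whole percent (integer division, rounds down).
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

monthly_quota=17000
accounted=19000
percent=$(usage_percent "$accounted" "$monthly_quota")

if [ "$percent" -ge 110 ] && [ "$percent" -le 150 ]; then
    echo "usage: ${percent}% of the monthly quota (yellow band)"
else
    echo "usage: ${percent}% of the monthly quota"
fi
```

For 19000 accounted core*hours against a quota of 17000, the result is 111% (rounded down), which falls inside the yellow band.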
If applicable, the remaining quota until end of the project is extrapolated and plotted using a thick black line.
Part B)
The graph's lower panel shows the CPU efficiency of each distinct job of the project. Efficiency here is the ratio of utilized core*hours to accounted core*hours (not to the core*hours requested in the job script).
Each job is represented by a semi-transparent purple dot. Multiple jobs with the same efficiency will overlay and appear as darker dots.
The smaller red dots indicate the average CPU efficiency of all jobs of each day with active jobs. Since the purple dots do not differentiate with respect to job length and size, the red dots will help to assess the efficiency of computations for days with varying efficiencies and job sizes.
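The per-job efficiency behind each purple dot can be sketched as follows, reusing the figures of the 10-hour, 16-core example from Part A. The function name `efficiency_percent` is made up for this illustration. The second call only shows why the distinction matters: it relates the same utilization to the requested 12 hours, which is not what the plot uses.

```shell
#!/bin/sh
# CPU efficiency in whole percent: utilized core*hours relative to a reference
# amount of core*hours (integer division, rounds down).
efficiency_percent() {
    echo $(( $1 * 100 / $2 ))
}

utilized=80       # core*hours the cores were actually busy
accounted=160     # 10 hours actual runtime x 16 cores
requested=192     # 12 hours requested runtime x 16 cores (not used by the plot)

echo "efficiency as plotted:        $(efficiency_percent "$utilized" "$accounted")%"
echo "relative to the request only: $(efficiency_percent "$utilized" "$requested")%"
```

How the daily average (red dots) weights jobs of different length and size is not specified on this page, so it is left out of the sketch.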
Currently, this visualization only includes CPU metrics; it does not include any usage of accelerator cards (e.g., GPUs) in conjunction with the CPU metrics.
In case you need the graph for your project(s) in a resolution-independent vector format, please don't hesitate to contact us as described at the end of the page.
HKHLR Tools
To determine the efficiency of specific jobs, or to identify jobs with a given efficiency, the HKHLR offers helper scripts. The necessary JobAnalysisTools module may be loaded by entering
module load hkhlr JobAnalysisTools/0.1
in a job script.
It offers three utilities. Using the
HKHLR_RecentJobEfficiencyReport
script, a list of efficiencies for recent jobs (default 7 days) can be generated.
If you want to scrutinize a certain job, you can use
HKHLR_JobReport $JOBID
to see that job's efficiency.
Lastly, the tool
HKHLR_GetJobIDsOfInterval $lowerBound $upperBound
returns the JOBIDs of all jobs whose efficiency lies within the window [$lowerBound,$upperBound].
If you require further assistance, clarification or have feedback, please do not hesitate to contact us, either via the TU Darmstadt ticket system, or via mail to darmstadt@hpc-hessen.de.