HPC: Downtime due to Power Maintenance

2023-05-09 7am

2023/05/09 by

For several hours, the power supply of the L5|08 building will be completely shut down for maintenance.

2023-05-11 – The HPC cluster is up and running in normal operations.

Careful debugging of internal (disk) states together with the manufacturer has identified the culprits, and the shared file system is up and running again.

2023-05-09:

Today's core work of maintaining/replacing the 24kV and 400V power circuit breakers was completed around noon, but problems in the central cluster-wide file system are once again preventing us from getting the cluster up and running.

___________________________________

On Tuesday, 2023-05-09 at 7am, the Lichtenberg HPC will have to be shut down for ~1 day (including login nodes), because the power to the whole building L5|08 needs to be switched off.

The reason is the maintenance/replacement of several main circuit breakers in the building's medium-voltage (20kV) and low-voltage (400V) sections, which is required by law every 10 years.

Though we expect the maintenance to last no longer than 4h, the usual “unexpected” complications prompt us to announce the downtime for the whole day nonetheless.

As soon as the work is finished, we will inform you via the [HPC-Nutzer] mailing list and on this HPC News page.

You do not need to do anything with respect to your (running or pending) batch jobs. The scheduler knows about the downtime and will

  • start pending jobs only if they can safely finish before the downtime, and
  • hold all others until the downtime is over and normal scheduling has resumed.
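
To illustrate the rule above, here is a minimal sketch of the decision in Python. It is not the scheduler's actual implementation: the function name, the use of the job's requested time limit as its worst-case runtime, and the example times are assumptions made for illustration only. Only the downtime start (2023-05-09, 7am) is taken from this announcement.

```python
from datetime import datetime, timedelta

# Start of the downtime as announced above (2023-05-09, 7am).
DOWNTIME_START = datetime(2023, 5, 9, 7, 0)


def can_start_before_downtime(now, time_limit, downtime_start=DOWNTIME_START):
    """Return True if a job started at `now` would hit its time limit
    no later than the start of the downtime (hypothetical helper)."""
    return now + time_limit <= downtime_start


# Example: jobs considered on the evening of 2023-05-07 (assumed time).
now = datetime(2023, 5, 7, 18, 0)
fits_short = can_start_before_downtime(now, timedelta(hours=24))  # ends 05-08 18:00
fits_long = can_start_before_downtime(now, timedelta(hours=48))   # would end 05-09 18:00
```

In this sketch the 24h job may still start, while the 48h job stays pending until after the downtime. In practice, this means that requesting a shorter (but still sufficient) time limit can allow a pending job to run before the shutdown.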