HPC: Downtime for Operations on the Interconnect

2023-08-23 8am – 2023-08-24

2023/08/23

The cluster will be down while the InfiniBand network is being reconfigured.

To fully connect the upcoming expansion stage of Lichtenberg II, the HPC interconnect needs to be reconfigured.

As the heart of the cluster, the InfiniBand fabric carries both the MPI traffic between processes on different nodes and all traffic to and from the storage system.

The physical connections between all compute nodes (including the new ones) and the storage system therefore need to be as balanced and symmetrical as possible.

As soon as the work is finished, we will announce it on the [HPC-Nutzer] mailing list and on this HPC News page.

You do not need to take any action regarding your running or pending batch jobs. The scheduler knows about the downtime and will

  • start pending jobs only if they are guaranteed to finish before the downtime begins, and
  • hold all other jobs until the downtime is over and normal scheduling resumes.
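The rule in the first bullet boils down to a simple check: a pending job may start now only if its requested wall-clock time limit expires before the maintenance window opens. The following is a minimal sketch of that check, assuming a job is described only by its requested time limit and that the start time is 8am local time on 2023-08-23; the function name `may_start` is hypothetical and this is not the scheduler's actual implementation (production schedulers such as Slurm model such windows as maintenance reservations).

```python
from datetime import datetime, timedelta

# Assumed start of the maintenance window: 8am local time on 2023-08-23
# (naive datetime, no time zone handling).
DOWNTIME_START = datetime(2023, 8, 23, 8, 0)


def may_start(now: datetime, time_limit: timedelta,
              downtime_start: datetime = DOWNTIME_START) -> bool:
    """Hypothetical helper: True if a job started at `now` with the
    requested `time_limit` is guaranteed to end before the downtime."""
    return now + time_limit <= downtime_start


# A 12-hour job starting the evening before fits; a 24-hour job is held.
now = datetime(2023, 8, 22, 18, 0)
print(may_start(now, timedelta(hours=12)))  # True: ends 2023-08-23 06:00
print(may_start(now, timedelta(hours=24)))  # False: would end 2023-08-23 18:00
```

This is also why requesting a realistic time limit for your jobs helps: the shorter the requested limit, the longer before the downtime a job can still be started.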