Information on Cluster Outage
(due to electrical short and fire in March 2025)

On this page, we provide information about the causes of the outage, the current status, and the 2026 schedule for restoring the Lichtenberg II high-performance computer to full operation.


What happened?

At the end of March 2025, there was a small smoldering fire in the electrical system that primarily supplies power to the second expansion stage of Lichtenberg II (LB2A2). This caused a total failure of both the entire HPC system and the adjacent housing area for approximately 2.5 weeks.

>>> However, the first expansion stage of Lichtenberg II (LB2A1) and all GPU systems are still in operation and fully functional.

>>> Update from January 23, 2026: Repair work is progressing according to plan. Two racks in expansion stage 2 (LB2A2) are once again being supplied with power, so an additional 23% (around 132 compute nodes) is now available for computing.

Our Main Goal

Our main goal is to make all existing compute capacity available again as quickly as possible.


Causes

The analysis report from an external expert is now available. It shows that the oldest components of the electrical system were overloaded due to incorrect and insufficient specifications. The report recommended a recalculation of the entire high-voltage electrical system of the data center. This recalculation is now almost complete, and the necessary changes will be implemented starting in January 2026.


What happens next?

  • Complete repair of the electrical system for LB2A2 in several stages:
    • January 23, 2026: The first partial commissioning with at least 20% of LB2A2 has been completed.
      2026-01-20: provisional repair of the 2000A power rail successfully completed.
      2026-01-23: 75% of the LB2A1 compute nodes (96 CPU cores per node), 20% of the LB2A2 compute nodes (104 CPU cores per node), and all GPU nodes are running.
    • April/May 2026: Larger partial commissioning with the goal of at least 50% of LB2A2
    • End of 2026: Full commissioning (100% of LB2A2)
  • Approx. May 2026: Commissioning of the next Lichtenberg CPU expansion stage (LB NHR-1)
  • Approx. July 2026: Commissioning of the GPU expansion for LB NHR-1
  • In the course of 2026: Gradual decommissioning of the oldest cluster expansion stage (LB2A1)
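
To give a rough sense of what the LB2A2 milestones above mean in CPU cores, the following minimal sketch multiplies the stated percentages by the approximate core counts from the overview at the end of this page. It assumes that cores are spread evenly across compute nodes, so that a share of nodes corresponds to the same share of cores; the function and variable names are illustrative only, and the results are estimates, not official figures.

```python
# Approximate core counts taken from the overview on this page.
LB2A1_CORES = 60_000
LB2A2_CORES = 62_000

# Minimum share of LB2A2 targeted at each milestone (from the list above).
LB2A2_MILESTONES = {
    "2026-01-23": 0.20,
    "April/May 2026": 0.50,
    "End of 2026": 1.00,
}


def available_cores(total_cores: int, fraction: float) -> int:
    """Estimate the cores online when `fraction` of a stage is powered.

    Assumption: cores are evenly distributed across nodes.
    """
    return round(total_cores * fraction)


# Estimated LB2A2 cores per milestone.
lb2a2_by_milestone = {
    when: available_cores(LB2A2_CORES, fraction)
    for when, fraction in LB2A2_MILESTONES.items()
}

# Estimated CPU cores running on 2026-01-23: 75% of LB2A1 plus 20% of LB2A2.
status_2026_01_23 = (
    available_cores(LB2A1_CORES, 0.75) + available_cores(LB2A2_CORES, 0.20)
)

for when, cores in lb2a2_by_milestone.items():
    print(f"{when}: roughly {cores:,} LB2A2 CPU cores")
print(f"2026-01-23 total (LB2A1 + LB2A2): roughly {status_2026_01_23:,} CPU cores")
```

Because the milestone percentages are stated as "at least" goals and the core counts are approximate, the actual capacity at each stage may differ from these values.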


Stumbling blocks

The repair of the electrical system for the Lichtenberg high-performance computer was delayed for several reasons: cooperation and communication with the previous manufacturer of the electrical system were not optimal, and both the external investigation into the cause of the fire and the recalculation of the electrical system took a long time.


We are here for you!

Until sufficient computing resources are available again, we will continue to support the migration of project resources to other NHR centers, such as Aachen, with which we have a joint usage concept. For further support, questions, and feedback, please contact us. You are also welcome to make use of our HPC consultation hours and introductory courses. Until the cluster is fully repaired, we will provide further information through continuous updates on this website and a monthly information email.


Overview of current and upcoming Lichtenberg cluster expansion stages:

LB2A1

  • CPU: approx. 60,000 CPU cores
  • GPU: 16x Nvidia V100, 40x Nvidia A100

LB2A2

  • CPU: approx. 62,000 CPU cores
  • GPU: 8x Nvidia H100, 16x AMD MI300X, 20x Intel PVC Max 1550

LB NHR-1

  • CPU: approx. 90,000 CPU cores
  • GPU: 48x Nvidia B200

Graphical representation of the recommissioning process

Timeline for recommissioning the Lichtenberg II high-performance computer