Power Rail Short & Fire L5|08 [Update 2025-04-08]
HPC & Housing are inoperative
2025/03/25
Due to a short circuit and a subsequent fire in the power rail system, the TURZ-L5|08 data center is currently inoperative, affecting the Lichtenberg HPC, network and housing.
Update 2025-04-08
Confirmed: delivery and installation of the gas canisters for the fire extinguishing system will take place on Tuesday, April 15th.
The 2000-amp circuit breaker protecting the server room's main power rail has already been checked and does not need to be replaced, even though it tripped due to the short.
On Wednesday this week, the affected (and most likely damaged) segments of the power rail will be dismantled, and the remainder will be checked for damage by insulation measurement.
If no problems are found, the partial restoration of cluster operations planned for next week still appears feasible.
Update 2025-04-01
We are planning the next temporary cluster data access windows for Wednesday, April 2nd, and Tuesday, April 8th, from around 9am to 4pm on each day.
The following login nodes will be available:
lcluster1.hrz.tu-darmstadt.de
lcluster2.hrz.tu-darmstadt.de
lcluster13.hrz.tu-darmstadt.de
lcluster14.hrz.tu-darmstadt.de
Access is only for copying/downloading files like data and code, not for compute jobs (as all compute nodes are switched off).
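As an illustration of how such a copy could be prepared, the following minimal Python sketch assembles an rsync command for pulling a directory from one of the login nodes listed above. The user name `jdoe`, the remote path `/path/to/project/`, the local target `./backup/` and the helper `build_rsync_command` are hypothetical placeholders, not part of this announcement; the sketch also assumes that rsync over SSH is usable on the login nodes, which is not confirmed here.

```python
import shlex

# Login nodes named in this announcement.
LOGIN_NODES = [
    "lcluster1.hrz.tu-darmstadt.de",
    "lcluster2.hrz.tu-darmstadt.de",
    "lcluster13.hrz.tu-darmstadt.de",
    "lcluster14.hrz.tu-darmstadt.de",
]


def build_rsync_command(user, host, remote_path, local_dir):
    """Return an rsync argument list that copies remote_path to local_dir.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if host not in LOGIN_NODES:
        raise ValueError(f"{host} is not one of the announced login nodes")
    return [
        "rsync",
        "-a",         # archive mode: recurse and preserve times/permissions
        "--partial",  # keep partially transferred files for a later retry
        f"{user}@{host}:{remote_path}",
        local_dir,
    ]


if __name__ == "__main__":
    # Placeholder user name and paths; replace with your own.
    cmd = build_rsync_command(
        "jdoe", LOGIN_NODES[0], "/path/to/project/", "./backup/"
    )
    # Print the command so it can be reviewed before running it in a shell.
    print(shlex.join(cmd))
```

Keeping partial files may help if a transfer is still running when the access window closes, since the same command can then be repeated in the next window.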
Update 2025-03-29
Inspections and partial repairs of the power rails are scheduled for April 9th-11th. Provided the inspection has a positive outcome, we tentatively expect housing services and cluster operations to resume in the week of April 14th-18th.
Update 2025-03-26
Together with our partner for electrical infrastructure, we are working on a solution to temporarily bring up Housing and Storage, so that data and code can be copied or backed up.
As the gaseous fire extinguishing system needs to be refilled/replaced, this will only be possible during office hours (on-site firewatch necessary).
Together with the HPC team of RWTH Aachen, we are working on ways to allow projects to switch to their HPC systems.
2025-03-25
Returning to normal operations will only be possible after inspection and repair of the power rail, and after refilling/replacing the gaseous fire extinguishing system.
Housing customers will be informed separately – the server room is cleared and access to housed systems is possible.
If you are affected, you may also contact the HRZ service, Tel. 16-71112.