Solved: Failure of the Cluster-wide File System

System back to normal and available

2024/11/04

+++ Update 17:00: The deadlock could only be resolved by a (hard) reset of several GPFS master servers and a reboot of all compute nodes. As a result, all jobs that were running at the time of the GPFS lockup have unfortunately been lost. Unless you explicitly prohibited it (by using special parameters), the scheduler will restart those jobs automatically. +++

This morning, the cluster-wide file system stopped working correctly and is currently completely unavailable.

We are working on the problem and apologize for any inconvenience. We appreciate your understanding that we cannot yet give a forecast as to when the file system will be available again.