HLR Infiniband Reconfiguration – done

2022-11-07 9:00am

Routing in the Lichtenberg's Infiniband fabric will be switched to “Adaptive Routing”, with a brief pause in starting jobs.

2022-11-07 Update:

Since 9:15, the Infiniband fabric has been running with Adaptive Routing, and the queues have been reactivated.

-------------------------------------------------------

Like other networks, Infiniband offers several ways of routing network packets from source to target.

To date, the Lichtenberg has used an older, less efficient routing scheme that cannot use all possible paths (routes) from source to target concurrently.

While this older routing ensures in-order arrival of packets (those sent earlier are guaranteed to arrive before packets sent later), the new “adaptive routing” by design gives no such guarantee.

Older versions of Open MPI and Intel MPI could not cope with this out-of-order arrival of MPI packets.
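
To illustrate what coping with out-of-order arrival means in principle, here is a minimal Python sketch of a receiver that restores the send order using sequence numbers. This is not how Open MPI or Intel MPI actually implement it. The function name deliver_in_order and the (sequence number, payload) packet format are assumptions made purely for illustration.

```python
def deliver_in_order(arrivals):
    """Yield payloads in send order.

    `arrivals` is an iterable of (seq, payload) pairs in the order they
    arrive; sequence numbers are assumed to start at 0 without gaps.
    (Hypothetical packet format, for illustration only.)
    """
    pending = {}   # packets that arrived early, keyed by sequence number
    next_seq = 0   # sequence number we are waiting to deliver next
    for seq, payload in arrivals:
        pending[seq] = payload
        # Release every packet that is now contiguous with what was delivered.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1


# Packets sent as a, b, c, d but arriving over different routes:
arrived = [(1, "b"), (0, "a"), (3, "d"), (2, "c")]
print(list(deliver_in_order(arrived)))  # ['a', 'b', 'c', 'd']
```

Software that lacks such a reordering step and simply assumes that the fabric delivers packets in send order would process the data above in the wrong sequence.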

In the meantime, even Ansys, the last commercial application that still required in-order arrival, has received an update.

Thus, on 2022-11-07 at 9am, we will reconfigure the Lichtenberg IB interconnect to use Adaptive Routing, making far more efficient use of all paths/routes in the fabric.

Jobs at this time

To avoid disrupting running jobs, we have configured a “job only” downtime. By the time of the switch, the scheduler will have let all running jobs complete, and from shortly before until shortly after the switch it will not start any pending jobs.

You do not need to take any action regarding your jobs, whether running or pending.

The cluster and login nodes remain accessible.