You can use the following file systems:

| Mountpoint | /work/home/ (= /home/) | /work/projects/ and /work/groups/ | /work/scratch/ |
|---|---|---|---|
| Size | Σ 6.1 PByte (total across all file systems) | | |
| Performance | up to 1,200 Gbps (the bandwidth available to the individual user depends on the overall file system traffic) | | |
| Files accessible from/for | global, for all nodes | | |
| Persistence | permanent | during the project's validity term + 6 months grace time | 8 weeks after their last write access, files will be deleted unconditionally and without further notice |
| Quota | 50 GByte and/or 4 million files | 5 TByte and/or 200,000 files | 20 TByte and/or 20 million files |
| Backup | Snapshots (see below) + daily tape backup (for disaster recovery only) | Snapshots (see below) + daily tape backup (for disaster recovery only) | none! |
| Usage pattern | low-volume I/O: interactive use, static input data, results of finished jobs. Do not use home, groups or projects for running jobs! | low-volume I/O: interactive use, static input data, results of finished jobs. Do not use home, groups or projects for running jobs! | high-volume I/O: batch use, running jobs' input/output, intermediary files (CPR) |
Since October 2019, the global cluster file systems no longer differ in throughput or latency, as they share the same large pool of NVMe SSDs.
However, some internal optimizations determine two “classes” of file systems: low-volume I/O and high-volume I/O.
___________________________________________________________________________
Low-volume I/O
Caution
Their data volume would be far too large for snapshots and tape backup: do not use home, groups or projects for the I/O of running jobs!
Backup
For this class of file systems with low turnover in terms of number and size of files, the following backup mechanisms are in place:
- a daily tape backup
  This is mainly to protect against catastrophic damage to the file system; restores are not directly available to users.
- the snapshot mechanism
  For your own restore/recover purposes, the IBM Storage Scale file system (see “Technologies” below) takes regular snapshots of the content. Via the hidden folder “.snapshots/” in every subdirectory, you can copy back what has been lost, i.e. any file inadvertently deleted. For details, see “Technologies” below.
Do not use directories on these file systems for running jobs!
/home
The home directory should be used for all files that are important and need to be stored permanently.
Every user can store only a small amount of data here (see “Quota” above). In well-reasoned cases and on request, this quota can be increased.
The folder /home/$USER (“Home”) is created with each user account. You can reference it via the environment variable $HOME.
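For illustration, a minimal shell sketch of the intended use of home; the directory and file names are hypothetical:

```bash
# $HOME points to your personal folder /home/$USER
echo "$HOME"

# Hypothetical example: keep only important, long-lived data here,
# e.g. archive the results of a finished job (mind the small quota).
mkdir -p "$HOME/results/project-x"
cp summary.csv final_state.dat "$HOME/results/project-x/"
```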
/work/projects and /work/groups
On request, groups (institutes) can get a group folder to share static input data and common software (versions) with their members and coworkers.
Likewise, projects with more than a few members can request a projects folder for the same purposes.
___________________________________________________________________________
High-volume I/O
Backup
NONE!
This class of file systems is explicitly optimized for high I/O volume and high throughput for jobs and applications.
Due to the high turnover in created/deleted/changed files, an (expensive) legacy tape backup would “explode” in volume and meta data. For the same reason, even the GPFS-internal snapshots could rather quickly exceed the file system's physical capacity.
There is absolutely no backup for the following file systems. What you delete here is gone forever and cannot be restored or recovered, not even by administrators.
/work/scratch
Here, almost unlimited disk space is available for all users, but only for a limited time: After 8 weeks of not being written to, the files will be deleted unconditionally without further notice (automatic deletion/removal policy).
A plain read access is not sufficient to prevent deletion, since for performance reasons, the files' “last access date” is not written reliably and may thus not be current.
The folder /work/scratch/$USER (“scratch”) is created with each user account. In job scripts, it can be referenced via the environment variable $HPC_SCRATCH.
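To illustrate the intended division of labour between scratch and home, here is a minimal, hypothetical job-script sketch; the application name, file names and subfolder are made up, and scheduler directives are omitted since the batch system is not covered in this section:

```bash
#!/bin/bash
# Hypothetical job script: do all heavy I/O on scratch, keep only results in home.

# Per-user scratch folder, referenced via $HPC_SCRATCH (see above)
JOBDIR="$HPC_SCRATCH/myjob_$$"        # hypothetical job-specific subfolder
mkdir -p "$JOBDIR"
cd "$JOBDIR" || exit 1

# Stage static input data from home (or from a group/project folder)
cp "$HOME/input/config.dat" .

# Run the application; intermediary files and checkpoints stay on scratch
./my_solver config.dat > run.log      # hypothetical application

# Copy only the final results back to the permanent, backed-up home folder
mkdir -p "$HOME/results"
cp run.log result.dat "$HOME/results/"
```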
___________________________________________________________________________
Technologies
IBM Storage Scale
All shared, cluster-wide file systems above are based on IBM's Storage Scale (formerly known as General Parallel File System). This commercial product can share large disk arrays via InfiniBand among thousands of nodes.
Of course, arbitrating read/write requests from that many nodes to individual files takes somewhat longer than accessing local disks. In addition, all running jobs and all logged-in users are working on this shared storage resource at the same time.
That's why you sometimes see (hopefully short) “hiccups” when running “ls -l” or similar commands. This is perfectly normal and an expected consequence of the principle “common, shared file system, available everywhere”.
Snapshots
For folders on the low-volume I/O class of file systems, IBM Storage Scale automatically creates periodic snapshots, allowing you to access (and restore) older versions of your files without assistance from the admins. Snapshots are saved to the hidden folder .snapshots (you will not see this folder listed, not even with an 'ls -la'). Nonetheless, you can enter that hidden folder with an explicit “cd .snapshots” (<TAB>-completion does not work either; you have to type .snapshots in full).
Once in .snapshots/, you can use 'ls -l' and 'cd' as usual and access former versions (or states) of all your data (within the hourly.*/, 6hour.*/, daily.*/ and weekly.*/ directories).
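As a worked example, restoring an accidentally deleted file might look like this; the snapshot directory's timestamp and the file name are hypothetical, and the actual names on the system will differ:

```bash
# Change into the affected directory, then into the hidden snapshot folder.
# It is neither listed nor tab-completed, so type the name in full.
cd "$HOME/myproject"
cd .snapshots
ls -l                    # shows the hourly.*/, 6hour.*/, daily.*/ and weekly.*/ snapshots

# Copy the lost file from a suitable snapshot back into the live directory
# (snapshot and file names are hypothetical).
cp daily.2024-05-01/important_input.dat "$HOME/myproject/"
```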
Files in the snapshot folder still occupy storage space! It is therefore possible that your home folder's quota is exceeded even though the 'df' command still shows less usage.
Snapshots cannot be deleted (deleting data merely creates copies within the snapshot).
Frequently saving and deleting files fills up the snapshot area and consumes space in the containing folder. This should therefore be avoided if possible (so do not use the home, groups or projects folders for high-volume I/O, e.g. for the I/O of running jobs!).
In urgent cases, the snapshot folder can be cleaned/deleted by the administrators.