Overview
While containers might seem a viable way of running complex, yet readily packaged applications, their runtime implications on a cluster can turn out to be more difficult than expected.
For example, access to files on our shared file systems needs to be configured per container (e.g. by creating “volumes”), and most container runtime implementations are not well suited to working from shared file systems (NFS or the GPFS/Spectrum Storage our Lichtenberg is based upon).
From within a container, it is also difficult to access our software module system.
“Over”containerization
In some cases, a container recipe does nothing more than install a certain Python interpreter version plus a few (special) Python modules. Instead of trying to run such a “simple” container, you might be better off with a similarly configured “Python virtual environment”.
This is easy to create and maintain, and it works perfectly with shared file systems and with our module system.
Thus, first check whether your desired container recipe is just such a simple one – if so, it is not worth the effort of getting it to run on the cluster!
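As an illustration, a “simple” container installing just a Python version and a few packages could be replaced by something like the following (the module name python/3.10 and the package list are only placeholders – check “module avail python” and your recipe):
# Load a Python version from our module system (module name is an example):
module load python/3.10
# Create and activate a virtual environment in your home or project directory:
python -m venv ~/myenv
source ~/myenv/bin/activate
# Install whatever the container recipe would have installed:
pip install numpy pandas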
On the Lichtenberg cluster, we support only the following container runtime implementations:
Apptainer (formerly Singularity)
Partially supported (with a lot of manual effort on our part):
podman – requires manual mapping of sub-UIDs and sub-GIDs and does not work well with shared file systems (see the quick check below this list)
Not supported is
Docker – requires elevated privileges (“docker”/root rights)
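If you nevertheless want to try podman, you can check beforehand whether sub-UID/GID mappings exist for your account (a quick diagnostic only, not an official setup procedure – entries may also be listed by numeric UID instead of user name):
# Show any sub-UID/sub-GID ranges mapped to your user (empty output means none):
grep "^$(id -un):" /etc/subuid /etc/subgid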
Apptainer
Apptainer (f.k.a. Singularity) provides a lightweight, portable container runtime – and since it was developed with HPC in mind, it runs quite well on Linux clusters with shared file systems.
Setting up
To fetch and convert existing container images from Docker's to Apptainer's format (here, “lolcow” is used as an example):
# From a docker registry:
mkdir -p myCont && cd myCont
singularity build lolcow.sif docker://godlovedc/lolcow
singularity run lolcow.sif
# From a docker archive ("podman pull" only works if podman is correctly configured):
mkdir -p myCont && cd myCont
podman pull docker://godlovedc/lolcow
# Export the pulled image into a docker archive file:
podman save -o lolcow_docker_archive.tar docker.io/godlovedc/lolcow
singularity build lolcow_from_archive.sif docker-archive://$(pwd)/lolcow_docker_archive.tar
singularity run lolcow_from_archive.sif
# From a Dockerfile (using "spython" to translate it into a Singularity definition file):
mkdir -p myCont && cd myCont
git clone https://github.com/GodloveD/lolcow
# Create a Python virtual environment and install spython into it:
python -m venv myenv
source myenv/bin/activate
pip install spython
spython recipe lolcow/Dockerfile ./sing.lolcow
singularity build --fakeroot lolcow_from_dockerfile.sif sing.lolcow
singularity run lolcow_from_dockerfile.sif
deactivate
Using in a batch job
To use such a converted Apptainer image in a batch job, simply add the two lines
cd myCont
singularity run lolcow_from_dockerfile.sif
to your job script.
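A minimal sketch of such a job script might look as follows (all #SBATCH values are placeholders – adjust runtime and resources to your needs):
#!/bin/bash
#SBATCH --job-name=lolcow
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:10:00

# Run the previously built container image:
cd myCont
singularity run lolcow_from_dockerfile.sif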
Example with Nvidia GPUs:
It is also possible to use Apptainer/Singularity with Nvidia GPUs (currently, AMD and Intel GPUs still have problems):
# We request 2 Nvidia Hopper H100 GPUs for interactive usage:
srun -t 07:00:00 -n16 --mem-per-cpu=4G --gres=gpu:h100:2 --pty /bin/bash
# We export two environment variables setting the paths for temporary files, to be sure to have enough free space for them:
export TMPDIR=${HPC_SCRATCH} APPTAINER_TMPDIR=${HPC_SCRATCH}
# We build an Apptainer container Image, saving it in the file "pytorch_cont.sif"
apptainer build pytorch_cont.sif docker://nvcr.io/nvidia/pytorch:25.03-py3
# We launch a container based on this image:
apptainer shell --nv pytorch_cont.sif
# Testing it:
Apptainer> python
Python 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
2
>>> exit()
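The same image can of course also be used non-interactively in a GPU batch job, for example (the script name my_training_script.py is a placeholder):
# Inside a job script that requested GPUs (e.g. --gres=gpu:h100:2):
export TMPDIR=${HPC_SCRATCH} APPTAINER_TMPDIR=${HPC_SCRATCH}
apptainer exec --nv pytorch_cont.sif python my_training_script.py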