Singularity recommendations

1. Singularity configuration at Finis Terrae II

Several Singularity versions are currently installed at FT2 and available to all users through the module system. All versions share the same settings, detailed in this section, in order to provide transparent access to Singularity containers. It is important to note that Singularity has been installed on a shared device, which requires setting the localstatedir configuration argument during installation.
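For reference, a minimal sketch of how this might look when building Singularity from source with autotools (the paths below are illustrative, not the actual FT2 installation paths):

    $ ./configure --prefix=/opt/singularity --localstatedir=/var/singularity
    $ make
    $ sudo make install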

Singularity has been installed in privileged (SUID) mode in order to make all of its features available. Unprivileged containers, running only in a “user namespace”, have limited features, so we are not using that configuration right now.

There are no limitations or restrictions on the user IDs or on the storage paths where Singularity images can be stored. This means that any user with read and execute permissions is able to use any Singularity image located in any directory.

Singularity has been configured to allow users to bind custom host directories inside the container. However, as the FT2 kernel does not support the overlay feature, the destination directory of a bind-mount must already exist inside the container; otherwise the bind-mount will not succeed.
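For example (the paths below are purely illustrative), a bind-mount to a custom destination only succeeds if that destination is already present in the image:

    $ singularity exec -B /mnt/mydata:/data $CONTAINER ls /data
    # Succeeds only if /data already exists inside the container image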

The Singularity installation is configured to automatically share some common Linux directories and files from the host with the container. These directories are /proc, /sys, /dev, /tmp, and /home. Some files such as /etc/resolv.conf, /etc/hosts and /etc/localtime are also automatically mounted inside the container to import the host network and time settings.

Finally, the current user and group information is automatically added to the /etc/passwd and /etc/group files inside the container. This means the user and group IDs are exactly the same inside and outside the container.
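A quick way to verify this, for example, is to compare the IDs reported on the host and inside a container:

    # The user ID reported on the host...
    $ id -u
    # ...should match the one reported inside the container
    $ singularity exec $CONTAINER id -u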

2. Good practices using Singularity at Finis Terrae II

This section describes some instructions and good practices for building Singularity containers. These instructions provide some keys to build valid containers and avoid issues, making the containers completely transparent both for users and for the infrastructure, and easing the use of Singularity and containerized software at Finis Terrae II. They can also be applied to other HPC systems.

The first rule is to provide a valid Singularity container. Within the container, an entire Linux distribution or a very lightweight, tuned set of packages can be included, preserving the usual Linux directory hierarchy. It is recommended to set up the required environment variables within the container in order to expose a consistent environment.
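One common way to do this, for instance, is through the %environment section of the bootstrap definition file (the application path below is just a placeholder):

    %environment
        export LC_ALL=C
        export PATH=/opt/myapp/bin:$PATH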

The FT2 kernel version is currently 2.6.32 (September 2018). Newer operating systems such as Debian Buster and Ubuntu Bionic require a more recent kernel and are not compatible with the FT2 kernel.

To get transparent access to the host e-infrastructure storage from the containers (in this case we use Finis Terrae II as an example), the /mnt and /scratch directories/paths must exist within the container so they can be shared with the host. This allows the container to maintain consistency with the host configuration, environment variables, etc.

Applications within a Singularity image must not be installed in any of the automatically mounted devices or directories. If they are, the applications will be hidden from the end user.

To run parallel applications on multiple nodes with MPI, the container must have MPI and PMI support installed. Likewise, to take advantage of HPC resources such as InfiniBand networks or GPUs, the container must support them. This means that the container must have the proper libraries installed to communicate with the hardware and also to perform inter-process communication.

  • In the case of InfiniBand support, there are no known restrictions on the InfiniBand libraries installed inside the container.

  • In the particular case of using GPUs from a Singularity container, the container must have nvidia-smi installed. Singularity provides GPU container portability through the experimental NVIDIA support option, which allows containers to automatically use the host drivers.

  • Regarding MPI, in the current context and due to the Singularity hybrid MPI approach, it is mandatory to use the same MPI implementation and version on the host and inside the container to run MPI parallel applications, and also to use the corresponding mpirun or mpiexec launcher, instead of srun (the Slurm default process manager), as the process manager to ensure PMI compatibility. Both the OpenMPI and IntelMPI implementations are supported and have been tested.

Other alternatives were explored to provide portability of MPI Singularity containers, but they are not recommended because of their complexity or lack of generality. These alternatives are bind-mounting the host MPI inside the container and using PMIx > 2.1 as the default process manager together with OpenMPI >= 3.0.0.

This set of recommendations is summarized in two publicly available documents: the recommendations document itself [4] and the bootstrap definition templates [5].

3. Detailed recommendations for building Singularity containers

3.1. Storage devices

3.1.1. Build

In order to get access to the Finis Terrae II storage devices, we encourage you to check the existence of the /mnt directory inside the container. If this directory does not exist, you can create it.

Although /mnt is a default directory in the Linux OS hierarchy, we recommend adding the command mkdir -p /mnt to the %post section of your bootstrap definition file.
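For example, in the bootstrap definition file:

    %post
        # Create the mount point used for FT2 storage devices
        mkdir -p /mnt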

3.1.2. Usage

To get transparent access to the Finis Terrae II storage devices while using Singularity, our recommendation is to bind-mount the /mnt directory inside the container.

    $ module purge
    $ module load singularity/$VERSION
    $ singularity exec -B /mnt $CONTAINER $COMMAND

3.2. Scratch directory

3.2.1. Build

In order to get access to /scratch directories of the compute nodes, we encourage you to create this directory inside the container.

We recommend adding the command mkdir -p /scratch to the %post section of your bootstrap definition file.
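As with /mnt, in the bootstrap definition file:

    %post
        # Create the mount point for the compute-node scratch directory
        mkdir -p /scratch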

3.2.2. Usage

It is recommended to use a scratch folder to store temporary data while executing on a Finis Terrae II compute node.

The most transparent strategy is to bind-mount the /scratch host directory into the /scratch container directory.

    $ module purge
    $ module load singularity/$VERSION
    $ singularity exec -B /scratch $CONTAINER $COMMAND

Another alternative is to bind-mount the /scratch host directory onto another temporary directory, such as /tmp.

By default at Finis Terrae II, the /scratch directory is the place where the OpenMPI session directory is stored. When launching MPI applications, if this directory does not exist or is not writable, you must specify another one using the TMPDIR environment variable.

    $ module purge
    $ module load singularity/$VERSION
    $ export TMPDIR=/tmp
    $ singularity exec -B /scratch:/tmp $CONTAINER $COMMAND

3.3. Shared-Memory containers

3.3.1. Build

There are no known restrictions on running shared-memory applications from a Singularity container.

3.3.2. Usage

    $ module purge
    $ module load singularity/$VERSION
    $ singularity exec -B /scratch -B /mnt $CONTAINER $COMMAND

3.4. MPI containers

3.4.1. Build

In order to run parallel applications on multiple nodes, the Singularity documentation tells us that the container must support MPI and PMI(x). Likewise, to take advantage of HPC resources such as InfiniBand networks, the container must support them. This means that the container must have the proper libraries installed to communicate with the hardware and also to perform inter-process communication.

In the case of InfiniBand, there are no known restrictions on the InfiniBand libraries installed inside the container.

Regarding MPI, in the current context and due to the Singularity hybrid MPI approach, you need to have the same MPI implementation and version installed on the host and inside the container to run parallel/MPI applications. We strongly recommend using an OpenMPI implementation.

The currently available MPI implementations at Finis Terrae II are listed below. Singularity images containing MPI applications must include one of these MPI implementations to run properly in parallel at Finis Terrae II:

    Family      Version
    OpenMPI     1.10.2
    OpenMPI     1.10.7
    OpenMPI     2.0.0
    OpenMPI     2.0.1
    OpenMPI     2.0.1-cuda8.0
    OpenMPI     2.0.2
    OpenMPI     2.1.1
    IntelMPI    5.1
    IntelMPI    2017
    IntelMPI    2017ibi
    IntelMPI    2017.4.239
    IntelMPI    2018
    IntelMPI    2018.1.163
    IntelMPI    2018.2.199
    IntelMPI    2018.3.222
    BullMPI     1.2.9.1

You can get more information about how to load these modules using the module spider tool.
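For example (exact module names may vary slightly on the system):

    $ module spider openmpi
    $ module spider openmpi/2.1.1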

3.4.2. Usage

Please take care to select the same MPI implementation and version on the host and inside the container. You must use mpirun, instead of srun, as the process manager to ensure PMI compatibility.

    $ module purge
    $ module load $COMPILER $MPI_VERSION
    $ module load singularity/$VERSION
    $ mpirun $ARGS singularity exec -B /scratch -B /mnt $CONTAINER $COMMAND
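For reference, a minimal sketch of how this could look inside a Slurm batch script (the resource requests and module names are placeholders to adapt):

    #!/bin/bash
    #SBATCH -N 2
    #SBATCH --ntasks-per-node=24
    #SBATCH -t 01:00:00

    module purge
    module load $COMPILER $MPI_VERSION
    module load singularity/$VERSION

    mpirun singularity exec -B /scratch -B /mnt $CONTAINER $COMMAND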

3.5. GPU containers

3.5.1. Build

In the particular case of using GPUs from a container, the NVIDIA driver inside the container must exactly match the NVIDIA driver installed on the host. There are several alternatives for getting the right NVIDIA driver within the container.

  • Install it persistently inside the container.

  • Bind-mount the host driver inside the container.

In both cases nvidia-smi must be installed inside the container.

The main drawback of a persistent installation is its lack of portability, as you cannot use the same container on another host with a different NVIDIA driver version.
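One way to check the match, for instance, is to compare the driver version reported by nvidia-smi on the host and inside the container (run on a GPU node, without --nv, so the container reports its own persistently installed driver):

    $ nvidia-smi --query-gpu=driver_version --format=csv,noheader
    $ singularity exec $CONTAINER nvidia-smi --query-gpu=driver_version --format=csv,noheader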

3.5.2. Usage

Singularity provides the --nv option to automatically bind-mount the NVIDIA drivers (experimental NVIDIA support).

Please ensure that you are on a GPU compute node before running your GPU containers.

    $ module purge
    $ module load singularity/$VERSION
    $ mpirun singularity exec --nv -B /scratch -B /mnt $CONTAINER $COMMAND

3.6. Sample recipes

Some recipe templates are stored in the GitHub repository with the bootstrap definition templates [5].

3.6.1. Basic recipe template
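The basic template follows this general shape (a sketch only; the repository versions [5] are authoritative, and the base image and packages below are just an example):

    Bootstrap: docker
    From: centos:7

    %post
        # Create the mount points expected at Finis Terrae II
        mkdir -p /mnt /scratch
        yum -y install which wget

    %environment
        export LC_ALL=C

    %runscript
        echo "Running the containerized application"
        exec "$@"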

3.6.2. MPI recipe template
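A sketch of the MPI variant, assuming OpenMPI 2.1.1 is built from source inside the container to match one of the host versions listed above (again, the repository templates [5] are the reference, and the download URL should be checked against the Open MPI site):

    Bootstrap: docker
    From: centos:7

    %post
        yum -y install gcc gcc-c++ make wget perl tar gzip
        mkdir -p /mnt /scratch
        # Build an OpenMPI version that matches one installed on the host
        cd /tmp
        wget https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.1.tar.gz
        tar xzf openmpi-2.1.1.tar.gz && cd openmpi-2.1.1
        ./configure --prefix=/usr/local
        make -j4 install
        # Build or install the MPI application here, e.g.:
        # mpicc -o /usr/local/bin/myapp myapp.c

    %environment
        export PATH=/usr/local/bin:$PATH
        export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH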