Containers on the HPC Clusters


Introduction

Containers make it possible to easily run software developed for one system on a second system. Informally, the idea is to store the software and all of its dependencies (including a minimal operating system) in a single, large file so that when it comes time to run the software everything "just works".

 

Singularity is a Secure Alternative to Docker

Docker images are not secure because they provide a means to gain root access to the system they are running on. For this reason Docker is not available on the Princeton HPC clusters (neither is nvidia-docker). This is not a problem because we offer Singularity, which is a secure HPC alternative to Docker. Singularity is compatible with all Docker images and it can be used with GPUs and MPI applications. Learn about the differences between virtual machines, Docker and Singularity.

 

Reasons to Use Containers

  • A Singularity image bundles an application together with its software dependencies, data, scripts, documentation, license and its own minimal operating system. Software in this form helps ensure reproducible results. In fact, a DOI can be obtained for an image, making it citable in publications. As software dependencies continue to grow in complexity, this approach becomes increasingly attractive.
  • Singularity images are stored as a single file, which makes them easy to share. You can host your images on Singularity Hub for others to download.
  • A Singularity image can run on any system with the same architecture (e.g., x86-64) and binary file format as the system for which the image was made. This provides portability.
  • Software built on one system with a certain glibc version may not run on a second system with an older glibc. One may also encounter issues with ABI compatibility, for example, with the standard C++ library. These issues can be solved with Singularity by building the image with an updated base operating system and the corresponding libraries and tools.
  • Scientific software is often developed for specific Linux distributions such as Ubuntu. It can be difficult to install such software on other Linux distributions. In this case the easiest approach is to make a Singularity image using the same base operating system that the installation directions were written for.
  • The Princeton HPC clusters undergo maintenance on the second Tuesday of each month. Changes to the cluster can cause your software to stop working. However, containerized software is largely unaffected by these changes, leading to enhanced stability of your workflow.
  • Singularity can be used to run massively parallel applications which leverage InfiniBand networks and GPUs. These applications suffer minimal performance loss since Singularity was designed to run "close to the hardware".
  • Bring Your Own Software (BYOS). That is, you don't have to ask the system administrators if they are willing to install something for you. You can install whatever you want inside the image and then run it. This is possible because privileges cannot be escalated: the user outside the container is the same user inside.

 

Popular Container Registries

When looking for containerized software, try these repositories:

  • Docker Hub
  • Singularity Hub
  • The Singularity Library
  • NVIDIA NGC

Note that you can search the Singularity Library from the command line:

$ singularity search polysolver

 

Singularity

Cache

Working with Singularity images requires lots of storage space. By default Singularity will use ~/.singularity as a cache directory which can cause you to go over your /home quota. Consider adding these environment variables to your ~/.bashrc file:

export SINGULARITY_CACHEDIR=/scratch/gpfs/$USER/SINGULARITY_CACHE
export SINGULARITY_TMPDIR=/tmp

On Adroit, replace /scratch/gpfs with /scratch/network above (see Storage for more).
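
Singularity also provides a cache subcommand for managing this directory. For example, to see what is stored and to clear it (run singularity cache --help for all options):

$ singularity cache list
$ singularity cache clean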

 

Obtaining the Image: Using the pull Command

Some software is provided as a Singularity image with the .sif or .simg file extension. If you already have the image then proceed to the section below on running the image. More commonly, however, a Docker image will be provided and this must be converted to a Singularity image. For instance, if the installation directions say:

$ docker pull brinkmanlab/psortb_commandline:1.0.2

Then download and convert the Docker image to a Singularity image with:

$ singularity pull docker://brinkmanlab/psortb_commandline:1.0.2

This will produce the file psortb_commandline_1.0.2.sif in the current working directory, where 1.0.2 is a specific version of the software or a tag.

In some cases the build command should be used to create the image:

$ singularity build <name-of-image.sif> <URI>

Unlike pull, build will convert the image to the latest Singularity image format after downloading it.
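
For example, to download the same Docker image as above while choosing the name of the output file yourself (the name psortb.sif is arbitrary):

$ singularity build psortb.sif docker://brinkmanlab/psortb_commandline:1.0.2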

 

Obtaining the Image: Working from a Dockerfile

Some software is provided as a Dockerfile instead of an actual container. In this case, if you have Docker installed on your local machine (e.g., laptop) then you can create the Docker image yourself and then transfer it to one of the HPC clusters where the Singularity image can be built.
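
For reference, creating the Docker image from the Dockerfile on your local machine might look as follows, where the tag myimage is only an example:

$ cd </path/to/directory-with-Dockerfile>
$ docker build -t myimage .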

On your local machine, after building the Docker image, get the image id by running this command:

$ docker images

Next, save that image as a tar file (in this example the image id is 9c27e219663c):

$ docker save 9c27e219663c -o myimage.tar

Copy myimage.tar to one of the HPC clusters using scp and then create the Singularity image. These commands might look as follows:

$ scp myimage.tar <YourNetID>@della.princeton.edu:software
$ ssh <YourNetID>@della.princeton.edu
$ cd software
$ singularity build myimage.sif docker-archive://myimage.tar

You may then follow the directions below for running myimage.sif, which is now a Singularity image.

 

Running

To run the default command within the Singularity image use:

$ singularity run ./psortb_commandline_1.0.2.sif <arg-1> <arg-2> ... <arg-N>

To run a specific command use exec:

$ singularity exec ./psortb_commandline_1.0.2.sif <command> <arg-1> <arg-2> ... <arg-N>

Use the shell command to run a shell within the container:

$ singularity shell ./psortb_commandline_1.0.2.sif
Singularity> cat /etc/os-release
Singularity> cd /
Singularity> ls -l
Singularity> exit

The shell command is very useful when you are trying to find certain files within the container (see below).

Your Files and Storage Spaces are Available

A running container automatically bind mounts these paths:

  • /home/<YourNetID>
  • /tigress
  • /projects
  • /scratch (i.e., /scratch/gpfs, /scratch/network)
  • /usr/licensed
  • /tmp
  • the directory from which the container was run

This makes it easy for software within the container to read or write files on the RC filesystems. For instance, if your image expects an argument specifying the path to your data, then you can simply supply the path:

$ singularity run myimage.sif -d /scratch/gpfs/aturing/mydata

The bind mounting of /usr/licensed allows license files to be used. You can also create your own custom bind mounts, as shown below. For more information see bind mounting on the Singularity website.
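
For example, to bind a directory on the cluster to a mount point expected by the container (both paths below are illustrative):

$ singularity run -B /scratch/gpfs/<YourNetID>/mydata:/data myimage.sif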

Finding Files within a Container

To prevent mounting of the RC filesystems (e.g., /scratch, /tigress, /projects, /home) use the --containall option. This is useful for searching for files within the container, for example:

$ singularity shell --containall dart_121520.sif
Singularity> find / -iname "*python*" 2>/dev/null

Environment Variables

Singularity by default exposes all environment variables from the host inside the container. Use the --cleanenv argument to prevent this:

$ singularity run --cleanenv <image.sif> <arg-1> <arg-2> ... <arg-N>

For more see the Environment and Metadata page on the Singularity website.
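
Conversely, to pass a specific variable into the container even when using --cleanenv, prefix it with SINGULARITYENV_ on the host. A minimal sketch (MYVAR is a made-up name):

$ export SINGULARITYENV_MYVAR=42
$ singularity exec --cleanenv <image.sif> env | grep MYVAR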

Inspecting the Definition File

One can sometimes learn a lot about the image by inspecting its definition file:

$ singularity inspect --deffile psortb_commandline_1.0.2.sif

If the image was taken from Docker Hub then a definition file will not be available.

 

Example Conversion

Here is an example of converting directions for Docker to Singularity. The Docker directions are:

$ docker run -v /host/gtdbtk_output:/data -v /host/release89:/refdata ecogenomic/gtdbtk --help

To convert the above to Singularity, one would use:

$ singularity run -B /host/gtdbtk_output:/data -B /host/release89:/refdata </path/to>/gtdbtk_1.1.1.sif --help

The Singularity image in the above line can be obtained with:

$ singularity pull docker://ecogenomic/gtdbtk:1.1.1

To learn about binding a directory within the container to a directory on the host, look at the -B option in the output of this command: $ singularity help run

 

Slurm

Serial

Below is a sample Slurm script for a serial application:

#!/bin/bash
#SBATCH --job-name=singularity   # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G                 # total memory per node (4 GB per cpu-core is default)
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
singularity run </path/to>/psortb_commandline_1.0.2.sif <arg-1> <arg-2> ... <arg-N>

 

Parallel MPI Codes

Below is a sample Slurm script for an MPI code:

#!/bin/bash
#SBATCH --job-name=solar         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=4               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send mail when job begins
#SBATCH --mail-type=end          # send mail when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load openmpi/gcc/3.1.5/64
srun singularity exec $HOME/software/solar.sif /opt/ray-kit/bin/solar inputs.dat

Note that an Open MPI environment module is loaded and srun is called from outside the image. The MPI library which the code within the container was built against must be compatible with the MPI library on the cluster. You may need to try multiple Open MPI modules before you find one that works and gives good performance. For more see this page on the Singularity website.
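
One way to narrow down the choice is to compare the MPI version inside the container with the modules available on the cluster. A rough sketch using the image from the script above (this assumes mpirun is on the PATH within the container):

$ singularity exec $HOME/software/solar.sif mpirun --version
$ module avail openmpi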

 

NVIDIA GPUs

Here is one way to run TensorFlow. First obtain the image:

$ singularity pull docker://tensorflow/tensorflow:latest-gpu

Below is a Slurm script appropriate for a GPU code such as TensorFlow:

#!/bin/bash
#SBATCH --job-name=myjob         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=4        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --gres=gpu:1             # number of gpus per node
#SBATCH --mail-type=begin        # send mail when job begins
#SBATCH --mail-type=end          # send mail when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
singularity exec --nv ./tensorflow_latest-gpu.sif python3 mnist_classify.py

For more on Singularity and GPU jobs see this page.
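
As a quick sanity check, you can confirm that the container sees the GPU by running nvidia-smi from an interactive session on a GPU node:

$ singularity exec --nv ./tensorflow_latest-gpu.sif nvidia-smi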

 

Learning

A good way to learn Singularity is to work through this repo and watch the accompanying videos: part 1 | part 2. Also, watch the YouTube video by Marty Kandes of SDSC and view his sample Singularity definition files. Be aware of the NVIDIA hpc-container-maker. One can look at other sample definition files by pulling images from Singularity Hub or the Singularity library and then running this command:

$ singularity inspect --deffile <image-name>.sif 

 

Building Images

Singularity images are most commonly made from a definition file, which is a text file specifying the base image, the software to be installed and other information.

Python

Let's create a Singularity image from scratch for running a Python script on a dataset. The image will contain the script and the dataset, making it easy for anyone to reproduce your results. We will assume that only Pandas is required to run the Python script. The definition file below uses Ubuntu 20.04 as the base OS. Miniconda is installed and then used to install Pandas and all of its dependencies. Because the image is built in the cloud, one must make the Python script and dataset available for download. In this case we use tigress-web.

Below are the contents of recipe.def:

Bootstrap: docker
From: ubuntu:20.04

%help
  This container provides a Python script and research data. To run the script:

    $ singularity run myimage.sif  # or ./myimage.sif

  The script is found in /opt/scripts and the data is found in /opt/data.

%labels
  AUTHOR_NAME Alan Turing
  AUTHOR_EMAIL aturing@princeton.edu
  VERSION 1.0

%environment
  export PATH=/opt/miniconda3/bin:${PATH}
  # set system locale
  export LC_ALL='C'

%post -c /bin/bash
  apt-get -y update && apt-get -y upgrade
  apt-get -y install wget
 
  INSTALL_SCRIPT=Miniconda3-py38_4.9.2-Linux-x86_64.sh
  wget https://repo.anaconda.com/miniconda/${INSTALL_SCRIPT}
  bash ${INSTALL_SCRIPT} -b -p /opt/miniconda3
  rm ${INSTALL_SCRIPT}
  /opt/miniconda3/bin/conda install pandas -y

  mkdir -p /opt/scripts && cd /opt/scripts
  wget https://tigress-web.princeton.edu/~jdh4/myscript.py

  mkdir -p /opt/data && cd /opt/data
  wget https://tigress-web.princeton.edu/~jdh4/mydata.csv
 
  # cleanup
  apt-get -y autoremove --purge
  apt-get -y clean

%runscript
  python /opt/scripts/myscript.py

%test
  /opt/miniconda3/bin/python --version

Below are the contents of mydata.csv:

pressure, temperature, density
1.0, 2.3, 0.9
1.2, 3.1, 1.1
1.4, 3.9, 0.8
1.6, 5.4, 1.8

Below are the contents of myscript.py:

import pandas as pd
df = pd.read_csv("/opt/data/mydata.csv", header=0)
print(df.describe())

One must obtain a token from Sylabs before the Remote Builder can be used to create an image:

$ singularity remote login SylabsCloud

Then browse to https://cloud.sylabs.io/auth/tokens where you will need to log in or create a new account. When you have the token, paste it on the command line in the terminal and press Enter (it will not be displayed after being pasted). Be sure to save a copy of the token since it is only shown once on the website. However, you can always create a new token if needed.

Build and run the image:

$ singularity build --remote myimage.sif recipe.def
$ ./myimage.sif
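
Because the definition file includes a %test section, the test can also be run explicitly against the built image:

$ singularity test myimage.sif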

 

R

Let's create a Singularity image from scratch for running an R script on a dataset. The definition file below shows how to install two R packages and how to include the script and data. It is important to realize that the definition file below builds on a pre-existing base image (Debian) that already has R installed. Look at the Rocker Project and Rocker on Docker Hub for other base images. Because the image is built in the cloud, one must make the R script and dataset available for download. In this case we use tigress-web.

Bootstrap: docker
From: r-base:4.0.2

%help
  This container provides an R script and research data. To run the script:

    $ singularity run myimage.sif  # or ./myimage.sif

  The script is found in /opt/scripts and the data is found in /opt/data.

%labels
  AUTHOR_NAME Alan Turing
  AUTHOR_EMAIL aturing@princeton.edu
  VERSION 1.0

%post -c /bin/bash
  # update the package lists
  apt-get -y update

  # install dependencies for tidyverse
  apt-get -y install libxml2-dev libcurl4-openssl-dev libssl-dev

  # install extra packages
  apt-get -y install file vim

  # install R packages
  R -e 'install.packages(c("dplyr", "tidyverse"))'

  mkdir -p /opt/scripts && cd /opt/scripts
  wget https://tigress-web.princeton.edu/~jdh4/myscript.R

  mkdir -p /opt/data && cd /opt/data
  wget https://tigress-web.princeton.edu/~jdh4/mydata.csv

%runscript
  Rscript /opt/scripts/myscript.R

%test
  #!/bin/bash
  exec R -e 'library(dplyr); library(tibble)'

Below are the contents of myscript.R:

library(dplyr)
library(tibble)

dt <- read.csv("/opt/data/mydata.csv")
dt_tbl <- as_tibble(dt)
print(summary(dt_tbl))

See above for the contents of mydata.csv.

With an access token (see above), the image can be built and then run:

$ singularity build --remote myimage.sif myrecipe.def
$ singularity run myimage.sif

Note that most R packages require certain system libraries to be installed. If a system library is missing you will see a message like the following during the build:

* installing *source* package ‘openssl’ ...
** package ‘openssl’ successfully unpacked and MD5 sums checked
** using staged installation
Using PKG_CFLAGS=
--------------------------- [ANTICONF] --------------------------------
Configuration failed because openssl was not found. Try installing:
 * deb: libssl-dev (Debian, Ubuntu, etc)
 * rpm: openssl-devel (Fedora, CentOS, RHEL)
 * csw: libssl_dev (Solaris)
 * brew: openssl@1.1 (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
-------------------------- [ERROR MESSAGE] ---------------------------
tools/version.c:1:10: fatal error: openssl/opensslv.h: No such file or directory
    1 | #include <openssl/opensslv.h>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’
* removing ‘/usr/local/lib/R/site-library/openssl’

When you encounter errors like the one above, add the missing package to the definition file and try again. In this case libssl-dev was missing.

 

LAMMPS: A Parallel MPI Code

The definition file below can be used to create a Singularity image for an MPI version of LAMMPS. It is based on Open MPI version 4. The example below is only for demonstration purposes. It is best to install LAMMPS from source or use the container provided by NVIDIA NGC. Because the image is built in the cloud, one must make the input script available for download. In this case we use tigress-web.

Bootstrap: docker
From: ubuntu:20.04

%environment
  export OMPI_DIR=/opt/ompi
  export PATH="$OMPI_DIR/bin:$PATH"
  export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
  export MANPATH="$OMPI_DIR/share/man:$MANPATH"
  export LC_ALL='C'

%post -c /bin/bash
  export DEBIAN_FRONTEND=noninteractive
  apt-get -y update && apt-get -y upgrade
  apt-get -y install python3-dev build-essential cmake wget git

  # build MPI library
  echo "Installing Open MPI"
  export OMPI_DIR=/opt/ompi
  export OMPI_VERSION=4.0.5
  export OMPI_URL="https://download.open-mpi.org/release/open-mpi/v4.0/"
  export OMPI_FILE="openmpi-$OMPI_VERSION.tar.bz2"
  mkdir -p /mytmp/ompi
  mkdir -p /opt
  # Download
  cd /mytmp/ompi && wget -O openmpi-$OMPI_VERSION.tar.bz2 $OMPI_URL$OMPI_FILE
  tar -xjf openmpi-$OMPI_VERSION.tar.bz2
  # Compile and install
  cd /mytmp/ompi/openmpi-$OMPI_VERSION
  ./configure --prefix=$OMPI_DIR && make install
  # Set env variables so we can compile our application
  export PATH=$OMPI_DIR/bin:$PATH
  export LD_LIBRARY_PATH=$OMPI_DIR/lib:$LD_LIBRARY_PATH
  export MANPATH=$OMPI_DIR/share/man:$MANPATH

  echo "Compiling the MPI application..."
  cd /opt && wget https://tigress-web.princeton.edu/~jdh4/mpitest.c
  mpicc -o mpitest mpitest.c

  # build LAMMPS
  mkdir -p /mytmp/lammps
  cd /mytmp/lammps
  wget https://github.com/lammps/lammps/archive/stable_29Oct2020.tar.gz
  tar zxf stable_29Oct2020.tar.gz
  cd lammps-stable_29Oct2020
  mkdir build && cd build

  cmake -D CMAKE_INSTALL_PREFIX=/opt/lammps -D ENABLE_TESTING=yes \
  -D CMAKE_CXX_COMPILER=g++ -D MPI_CXX_COMPILER=mpicxx \
  -D BUILD_MPI=yes -D BUILD_OMP=yes \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_CXX_FLAGS_RELEASE="-Ofast -DNDEBUG" \
  -D PKG_USER-OMP=yes -D PKG_MOLECULE=yes ../cmake

  make -j 4
  make install

  mkdir -p /opt/lammps && cd /opt/lammps
  wget https://tigress-web.princeton.edu/~jdh4/in.melt

  # cleanup
  apt-get -y autoremove --purge
  apt-get -y clean
  rm -rf /mytmp

%runscript
  /opt/lammps/bin/lmp -in /opt/lammps/in.melt

Note that the highest instruction set or level of vectorization is not specified in CMAKE_CXX_FLAGS_RELEASE (e.g., -march=native). This should be done based on the machine that will be used to run the image.
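
For example, if you know the microarchitecture of the compute nodes that will run the image (Broadwell is used below purely as an illustration), you could replace the corresponding cmake flag in the definition file:

  -D CMAKE_CXX_FLAGS_RELEASE="-Ofast -march=broadwell -DNDEBUG" \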

 

Help

The Singularity User Guide is here. The help menu is shown below:

$ singularity --help

Linux container platform optimized for High Performance Computing (HPC) and
Enterprise Performance Computing (EPC)

Usage:
  singularity [global options...]

Description:
  Singularity containers provide an application virtualization layer enabling
  mobility of compute via both application and environment portability. With
  Singularity one is capable of building a root file system that runs on any 
  other Linux system where Singularity is installed.

Options:
  -d, --debug     print debugging information (highest verbosity)
  -h, --help      help for singularity
      --nocolor   print without color output (default False)
  -q, --quiet     suppress normal output
  -s, --silent    only print errors
  -v, --verbose   print additional information
      --version   version for singularity

Available Commands:
  build       Build a Singularity image
  cache       Manage the local cache
  capability  Manage Linux capabilities for users and groups
  config      Manage various singularity configuration (root user only)
  delete      Deletes requested image from the library
  exec        Run a command within a container
  help        Help about any command
  inspect     Show metadata for an image
  instance    Manage containers running as services
  key         Manage OpenPGP keys
  oci         Manage OCI containers
  plugin      Manage Singularity plugins
  pull        Pull an image from a URI
  push        Upload image to the provided URI
  remote      Manage singularity remote endpoints
  run         Run the user-defined default command within a container
  run-help    Show the user-defined help for an image
  search      Search a Container Library for images
  shell       Run a shell within a container
  sif         siftool is a program for Singularity Image Format (SIF) file manipulation
  sign        Attach a cryptographic signature to an image
  test        Run the user-defined tests within a container
  verify      Verify cryptographic signatures attached to an image
  version     Show the version for Singularity

Examples:
  $ singularity help <command> [<subcommand>]
  $ singularity help build
  $ singularity help instance start

For additional help or support, please visit https://www.sylabs.io/docs/

 

FAQ

1. How do I deal with this error: FATAL: container creation failed: mount /proc/self/fd/3->/var/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/3: failed to mount squashfs filesystem: input/output error?

It may mean that you ran out of disk space while building the image. Try running the checkquota command.