Containers on the Research Computing Clusters

Introduction

Software has grown in complexity over the years, making it difficult at times to even install and run the software. Containers address this problem by storing the software and all of its dependencies (including a minimal operating system) in a single, large image, so that there is nothing to install and when it comes time to run the software everything "just works". This makes the software both shareable and portable while ensuring reproducibility.

Apptainer is a Secure Alternative to Docker

Docker images are not secure because they provide a means to gain root access to the system they are running on. For this reason Docker is not available on the Princeton Research Computing clusters (neither is nvidia-docker).

For many years we used Singularity as an alternative to Docker. Singularity branched into three different projects (Singularity, SingularityCE and Apptainer). Apptainer has replaced Singularity on the Princeton Research Computing systems.

Apptainer is an alternative to Docker that is both secure and designed for high-performance computing. Apptainer is compatible with all Docker images and it can be used with GPUs and MPI applications. Learn about the differences between virtual machines, Docker and Singularity/Apptainer.

(Note: If you are more familiar with Docker commands, this conversion table is very useful.)

Reasons to Use Containers

1. An Apptainer image bundles an application together with its software dependencies, data, scripts, documentation, license and a minimal operating system. Software in this form ensures reproducible results. In fact, a DOI can be obtained for an image for publications. As software dependencies continue to grow in complexity, this approach becomes more attractive.

2. Apptainer images are stored as a single file, which makes them easily shareable. You can host your images on the Singularity Cloud Library for others to download. You could also make an image available by putting it on a web server (e.g., tigress-web) like any other file.

3. An Apptainer image can run on any system that has the same architecture (e.g., x86-64) and binary file format for which the image was made. This provides portability.

4. Software built on one system with a certain glibc version may not run on a second system with an older glibc. One may also encounter issues with ABI compatibility, for example, with the standard C++ library. These issues can be solved with Apptainer by building the image with an updated base operating system and the corresponding libraries and tools.

5. Scientific software is often developed for specific Linux distributions such as Ubuntu. It can be difficult to install such software on other Linux distributions. In this case the easiest approach is to make an Apptainer image using the same base operating system that the installation directions are written for.

6. The Princeton Research Computing clusters undergo maintenance on the second Tuesday of each month. Changes to the cluster can cause your software to stop working. However, containerized software is largely unaffected by these changes, which leads to greater stability of your workflow.

7. Apptainer can be used to run massively parallel applications that leverage fast InfiniBand interconnects and GPUs. These applications suffer minimal performance loss since Apptainer was designed to run "close to the hardware".

8. Bring Your Own Software (BYOS). That is, you don't have to ask the system administrators if they are willing to install something for you. You can install whatever you want inside the image and then run it. This works because there is no way to escalate privileges: the user outside the container is the same user inside, so there are no additional security concerns with Apptainer containers.
Popular Container Registries

When looking for containerized software, try these repositories:

- Docker Hub
- NVIDIA GPU Cloud
- Singularity Cloud Library
- Singularity Hub
- Quay.io
- BioContainers
- IBM PowerAI (Traverse only)
- AMD InfinityHub (AMD GPUs)

It is easy to make your own container images available on Docker Hub and the Singularity Cloud Library.

Apptainer

Apptainer is a container platform designed specifically for high-performance computing. It supports MPI and GPU applications as well as InfiniBand networks.

Cache

Working with Apptainer images requires lots of storage space. By default, Apptainer will use ~/.apptainer as a cache directory, which can cause you to go over your /home quota. Consider adding these environment variables to your ~/.bashrc file:

export APPTAINER_CACHEDIR=/scratch/gpfs/$USER/APPTAINER_CACHE
export APPTAINER_TMPDIR=/tmp

On Adroit, replace /scratch/gpfs with /scratch/network above (see Data Storage for more).

Obtaining the Image: Using the pull Command

Some software is provided as an Apptainer image with the .sif or .simg file extension. More commonly, however, a Docker image will be provided and this must be converted to an Apptainer image. For instance, if the installation directions say:

$ docker pull brinkmanlab/psortb_commandline:1.0.2

then download and convert the Docker image to an Apptainer image with:

$ apptainer pull docker://brinkmanlab/psortb_commandline:1.0.2

This will produce the file psortb_commandline_1.0.2.sif in the current working directory, where 1.0.2 is a specific version of the software or a "tag".

Below is another example for the case where the image is on the Singularity Cloud Library:

$ apptainer pull library://sylabsed/examples/lolcow:1.0

In some cases the build command should be used to create the image:

$ apptainer build <name-of-image.sif> <URI>

Unlike pull, build will convert the image to the latest Apptainer image format after downloading it.

Obtaining the Image: Working from a Dockerfile

Some software is provided as a Dockerfile instead of an actual container. In this case, if you have Docker installed on your local machine (e.g., laptop), you can create the Docker image yourself and then transfer it to one of the Research Computing clusters where the Apptainer image can be built.

On your local machine, after making the Docker image, get the image id by running this command:

$ docker images

Next, save that image as a tar file (say its id was 9c27e219663c):

$ docker save 9c27e219663c -o myimage.tar

Copy myimage.tar to one of the Research Computing clusters using scp and then create the Apptainer image. These commands might look as follows:

$ scp myimage.tar <YourNetID>@della.princeton.edu:software
$ ssh <YourNetID>@della.princeton.edu
$ cd software
$ apptainer build myimage.sif docker-archive://myimage.tar

You may then follow the directions below for running myimage.sif, which is an Apptainer container.

Running

To run the default command within the Apptainer image use, for example:

$ apptainer run ./psortb_commandline_1.0.2.sif <arg-1> <arg-2> ... <arg-N>

Note that some containers do not have a default command. See the documentation for the apptainer run command.

To run a specific command that is defined within the container, use apptainer exec:

$ apptainer exec ./psortb_commandline_1.0.2.sif <command> <arg-1> <arg-2> ... <arg-N>
$ apptainer exec ./psortb_commandline_1.0.2.sif python3 myscript.py 42
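If you are not sure what the default command of an image is, one way to check is to show its runscript with the inspect command (using the image from above as an example):

$ apptainer inspect --runscript ./psortb_commandline_1.0.2.sif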
Use the shell command to run a shell within the container:

$ apptainer shell ./psortb_commandline_1.0.2.sif
Apptainer> cat /etc/os-release
Apptainer> cd /
Apptainer> ls -l
Apptainer> exit

The apptainer shell command is very useful when you are trying to find certain files within the container (see below).

Your Files and Storage Spaces are Available

A running container automatically bind mounts these paths:

- /home/<YourNetID>
- /tigress
- /projects
- /scratch (i.e., /scratch/gpfs, /scratch/network)
- /usr/licensed
- /tmp
- the directory from which the container was run

This makes it easy for software within the container to read or write files on the Research Computing filesystems. For instance, if your image is looking for an argument that specifies the path to your data, then you can simply supply the path:

$ apptainer run myimage.sif -d /scratch/gpfs/aturing/mydata

The bind mounting of /usr/licensed allows license files to be used. You can also create your own custom bind mounts. For more information see bind mounting on the Apptainer website.

Finding Files within a Container

To prevent mounting of the Research Computing filesystems (e.g., /scratch/gpfs, /tigress, /projects), use the --containall option. This is useful when searching for files within the container, for example:

$ apptainer shell --containall dart_121520.sif
Apptainer> find / -iname "*python*" 2>/dev/null

Environment Variables

By default, Apptainer exposes all environment variables from the host inside the container. Use the --cleanenv argument to prevent this:

$ apptainer run --cleanenv <image.sif> <arg-1> <arg-2> ... <arg-N>

One can define an environment variable within the container as follows:

$ export APPTAINERENV_MYVAR=Overridden

With the above definition, MYVAR will have the value "Overridden" inside the container. You can also modify the PATH environment variable within the container using definitions such as the following:

$ export APPTAINERENV_PREPEND_PATH=/opt/important/bin
$ export APPTAINERENV_APPEND_PATH=/opt/fallback/bin
$ export APPTAINERENV_PATH=/only/bin

For more, see the Environment and Metadata page on the Apptainer website.

Inspecting the Definition File

One can sometimes learn a lot about an image by inspecting its definition file:

$ apptainer inspect --deffile psortb_commandline_1.0.2.sif

The definition file is the recipe by which the image was made (see below). If the image was taken from Docker Hub then a definition file will not be available.
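Other metadata, such as the labels recorded when the image was built, can be viewed in a similar way:

$ apptainer inspect --labels psortb_commandline_1.0.2.sif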
Example Conversion

Here is an example of converting directions for Docker to Apptainer. The Docker directions are:

$ docker run -v /host/gtdbtk_output:/data -v /host/release89:/refdata ecogenomic/gtdbtk --help

To convert the above to Apptainer, one would use:

$ apptainer run -B /host/gtdbtk_output:/data -B /host/release89:/refdata </path/to>/gtdbtk_1.1.1.sif --help

The Apptainer image in the above line can be obtained with:

$ apptainer pull docker://ecogenomic/gtdbtk:1.1.1

To learn about binding a directory within the container to a directory on the host, look at the -B option in the output of the command "apptainer help run".

Slurm

Serial

Below is a sample Slurm script for a serial application:

#!/bin/bash
#SBATCH --job-name=psortb        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G                 # total memory per node (4 GB per cpu-core is default)
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
apptainer run </path/to>/psortb_commandline_1.0.2.sif <arg-1> <arg-2> ... <arg-N>

Parallel MPI Codes

Below is a sample Slurm script for an MPI code:

#!/bin/bash
#SBATCH --job-name=solar         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=4               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load openmpi/gcc/4.1.2

srun apptainer exec $HOME/software/solar.sif /opt/ray-kit/bin/solar inputs.dat

Note that an Open MPI environment module is loaded and srun is called from outside the image. The MPI library which the code within the container was built against must be compatible with the MPI implementation on the cluster. Generally, the version on the cluster must be newer than what was used within the container. In some cases the major versions of the two must match (i.e., version 4.x with 4.x). You may need to try multiple Open MPI modules before you find one that works and gives good performance. For more, see Apptainer and MPI applications on the Apptainer website. In almost all cases, the hybrid model is being used.
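One way to judge compatibility is to check which MPI library is inside the container by querying it directly. The sketch below uses the image from the example above and assumes Open MPI is installed inside the image and on its PATH:

$ apptainer exec $HOME/software/solar.sif mpirun --version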
NVIDIA GPUs

Here is one way to run TensorFlow. First obtain the image:

$ apptainer pull docker://nvcr.io/nvidia/tensorflow:23.08-tf2-py3

Below is a Slurm script appropriate for a GPU code such as TensorFlow:

#!/bin/bash
#SBATCH --job-name=myjob         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=4        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)
#SBATCH --gres=gpu:1             # number of gpus per node
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
apptainer exec --nv ./tensorflow_23.08-tf2-py3.sif python3 mnist_classify.py

For more on Apptainer and GPU jobs, see GPU Support on the Apptainer website.

Learning

A good way to learn Apptainer/Singularity is to work through this repo and watch the accompanying videos: part 1 | part 2. Also, watch the YouTube video by Marty Kandes of SDSC and view his sample Singularity definition files. Be aware of the NVIDIA hpc-container-maker. One can look at other sample definition files by pulling images from Singularity Hub or the Singularity Cloud Library and then running this command:

$ apptainer inspect --deffile <image-name>.sif

Building Images

Apptainer images are made from scratch using a definition file, which is a text file that specifies the base image, the software to be installed and other information. See the documentation for apptainer build. One may also consider creating images using Docker, since it is a larger community with a longer history and more support.

Create a definition file (e.g., recipe.def) and then run:

$ apptainer build myimage.sif recipe.def
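While developing a definition file, it can be convenient to first build into a writable sandbox directory, experiment interactively, and only produce the final .sif when everything works. A minimal sketch (the names mysandbox and recipe.def are placeholders):

$ apptainer build --sandbox mysandbox/ recipe.def
$ apptainer shell --writable mysandbox/
$ apptainer build myimage.sif mysandbox/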
Below we provide three example recipe.def files.

Python

Let's create an Apptainer image from scratch for running a Python script on a dataset. The image will contain the script and the dataset, making it easy for anyone to reproduce the results. We will assume that only pandas is required to run the Python script. The definition file below uses Ubuntu 22.04 as the base OS. Miniconda is installed and then used to install pandas and all of its dependencies.

Below are the contents of recipe.def:

Bootstrap: docker
From: ubuntu:22.04

%help
  This container provides a Python script and research data. To run the script:
    $ apptainer run myimage.sif  # or ./myimage.sif
  The script is found in /opt/scripts and the data is found in /opt/data.

%labels
  AUTHOR_NAME Alan Turing
  AUTHOR_EMAIL [email protected]
  VERSION 1.0

%environment
  export PATH=/opt/miniconda3/bin:${PATH}
  # set system locale
  export LC_ALL='C'

%post -c /bin/bash
  apt-get -y update && apt-get -y upgrade
  apt-get -y install wget

  INSTALL_SCRIPT=Miniconda3-py38_4.9.2-Linux-x86_64.sh
  wget https://repo.anaconda.com/miniconda/${INSTALL_SCRIPT}
  bash ${INSTALL_SCRIPT} -b -p /opt/miniconda3
  rm ${INSTALL_SCRIPT}
  /opt/miniconda3/bin/conda install pandas -y

  mkdir -p /opt/scripts && cd /opt/scripts
  wget https://tigress-web.princeton.edu/~jdh4/myscript.py
  mkdir -p /opt/data && cd /opt/data
  wget https://tigress-web.princeton.edu/~jdh4/mydata.csv

  # cleanup
  apt-get -y autoremove --purge
  apt-get -y clean

%runscript
  python /opt/scripts/myscript.py

%test
  /opt/miniconda3/bin/python --version

Below are the contents of mydata.csv:

pressure, temperatue, density
1.0, 2.3, 0.9
1.2, 3.1, 1.1
1.4, 3.9, 0.8
1.6, 5.4, 1.8

Below are the contents of myscript.py:

import pandas as pd
df = pd.read_csv("/opt/data/mydata.csv", header=0)
print(df.describe())

Build and run the image:

$ apptainer build myimage.sif recipe.def
$ ./myimage.sif

R

Let's create an Apptainer image from scratch for running an R script on a dataset. The definition file below shows how to install two R packages and how to include the script and data. It is important to realize that the definition file below builds on a pre-existing base image (Debian) that already has R installed. Look at the Rocker Project and Rocker on Docker Hub for other base images.

Bootstrap: docker
From: r-base:4.3.1

%help
  This container provides an R script and research data. To run the script:
    $ apptainer run myimage.sif  # or ./myimage.sif
  The script is found in /opt/scripts and the data is found in /opt/data.

%labels
  AUTHOR_NAME Alan Turing
  AUTHOR_EMAIL [email protected]
  VERSION 1.0

%post -c /bin/bash
  # update the package lists
  apt-get -y update
  # install dependencies for tidyverse
  apt-get -y install libxml2-dev libcurl4-openssl-dev libssl-dev
  # install extra packages
  apt-get -y install file vim
  # install R packages
  R -e 'install.packages(c("dplyr", "tidyverse"))'

  mkdir -p /opt/scripts && cd /opt/scripts
  wget https://tigress-web.princeton.edu/~jdh4/myscript.R
  mkdir -p /opt/data && cd /opt/data
  wget https://tigress-web.princeton.edu/~jdh4/mydata.csv

%runscript
  Rscript /opt/scripts/myscript.R

%test
  #!/bin/bash
  exec R -e 'library(dplyr); library(tibble)'

Below are the contents of myscript.R:

library(dplyr)
library(tibble)
dt <- read.csv("/opt/data/mydata.csv")
dt_tbl <- as_tibble(dt)
print(summary(dt_tbl))

See above for the contents of mydata.csv. The image can be built (it takes several minutes) and then run:

$ apptainer build myimage.sif recipe.def
$ apptainer run myimage.sif
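Both definition files above include a %test section. After a build completes, these tests can be run again at any time with the test command (see "apptainer help test"):

$ apptainer test myimage.sif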
Note that most R packages require certain system libraries to be installed. If a system library is missing you will see a message like the following during the build:

* installing *source* package ‘openssl’ ...
** package ‘openssl’ successfully unpacked and MD5 sums checked
** using staged installation
Using PKG_CFLAGS=
--------------------------- [ANTICONF] --------------------------------
Configuration failed because openssl was not found. Try installing:
 * deb: libssl-dev (Debian, Ubuntu, etc)
 * rpm: openssl-devel (Fedora, CentOS, RHEL)
 * csw: libssl_dev (Solaris)
 * brew: [email protected] (Mac OSX)
If openssl is already installed, check that 'pkg-config' is in your PATH and
PKG_CONFIG_PATH contains a openssl.pc file. If pkg-config is unavailable you
can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
-------------------------- [ERROR MESSAGE] ---------------------------
tools/version.c:1:10: fatal error: openssl/opensslv.h: No such file or directory
    1 | #include <openssl/opensslv.h>
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
--------------------------------------------------------------------
ERROR: configuration failed for package ‘openssl’
* removing ‘/usr/local/lib/R/site-library/openssl’

When you encounter errors like that above, add the missing package to the definition file and try again. In this case libssl-dev was missing.

LAMMPS: A Parallel MPI Code

The definition file below can be used to create an Apptainer image for an MPI version of LAMMPS. It is based on Open MPI version 4. The example below is only for demonstration purposes. It is best to install LAMMPS from source or use the container provided by NVIDIA NGC.

Bootstrap: docker
From: ubuntu:20.04

%environment
  export OMPI_DIR=/opt/ompi
  export PATH="$OMPI_DIR/bin:$PATH"
  export LD_LIBRARY_PATH="$OMPI_DIR/lib:$LD_LIBRARY_PATH"
  export MANPATH="$OMPI_DIR/share/man:$MANPATH"
  export LC_ALL='C'

%post -c /bin/bash
  export DEBIAN_FRONTEND=noninteractive
  apt-get -y update && apt-get -y upgrade
  apt-get -y install python3-dev build-essential cmake wget git

  # build MPI library
  echo "Installing Open MPI"
  export OMPI_DIR=/opt/ompi
  export OMPI_VERSION=4.0.5
  export OMPI_URL="https://download.open-mpi.org/release/open-mpi/v4.0/"
  export OMPI_FILE="openmpi-$OMPI_VERSION.tar.bz2"
  mkdir -p /mytmp/ompi
  mkdir -p /opt
  # Download
  cd /mytmp/ompi && wget -O openmpi-$OMPI_VERSION.tar.bz2 $OMPI_URL$OMPI_FILE
  tar -xjf openmpi-$OMPI_VERSION.tar.bz2
  # Compile and install
  cd /mytmp/ompi/openmpi-$OMPI_VERSION
  ./configure --prefix=$OMPI_DIR && make install

  # Set env variables so we can compile our application
  export PATH=$OMPI_DIR/bin:$PATH
  export LD_LIBRARY_PATH=$OMPI_DIR/lib:$LD_LIBRARY_PATH
  export MANPATH=$OMPI_DIR/share/man:$MANPATH

  echo "Compiling the MPI application..."
  cd /opt && wget https://tigress-web.princeton.edu/~jdh4/mpitest.c
  mpicc -o mpitest mpitest.c

  # build LAMMPS
  mkdir -p /mytmp/lammps
  cd /mytmp/lammps
  wget https://github.com/lammps/lammps/archive/stable_29Oct2020.tar.gz
  tar zxf stable_29Oct2020.tar.gz
  cd lammps-stable_29Oct2020
  mkdir build && cd build
  cmake -D CMAKE_INSTALL_PREFIX=/opt/lammps -D ENABLE_TESTING=yes \
        -D CMAKE_CXX_COMPILER=g++ -D MPI_CXX_COMPILER=mpicxx \
        -D BUILD_MPI=yes -D BUILD_OMP=yes \
        -D CMAKE_BUILD_TYPE=Release \
        -D CMAKE_CXX_FLAGS_RELEASE="-Ofast -DNDEBUG" \
        -D PKG_USER-OMP=yes -D PKG_MOLECULE=yes ../cmake
  make -j 4
  make install

  mkdir -p /opt/lammps && cd /opt/lammps
  wget https://tigress-web.princeton.edu/~jdh4/in.melt

  # cleanup
  apt-get -y autoremove --purge
  apt-get -y clean
  rm -rf /mytmp

%runscript
  /opt/lammps/bin/lmp -in /opt/lammps/in.melt

Note that the highest instruction set or level of vectorization is not specified in CMAKE_CXX_FLAGS_RELEASE (e.g., -march=native). This should be done based on the machine that will be used to run the image.
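Once an image like this is built, it would typically be launched with the hybrid model described in the MPI section above: load a compatible Open MPI module on the cluster and call srun outside the container. A sketch, assuming the image was named lammps.sif (the module version is only an example; check what is available with "module avail openmpi"):

module load openmpi/gcc/4.1.2
srun apptainer exec lammps.sif /opt/lammps/bin/lmp -in /opt/lammps/in.melt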
Help

See the Apptainer User Guide. The help menu is shown below:

$ apptainer help

Linux container platform optimized for High Performance Computing (HPC) and
Enterprise Performance Computing (EPC)

Usage:
  apptainer [global options...]

Description:
  Apptainer containers provide an application virtualization layer enabling
  mobility of compute via both application and environment portability. With
  Apptainer one is capable of building a root file system that runs on any
  other Linux system where Apptainer is installed.

Options:
      --build-config    use configuration needed for building containers
  -c, --config string   specify a configuration file (for root or unprivileged
                        installation only) (default "/etc/apptainer/apptainer.conf")
  -d, --debug           print debugging information (highest verbosity)
  -h, --help            help for apptainer
      --nocolor         print without color output (default False)
  -q, --quiet           suppress normal output
  -s, --silent          only print errors
  -v, --verbose         print additional information

Available Commands:
  build       Build an Apptainer image
  cache       Manage the local cache
  capability  Manage Linux capabilities for users and groups
  checkpoint  Manage container checkpoint state (experimental)
  completion  Generate the autocompletion script for the specified shell
  config      Manage various apptainer configuration (root user only)
  delete      Deletes requested image from the library
  exec        Run a command within a container
  help        Help about any command
  inspect     Show metadata for an image
  instance    Manage containers running as services
  key         Manage OpenPGP keys
  oci         Manage OCI containers
  overlay     Manage an EXT3 writable overlay image
  plugin      Manage Apptainer plugins
  pull        Pull an image from a URI
  push        Upload image to the provided URI
  remote      Manage apptainer remote endpoints, keyservers and OCI/Docker registry credentials
  run         Run the user-defined default command within a container
  run-help    Show the user-defined help for an image
  search      Search a Container Library for images
  shell       Run a shell within a container
  sif         Manipulate Singularity Image Format (SIF) images
  sign        Attach digital signature(s) to an image
  test        Run the user-defined tests within a container
  verify      Verify cryptographic signatures attached to an image
  version     Show the version for Apptainer

Examples:
  $ apptainer help <command> [<subcommand>]
  $ apptainer help build
  $ apptainer help instance start

For additional help or support, please visit https://apptainer.org/help/

FAQ

1. How to deal with this error: FATAL: container creation failed: mount /proc/self/fd/3->/var/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/3: failed to mount squashfs filesystem: input/output error?

It may mean that you ran out of disk space while building the image. Try running the checkquota command.

2. How to deal with this error: FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: unable to create new build: while searching for mksquashfs: exec: "mksquashfs": executable file not found in $PATH?

It could be that mksquashfs is not in your PATH. It should be found in /usr/sbin.

3. How to deal with this error: FATAL: could not open image /home/aturing/myimage.sif: SIF image /home/aturing/myimage.sif is corrupted: wrong partition size?

This generally arises when pulling an image while you are over quota. Run the checkquota command to make sure that this is not the case.

4. How to deal with the error "mksquashfs not found"?

It is most likely because /usr/sbin is not in your PATH. Run "echo $PATH" to check. Add it if necessary.
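If /usr/sbin turns out to be missing from your PATH, you can prepend it for the current session (or add the line to your ~/.bashrc):

$ export PATH=/usr/sbin:$PATH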
In-Person Workshop (10/19/2023)

Docker

Below is the Dockerfile to build a simple LAMMPS image:

FROM ubuntu:22.04
RUN apt-get -y update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends build-essential cmake wget
RUN mkdir -p /opt/data && mkdir /mytemp && cd /mytemp \
    && wget --no-check-certificate https://github.com/lammps/lammps/archive/refs/tags/stable_2Aug2023.tar.gz \
    && tar zxf stable_2Aug2023.tar.gz && cd lammps-stable_2Aug2023 \
    && mkdir build && cd build
RUN cmake -D CMAKE_INSTALL_PREFIX=/opt \
    -D BUILD_MPI=no -D BUILD_OMP=no -D CMAKE_BUILD_TYPE=Release \
    -D CMAKE_CXX_FLAGS_RELEASE=-O3 -D PKG_MOLECULE=yes /mytemp/lammps-stable_2Aug2023/cmake
RUN make -j 4 && make install && rm -rf /mytemp
ENV PATH="/opt/bin:$PATH"
COPY in.melt /opt/data

Save the above using the filename Dockerfile. The commands below build the image, run it and then push the image to Docker Hub:

$ docker build --tag jhalverson/lammps:cow --file Dockerfile .
$ docker run -it --rm jhalverson/lammps:cow lmp -in /opt/data/in.melt
$ docker push jhalverson/lammps:cow

Run the Docker image on Adroit:

$ ssh <YourNetID>@adroit.princeton.edu
$ apptainer pull docker://jhalverson/lammps:cow
$ apptainer exec lammps_cow.sif lmp -in /opt/data/in.melt

PyTorch and TensorFlow

Run PyTorch on Adroit using a container
Run TensorFlow on Adroit using a container