Compiling and Running MPI Jobs

Compiling Parallel MPI Programs

The Intel, PGI, and GNU compilers are installed on all of the clusters. The standard MPI implementation is Intel MPI, which supports the InfiniBand interconnect. Open MPI is also available.

To set up your environment correctly, it is highly recommended to use the module command. This utility sets up your environment without your having to know the paths to the executables. In most cases, a simple module load intel intel-mpi command is all that is needed to configure your environment for the latest Intel compiler and MPI library.

To set up your environment:

module load intel intel-mpi

or, to use Open MPI with the Intel compiler instead:

module load openmpi intel

To compile Fortran code:

mpif90 myMPIcode.f90

To compile C code:

mpicc myMPIcode.c

To compile C++ code:

mpicxx myMPIcode.cpp
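
For reference, here is a minimal MPI program in C that these wrappers can compile (the file name myMPIcode.c and the hello-world contents are illustrative only, not part of the original instructions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* Initialize the MPI environment */
    MPI_Init(&argc, &argv);

    /* Determine the total number of ranks and this process's rank */
    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    printf("Hello from rank %d of %d\n", world_rank, world_size);

    /* Shut down the MPI environment */
    MPI_Finalize();
    return 0;
}

Compiling this file with mpicc myMPIcode.c produces a.out, which prints one line per MPI rank when launched.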

Compiling Vectorized Code on Della with Intel Compilers

Please note: If you are compiling programs with the -x option, you will need to use -ax on Della as described below. On Della the -xCORE-AVX2 or -xHost options can result in poor performance or error messages.

Compiling with -xHost on the Della head node (a Broadwell node) will produce code optimized for Broadwell processors. As a result, when run on the older nodes, the executable will fail with an error message similar to: "Please verify that both the operating system and the processor support Intel(R) F16C and AVX1 instructions." When run on the Skylake nodes, it may perform below its optimal level. The recommended solution is to use the -ax flag, which tells the compiler to build a binary with instruction sets for each architecture and to choose the best one at runtime. For example, instead of -xCORE-AVX2 or -xHost, use:

$ icc -Ofast -xCORE-AVX2 -axCORE-AVX512 -o myexe mycode.c

The resulting executable will be able to run on Broadwell as well as Skylake and Cascade Lake nodes. It will fail on the Ivy Bridge nodes, which should be excluded with:

#SBATCH --exclude=della-r4c1n[1-16]
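
If the code being compiled is an MPI program, the same flags can be passed through the MPI compiler wrapper. With Intel MPI, the wrapper for the Intel C compiler is typically mpiicc (the exact wrapper name depends on the modules loaded), so an equivalent of the command above would be:

$ mpiicc -Ofast -xCORE-AVX2 -axCORE-AVX512 -o myexe myMPIcode.c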

Submitting an MPI Job

Once the parallel executable, a.out, has been compiled, a Slurm job script is needed to run it. Here is a sample script, parallel.cmd, which uses 32 CPU cores (2 nodes with 16 tasks per node). In most cases, you should set --ntasks-per-node equal to the number of cores per node on the system where the job will run. See the Cluster Configuration table for the details of each cluster. If you need help with job submission parameters, send email to cses@princeton.edu or come to one of the twice-weekly Help Sessions.

For example, create a file called parallel.cmd with contents like this:

#!/bin/bash
#SBATCH --job-name=multinode     # create a short name for your job
#SBATCH --nodes=2                # node count
#SBATCH --ntasks-per-node=16     # number of tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=2G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=04:00:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fail
#SBATCH --mail-user=YourNetID@princeton.edu

module purge
module load intel intel-mpi

srun ./a.out                     # launch the MPI executable (one task per requested core)

(NOTE: Change "YourNetID" to your own NetID in the above script.) If the job is in a file named parallel.cmd, type the following command to submit it:

sbatch parallel.cmd
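
Slurm will respond with the id of the submitted job. Although not required, the status of the job can then be checked with the standard squeue command (shown here with $USER so that only your own jobs are listed):

squeue -u $USER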