PETSc on the HPC Clusters

PETSc is a popular suite of data structures and routines for the scalable solution of scientific applications. This post provides a starting point for building PETSc on the HPC clusters.

PETSc is highly configurable so it is not pre-installed on the HPC clusters. Users must build their own version. Read the PETSc installation page before building the software. Below we provide build instructions for specific configurations. You will need to modify these for your own needs.

To see all the possible options do the following:

git clone -b release petsc
cd petsc
./configure --help

To search on a specific keyword such as “blas”:

./configure --help | grep -i blas



Below is an example installation procedure on TigerCPU:

ssh <YourNetID>
cd software  # or another directory of your choosing
git clone -b maint petsc
cd petsc
module load intel/19.0/64/ intel-mpi/intel/2019.5/64
module load cmake/3.x rh/devtoolset/7
OPTFLAGS="-Ofast -xHost -mtune=skylake-avx512 -DNDEBUG"

./configure --with-blaslapack-dir=$MKLROOT --with-scalapack-include=$MKLROOT/include --with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=complex

make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=arch-linux2-c-opt all
make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=arch-linux2-c-opt check

The procedure above uses the Intel compilers and Intel MPI library. By loading the cmake and rh modules, the PETSc build system can learn more about the host machine. Multiple warnings about data types will appear if these two modules are not loaded. In addition to taking advantage of compiler optimizations and vectorization, the procedure above builds PETSc against the Intel Math Kernel Library for BLAS, LAPACK and ScaLAPACK which gives a performance gain over the reference implementations of The command unset I_MPI_HYDRA_BOOTSTRAP prevents errors arising from the PETSc build system trying to run batch jobs. For TigerGPU the -mtune value should set to broadwell instead of skylake-avx512.



Della is composed of different generations of Intel processors. The example below make a so-called fat binary which allows it run on both AVX2 and AVX512:

git clone -b maint petsc
cd petsc
module purge
module load cmake/3.x rh/devtoolset/8
module load intel/19.0/64/ intel-mpi/intel/2018.3/64
OPTFLAGS="-Ofast -xHost -axCORE-AVX512"
./configure PETSC_ARCH=intel-mkl-double-complex --with-blaslapack-dir=$MKLROOT --with-scalapack-include=$MKLROOT/include --with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --download-mumps --download-superlu_dist --download-hypre --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=complex --with-precision=double

make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-double-complex all
make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-double-complex check

You must exclude the old Ivybridge nodes when using the build above. Add the following line to your Slurm script:

#SBATCH --exclude=della-r4c1n[1-16] # exclude ivy nodes



Below is an example procedure for building PETSc on Nobel:

git clone -b maint petsc
cd petsc
module load intel/17.0/64/ openmpi/intel-17.0/1.10.2/64
OPTFLAGS="-Ofast -xHost -mtune=haswell -DNDEBUG"

./configure --with-scalapack-include=$MKLROOT/include --with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --with-debugging=0 --with-blaslapack-dir=$MKLROOT COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=complex

make PETSC_DIR=/n/homeserver2/user2a/$USER/petsc PETSC_ARCH=arch-linux2-c-opt all
make PETSC_DIR=/n/homeserver2/user2a/$USER/petsc PETSC_ARCH=arch-linux2-c-opt check

Note that we link against the Intel MKL library for BLAS, LAPACK and ScaLAPACK. This will give improved performance over downloading the versions available through PETSc (i.e., --download-fblaslapack and --download-scalapack). We also use optimization flags that take full advantage of the vector instructions of the host processor.



Below is an example build for the Traverse cluster:

module load openmpi/gcc/4.0.1/64 rh/devtoolset/6
OPTFLAGS="-O3 -march=native -mtune=native -DNDEBUG"
git clone -b maint petsc
cd petsc

./configure PETSC_ARCH=linux-gnu --with-mpi-dir=/usr/local/openmpi/4.0.1/gcc/ppc64le --download-fblaslapack --download-slepc --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=real --with-mpiexec="srun -N 1 -n 2 -t 1"

make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=linux-gnu all
make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=linux-gnu check

The above procedure could be improved for performance by linking against ESSL and using the IBM XL compilers. One could also use OpenBLAS.

Here is another procedure for Traverse using the XL compiler:

$ git clone -b maint petsc
$ cd petsc

$ module load openmpi/devtoolset-8/4.0.3rc1/64 rh/devtoolset/8

$ OPTFLAGS="-Ofast -qarch=pwr9 -qtune=pwr9 -DNDEBUG"

$ ./configure --with-cc=xlc --with-cxx=xlC --with-fc=xlf COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-debugging=0 --with-scalar-type=real --with-batch=1 --with-cuda=0 --with-mpi=1 --download-fblaslapack --download-scalapack --download-mumps --with-mpi-include=/usr/local/openmpi/4.0.3rc1/devtoolset-8/ppc64le/include --with-mpi-lib=\[/usr/local/openmpi/4.0.3rc1/devtoolset-8/ppc64le/lib64/,/usr/local/openmpi/4.0.3rc1/devtoolset-8/ppc64le/lib64/\]

$ make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=arch-linux2-c-opt all
$ make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=arch-linux2-c-opt check


Large Indices

In some cases you may need to build a version with 64-bit integers. The following builds PETSc against the 64-bit Intel MKL BLAS/LAPACK with multithreading on TigerCPU:

module load intel/19.0/64/ intel-mpi/intel/2019.5/64 rh/devtoolset/6
OPTFLAGS="-Ofast -xHost -mtune=skylake-avx512 -DNDEBUG"
git clone -b maint petsc
cd petsc

./configure PETSC_ARCH=arch-linux2-64 --with-debugging=0 COPTFLAGS='-O3 -xHost -DMKL_ILP64' CXXOPTFLAGS='-O3 -xHost -DMKL_ILP64' --with-blaslapack-include="${MKLROOT}/include" --with-blaslapack-lib="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"

If you encounter an error like "TESTING: configureMPIEXEC from config.packages.MPI(config/BuildSystem/config/packages/" or "Runaway process exceeded time limit" then try running this command before running the configure script: unset I_MPI_HYDRA_BOOTSTRAP. In some cases these errors can be addressed by adding --with-mpi="srun -N 1 -n 1 -t 1" or --with-batch to the configure line.



Below is an example of building PETSc with CUDA:

tar zxf petsc-3.13.1.tar.gz 
cd petsc-3.13.1
module load rh/devtoolset/8 openmpi/gcc/3.1.5/64 cmake/3.x
OPTFLAGS="-O3 -march=native"
./configure PETSC_ARCH=arch-gcc-openmpi-cuda-release --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" CUDAOPTFLAGS="-O3 --use_fast_math -arch=sm_60" --with-scalar-type=complex --with-fortran-kernels=1 --with-fortran-interface=1 --with-cuda=1 --download-slepc=yes
make PETSC_DIR=/home/$USER/software/petsc-3.13.1 PETSC_ARCH=arch-gcc-openmpi-cuda-release all

Running make check will fail because there is no GPU on the head node:

make PETSC_DIR=/home/$USER/software/petsc-3.13.1 PETSC_ARCH=arch-gcc-openmpi-cuda-release check

Note that you must rewrite parts of your code in order to use a GPU.


Additional notes

For some builds you will need to run the PETSc configure script and then modify the makefiles for your purposes and then run make all. This approach has proven successful for building a multi-threaded version of MUMPS.

See tips on linking against the Intel MKL and the URL to the Link Line Advisor on that page.

If you encounter any difficulties with PETSc then please send an email to or attend a help session.