PETSc on the HPC Clusters

PETSc is a popular suite of data structures and routines for the scalable solution of scientific applications. This webpage provides a starting point for building PETSc on the HPC clusters.

PETSc is highly configurable, so it is not pre-installed on the HPC clusters. Users must build their own version. Read the PETSc installation page before building the software. Below we provide build instructions for specific configurations, which you will need to adapt to your own needs.

To see all the possible options do the following:

$ git clone -b release petsc
$ cd petsc
$ ./configure --help

To search on a specific keyword such as “blas”:

$ ./configure --help | grep -i blas



Below is a sample build procedure on stellar-intel:

$ ssh <YourNetID>
$ cd software  # or another directory of your choosing
$ wget
$ tar zxvf v3.15.5.tar.gz
$ cd petsc-3.15.5

$ module load intel/2021.1.2 intel-mpi/intel/2021.3.1 cmake/3.19.7

# copy and paste the next five lines
$ ./configure --with-clean --with-ssl=0 --with-c++-support --with-debugging=0 --with-shared-libraries=0 \
--with-clanguage=C++ --download-zlib --download-metis --download-parmetis --download-superlu_dist \
--download-superlu --download-mumps --download-blacs --download-fblaslapack --download-scalapack \
--known-mpi-shared-libraries=1 --download-zoltan --with-mpi-dir=$I_MPI_ROOT --with-scalar-type=real \
PETSC_ARCH=real-dir

$ make PETSC_DIR=/home/$USER/software/petsc-3.15.5 PETSC_ARCH=real-dir all
$ make PETSC_DIR=/home/$USER/software/petsc-3.15.5 PETSC_ARCH=real-dir check

Before running the configure script, run "unset I_MPI_HYDRA_BOOTSTRAP" to prevent errors that arise when the PETSc build system tries to run batch jobs. On Stellar you should also run "unset I_MPI_PMI_LIBRARY". These commands are necessary because Intel MPI was built for Slurm and Slurm is not used on the login nodes.
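As a minimal sketch, the two variables can be cleared in the current login shell just before running configure. Unsetting them affects only this shell session. The loop at the end is a convenience check added for illustration and is not part of the PETSc procedure.

```shell
# Clear the Slurm-specific Intel MPI launcher settings for this shell only
unset I_MPI_HYDRA_BOOTSTRAP
unset I_MPI_PMI_LIBRARY   # the second variable applies to Stellar

# Convenience check (illustrative addition): report whether either is still set
for var in I_MPI_HYDRA_BOOTSTRAP I_MPI_PMI_LIBRARY; do
    if printenv "$var" >/dev/null 2>&1; then
        echo "$var is still set"
    else
        echo "$var is unset"
    fi
done
```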



Below is an example installation procedure on TigerCPU:

$ ssh <YourNetID>
$ cd software  # or another directory of your choosing
$ git clone -b release petsc
$ cd petsc
$ module load intel/19.0/64/ intel-mpi/intel/2019.5/64
$ module load cmake/3.x rh/devtoolset/7
$ OPTFLAGS="-Ofast -xHost -DNDEBUG"

$ ./configure PETSC_ARCH=intel-mkl-complex --with-blaslapack-dir=$MKLROOT \
--with-scalapack-include=$MKLROOT/include \
--with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" \

$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-complex all
$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-complex check

The procedure above uses the Intel compilers and Intel MPI library. By loading the cmake and rh modules, the PETSc build system can learn more about the host machine. Multiple warnings about data types will appear if these two modules are not loaded. In addition to taking advantage of compiler optimizations and vectorization, the procedure above builds PETSc against the Intel Math Kernel Library for BLAS, LAPACK and ScaLAPACK, which gives a performance gain over the reference implementations of these libraries. Running "unset I_MPI_HYDRA_BOOTSTRAP" before the configure script prevents errors arising from the PETSc build system trying to run batch jobs.
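The OPTFLAGS shell variable defined above has an effect only if its value is handed to configure. One way to do this, sketched here, is through PETSc's COPTFLAGS, CXXOPTFLAGS and FOPTFLAGS configure options. The helper name configure_with_optflags is hypothetical, and applying the same flags to all three compilers is an assumption rather than part of the procedure above.

```shell
# Hypothetical helper: pass one set of optimization flags to the C, C++ and
# Fortran compilers, followed by any other configure arguments.
configure_with_optflags() {
    optflags=$1
    shift
    ./configure --COPTFLAGS="$optflags" --CXXOPTFLAGS="$optflags" \
        --FOPTFLAGS="$optflags" "$@"
}

# Example call from the top of the PETSc source tree (not executed here):
#   OPTFLAGS="-Ofast -xHost -DNDEBUG"
#   configure_with_optflags "$OPTFLAGS" PETSC_ARCH=intel-mkl-complex \
#       --with-blaslapack-dir=$MKLROOT
```

Quoting "$optflags" keeps the space-separated flags together as a single argument value, which is what configure expects.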



Della is composed of different generations of Intel processors. The example below makes a so-called fat binary, which allows it to run optimally on both AVX2 and AVX-512 processors:

$ ssh <YourNetID>
$ cd software
$ git clone -b release petsc
$ cd petsc
$ module purge
$ module load cmake/3.18.2
$ module load intel/ intel-mpi/intel/2019.7
$ OPTFLAGS="-Ofast -xCORE-AVX2 -axCORE-AVX512"

$ ./configure PETSC_ARCH=intel-mkl-double-complex --with-blaslapack-dir=$MKLROOT \
--with-scalapack-include=$MKLROOT/include \
--with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" \
--download-mumps --download-superlu_dist --with-cuda=0 --download-hypre --with-debugging=0 \
--with-scalar-type=complex --with-precision=double

$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-double-complex all
$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-double-complex check

Note that we link against the Intel MKL library for BLAS, LAPACK and ScaLAPACK. This will give improved performance over downloading the versions available through PETSc (i.e., --download-fblaslapack and --download-scalapack). We also use optimization flags that take full advantage of each of the microarchitectures of Della.
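Because a fat binary selects its code path at run time, it can be useful to confirm which instruction sets a given node reports. The sketch below reads /proc/cpuinfo, so it assumes Linux. The helper name cpu_has_flag is hypothetical; avx2 and avx512f are the flag names the Linux kernel uses for AVX2 and AVX-512 Foundation.

```shell
# Hypothetical helper: succeed if the CPU flag appears as a whole word in
# /proc/cpuinfo (Linux only); fail if it is absent or the file is unreadable.
cpu_has_flag() {
    grep -q -w -- "$1" /proc/cpuinfo 2>/dev/null
}

for flag in avx2 avx512f; do
    if cpu_has_flag "$flag"; then
        echo "$flag: supported on this node"
    else
        echo "$flag: not reported on this node"
    fi
done
```

Running this on a login node and again inside a job on a compute node shows whether the two differ.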



Below is an example build for the Traverse cluster:

ssh <YourNetID>
cd software
git clone -b release petsc
cd petsc

module purge
module load cmake/3.19.7
module load openmpi/gcc/4.1.1/64
module load cudatoolkit/11.4

OPTFLAGS="-Ofast -mcpu=power9 -mtune=power9 -DNDEBUG"
CUDAFLAGS="-O3 --use_fast_math -arch=sm_70"

./configure PETSC_ARCH=openmpi-power \
--download-fblaslapack --with-debugging=0 \
--with-cuda=1 --CUDAOPTFLAGS="$CUDAFLAGS" --with-cuda-arch=70 \
--with-cxx-dialect=c++14 --with-cuda-dialect=c++14 \
--with-scalar-type=complex --with-batch=1

make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=openmpi-power all
make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=openmpi-power check

Performance of the above build might be improved by linking against an optimized BLAS/LAPACK such as IBM ESSL or OpenBLAS instead of the reference implementation obtained with --download-fblaslapack.
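Each make command on this page passes PETSC_DIR and PETSC_ARCH explicitly. PETSc also reads both from the environment, so one option is to export them once per shell session. The values below match the Traverse example; using $HOME in place of /home/$USER is an assumption about where your home directory lives.

```shell
# Assumed install location and arch name, matching the Traverse example
export PETSC_DIR="$HOME/software/petsc"
export PETSC_ARCH=openmpi-power

# With both variables exported, the build and test steps shorten to:
#   make all
#   make check
echo "PETSc build directory: $PETSC_DIR/$PETSC_ARCH"
```

The same two variables are what application makefiles that include the PETSc makefile fragments typically rely on, so exporting them is also convenient when building your own code against this installation.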


Large Indices

In some cases you may need to build a version with 64-bit integers. The following builds PETSc against the 64-bit Intel MKL BLAS/LAPACK with multithreading on TigerCPU:

module load intel/19.0/64/ intel-mpi/intel/2019.5/64 rh/devtoolset/6
OPTFLAGS="-Ofast -xHost -mtune=skylake-avx512 -DNDEBUG"
git clone -b release petsc
cd petsc

./configure PETSC_ARCH=arch-linux2-64 --with-debugging=0 \
--COPTFLAGS='-O3 -xHost -DMKL_ILP64' --CXXOPTFLAGS='-O3 -xHost -DMKL_ILP64' \
--with-blaslapack-include="${MKLROOT}/include" \
--with-blaslapack-lib="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"
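The -DMKL_ILP64 define and the mkl_intel_ilp64 library above select the 64-bit integer interface of MKL. PETSc's own index type, PetscInt, is controlled separately by the configure option --with-64-bit-indices, which does not appear in the line above. As a hedged sketch, the function below checks a finished build for the PETSC_USE_64BIT_INDICES macro that PETSc writes to petscconf.h when that option is active. The helper name has_64bit_indices is hypothetical, and the path assumes PETSc was cloned into your home directory.

```shell
# Hypothetical helper: succeed if the given petscconf.h enables 64-bit PetscInt
has_64bit_indices() {
    grep -q 'PETSC_USE_64BIT_INDICES' "$1" 2>/dev/null
}

# Assumed location of the build configured above; adjust to your own paths
conf="$HOME/petsc/arch-linux2-64/include/petscconf.h"
if has_64bit_indices "$conf"; then
    echo "PetscInt is 64-bit in this build"
else
    echo "64-bit PetscInt not detected in $conf"
fi
```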

If you encounter an error like "TESTING: configureMPIEXEC from config.packages.MPI(config/BuildSystem/config/packages/" or "Runaway process exceeded time limit" then try running this command before running the configure script: unset I_MPI_HYDRA_BOOTSTRAP. In some cases these errors can be addressed by adding --with-mpiexec="srun -N 1 -n 1 -t 1" or --with-batch to the configure line.



Below is an example of building PETSc with CUDA on TigerGPU:

ssh <YourNetID>
cd software  # or another directory of your choosing
git clone -b release petsc
cd petsc

module load cmake/3.x
module load rh/devtoolset/8
module load openmpi/gcc/3.1.5/64
module load cudatoolkit/11.3

OPTFLAGS="-O3 -march=native"

./configure PETSC_ARCH=arch-gcc-openmpi-cuda-release --with-debugging=0 \
--with-cxx-dialect=c++14 --with-cuda-dialect=c++14 \
--CUDAOPTFLAGS="-O3 --use_fast_math -arch=sm_60" --with-scalar-type=complex --with-fortran-kernels=1 \
--with-fortran-interface=1 --with-cuda=1 --download-slepc=yes

make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=arch-gcc-openmpi-cuda-release all

Running make check will fail because there is no GPU on the head node:

make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=arch-gcc-openmpi-cuda-release check

See the PETSc installation notes on how to use a GPU.
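Before running make check for a CUDA build, a quick test for a visible GPU can save a failed run. The sketch below assumes the NVIDIA driver utility nvidia-smi is on the PATH of GPU nodes; gpu_available is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if nvidia-smi exists and can list a GPU
gpu_available() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

if gpu_available; then
    echo "GPU detected: make check can exercise the CUDA tests here"
else
    echo "No GPU detected: run make check on a GPU node instead"
fi
```

On the head node this is expected to take the second branch; inside a Slurm job that requested a GPU it should take the first.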


Additional notes

For some builds you will need to run the PETSc configure script, modify the generated makefiles for your purposes, and then run make all. This approach has proven successful for building a multi-threaded version of MUMPS.

See tips on linking against the Intel MKL and the URL to the Link Line Advisor on that page.

If you encounter any difficulties with PETSc then please contact us by email or attend a help session.