PETSc is a popular suite of data structures and routines for the scalable solution of scientific applications modeled by partial differential equations. This post provides a starting point for building PETSc on the HPC clusters.
PETSc is highly configurable so it is not pre-installed on the HPC clusters. Users must build their own version. Read the PETSc installation page before building the software. Below we provide build instructions for specific configurations. You will need to modify these for your own needs.
To see all the possible options do the following:
git clone -b release https://gitlab.com/petsc/petsc.git petsc
cd petsc
./configure --help
To search on a specific keyword such as “blas”:
./configure --help | grep -i blas
TigerCPU
Below is an example installation procedure on TigerCPU:
ssh <YourNetID>@tigercpu.princeton.edu
cd software   # or another directory of your choosing
git clone -b maint https://gitlab.com/petsc/petsc.git petsc
cd petsc
module load intel/19.0/64/19.0.5.281 intel-mpi/intel/2019.5/64
module load cmake/3.x rh/devtoolset/7
OPTFLAGS="-Ofast -xHost -mtune=skylake-avx512 -DNDEBUG"
unset I_MPI_HYDRA_BOOTSTRAP
./configure --with-blaslapack-dir=$MKLROOT --with-scalapack-include=$MKLROOT/include --with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=complex
make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=arch-linux2-c-opt all
make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=arch-linux2-c-opt check
The procedure above uses the Intel compilers and the Intel MPI library. Loading the cmake and rh modules allows the PETSc build system to learn more about the host machine; multiple warnings about data types will appear if these two modules are not loaded. In addition to taking advantage of compiler optimizations and vectorization, the procedure above builds PETSc against the Intel Math Kernel Library (MKL) for BLAS, LAPACK and ScaLAPACK, which gives a performance gain over the reference implementations from netlib.org. The command unset I_MPI_HYDRA_BOOTSTRAP prevents errors arising from the PETSc build system trying to run batch jobs. For TigerGPU, the -mtune value should be set to broadwell instead of skylake-avx512.
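To run an application linked against this build, load the same modules in your Slurm script. Below is a minimal sketch; the executable name (my_petsc_app) and the resource requests are placeholders that you should replace with your own:

#!/bin/bash
#SBATCH --job-name=petsc-job       # short name for the job
#SBATCH --nodes=1                  # number of nodes
#SBATCH --ntasks-per-node=8        # number of MPI ranks per node
#SBATCH --time=00:30:00            # total run time (HH:MM:SS)

module purge
module load intel/19.0/64/19.0.5.281 intel-mpi/intel/2019.5/64

srun ./my_petsc_app                # my_petsc_app is a placeholder for your executable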
Della
Della is composed of different generations of Intel processors. The example below makes a so-called fat binary, which allows the code to run on both the AVX2 and AVX-512 nodes:
git clone -b maint https://gitlab.com/petsc/petsc.git petsc
cd petsc
module purge
module load cmake/3.x rh/devtoolset/8
module load intel/19.0/64/19.0.5.281 intel-mpi/intel/2018.3/64
unset I_MPI_HYDRA_BOOTSTRAP
OPTFLAGS="-Ofast -xHost -axCORE-AVX512"
./configure PETSC_ARCH=intel-mkl-double-complex --with-blaslapack-dir=$MKLROOT --with-scalapack-include=$MKLROOT/include --with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --download-mumps --download-superlu_dist --download-hypre --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=complex --with-precision=double
make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-double-complex all
make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=intel-mkl-double-complex check
Because this build targets the AVX2 and AVX-512 instruction sets, you must exclude the older Ivy Bridge nodes when using it. Add the following line to your Slurm script:
#SBATCH --exclude=della-r4c1n[1-16] # exclude ivy nodes
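For context, here is a minimal sketch of a Slurm script for this build; the executable name (my_petsc_app) and the resource requests are placeholders:

#!/bin/bash
#SBATCH --job-name=petsc-job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:00:00
#SBATCH --exclude=della-r4c1n[1-16]   # exclude ivy nodes

module purge
module load intel/19.0/64/19.0.5.281 intel-mpi/intel/2018.3/64

srun ./my_petsc_app                    # my_petsc_app is a placeholder for your executable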
Nobel
Below is an example procedure for building PETSc on Nobel:
git clone -b maint https://gitlab.com/petsc/petsc.git petsc
cd petsc
module load intel/17.0/64/17.0.4.196 openmpi/intel-17.0/1.10.2/64
OPTFLAGS="-Ofast -xHost -mtune=haswell -DNDEBUG"
./configure --with-scalapack-include=$MKLROOT/include --with-scalapack-lib="-L$MKLROOT/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64" --with-debugging=0 --with-blaslapack-dir=$MKLROOT COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=complex
make PETSC_DIR=/n/homeserver2/user2a/$USER/petsc PETSC_ARCH=arch-linux2-c-opt all
make PETSC_DIR=/n/homeserver2/user2a/$USER/petsc PETSC_ARCH=arch-linux2-c-opt check
Note that we link against the Intel MKL for BLAS, LAPACK and ScaLAPACK. This gives better performance than the reference versions that PETSc can download for you (i.e., --download-fblaslapack and --download-scalapack). We also use optimization flags that take full advantage of the vector instructions of the host processor.
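To compile your own code against this build, you can either use PETSc's makefile fragments (shipped under lib/petsc/conf) or call the MPI compiler wrapper directly. Below is a minimal sketch of the manual approach; my_solver.c is a placeholder for your own source file:

module load intel/17.0/64/17.0.4.196 openmpi/intel-17.0/1.10.2/64
export PETSC_DIR=/n/homeserver2/user2a/$USER/petsc   # adjust to where you cloned PETSc
export PETSC_ARCH=arch-linux2-c-opt
# my_solver.c is a placeholder for your own source file
mpicc -O2 -o my_solver my_solver.c \
  -I$PETSC_DIR/include -I$PETSC_DIR/$PETSC_ARCH/include \
  -L$PETSC_DIR/$PETSC_ARCH/lib -Wl,-rpath,$PETSC_DIR/$PETSC_ARCH/lib -lpetsc

Depending on which PETSc features you use, you may also need to append the MKL link flags shown in the configure line above.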
Traverse
Below is an example build for the Traverse cluster:
module load openmpi/gcc/4.0.1/64 rh/devtoolset/6
OPTFLAGS="-O3 -march=native -mtune=native -DNDEBUG"
git clone -b maint https://gitlab.com/petsc/petsc.git petsc
cd petsc
./configure PETSC_ARCH=linux-gnu --with-mpi-dir=/usr/local/openmpi/4.0.1/gcc/ppc64le --download-fblaslapack --download-slepc --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-scalar-type=real --with-mpiexec="srun -N 1 -n 2 -t 1"
make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=linux-gnu all
make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=linux-gnu check
The performance of the build above could likely be improved by linking against IBM ESSL and using the IBM XL compilers. One could also use OpenBLAS instead of the reference BLAS/LAPACK; a sketch of that variant is shown below.
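For instance, an OpenBLAS variant would replace --download-fblaslapack with --download-openblas in the configure line. This is only a sketch under the same modules and OPTFLAGS as above; we have not benchmarked it:

./configure PETSC_ARCH=linux-gnu-openblas --with-mpi-dir=/usr/local/openmpi/4.0.1/gcc/ppc64le \
  --download-openblas --download-slepc --with-debugging=0 \
  COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" \
  --with-scalar-type=real --with-mpiexec="srun -N 1 -n 2 -t 1"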
Here is another procedure for Traverse using the XL compiler:
git clone -b maint https://gitlab.com/petsc/petsc.git petsc
cd petsc
module load openmpi/devtoolset-8/4.0.3rc1/64 rh/devtoolset/8
OPTFLAGS="-Ofast -qarch=pwr9 -qtune=pwr9 -DNDEBUG"
./configure --with-cc=xlc --with-cxx=xlC --with-fc=xlf COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" --with-debugging=0 --with-scalar-type=real --with-batch=1 --with-cuda=0 --with-mpi=1 --download-fblaslapack --download-scalapack --download-mumps --with-mpi-include=/usr/local/openmpi/4.0.3rc1/devtoolset-8/ppc64le/include --with-mpi-lib=\[/usr/local/openmpi/4.0.3rc1/devtoolset-8/ppc64le/lib64/libmpi.so,/usr/local/openmpi/4.0.3rc1/devtoolset-8/ppc64le/lib64/libmpi_mpifh.so\]
make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=arch-linux2-c-opt all
make PETSC_DIR=/home/$USER/petsc PETSC_ARCH=arch-linux2-c-opt check
Large Indices
In some cases you may need to build a version with 64-bit integers. The following builds PETSc against the 64-bit-integer (ILP64) Intel MKL BLAS/LAPACK with multithreading on TigerCPU (to make PETSc's own indices 64-bit as well, see the --with-64-bit-indices configure option):
module load intel/19.0/64/19.0.5.281 intel-mpi/intel/2019.5/64 rh/devtoolset/6
OPTFLAGS="-Ofast -xHost -mtune=skylake-avx512 -DNDEBUG"
git clone -b maint https://gitlab.com/petsc/petsc.git petsc
cd petsc
./configure PETSC_ARCH=arch-linux2-64 --with-debugging=0 COPTFLAGS='-O3 -xHost -DMKL_ILP64' CXXOPTFLAGS='-O3 -xHost -DMKL_ILP64' --with-blaslapack-include="${MKLROOT}/include" --with-blaslapack-lib="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"
If you encounter an error like "TESTING: configureMPIEXEC from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:174)" or "Runaway process exceeded time limit", try running this command before running the configure script: unset I_MPI_HYDRA_BOOTSTRAP. In some cases these errors can be addressed by adding --with-mpiexec="srun -N 1 -n 1 -t 1" or --with-batch to the configure line.
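In other words, for the Intel MPI builds above one would do something like the following; this is only a sketch, and you should keep your other configure options unchanged:

unset I_MPI_HYDRA_BOOTSTRAP    # stop configure's test jobs from going through the Slurm bootstrap
./configure --with-mpiexec="srun -N 1 -n 1 -t 1" <your other configure options>
# or, if configure still fails, add --with-batch to the configure line instead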
CUDA
Below is an example of building PETSc with CUDA:
wget http://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-3.13.1.tar.gz
tar zxf petsc-3.13.1.tar.gz
cd petsc-3.13.1
module load rh/devtoolset/8 openmpi/gcc/3.1.5/64 cmake/3.x
OPTFLAGS="-O3 -march=native"
./configure PETSC_ARCH=arch-gcc-openmpi-cuda-release --with-debugging=0 COPTFLAGS="$OPTFLAGS" CXXOPTFLAGS="$OPTFLAGS" FOPTFLAGS="$OPTFLAGS" CUDAOPTFLAGS="-O3 --use_fast_math -arch=sm_60" --with-scalar-type=complex --with-fortran-kernels=1 --with-fortran-interface=1 --with-cuda=1 --download-slepc=yes
make PETSC_DIR=/home/$USER/software/petsc-3.13.1 PETSC_ARCH=arch-gcc-openmpi-cuda-release all
Running make check will fail because there is no GPU on the head node:
make PETSC_DIR=/home/$USER/software/petsc-3.13.1 PETSC_ARCH=arch-gcc-openmpi-cuda-release check
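One way to run the checks is to submit them as a short GPU job. Below is a minimal sketch; the GPU request syntax and time limit may need adjusting for your cluster:

#!/bin/bash
#SBATCH --job-name=petsc-check
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --time=00:10:00

module purge
module load rh/devtoolset/8 openmpi/gcc/3.1.5/64 cmake/3.x    # same modules used for the build

make PETSC_DIR=/home/$USER/software/petsc-3.13.1 PETSC_ARCH=arch-gcc-openmpi-cuda-release check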
Note that you must rewrite parts of your code in order to use a GPU.
Additional notes
For some builds you will need to run the PETSc configure script, modify the generated makefiles for your purposes, and then run make all. This approach has proven successful for building a multithreaded version of MUMPS.
See our tips on linking against the Intel MKL, which include the URL of the Intel MKL Link Line Advisor.
If you encounter any difficulties with PETSc then please send an email to cses@princeton.edu or attend a help session.