Quantum Espresso

Quantum Espresso is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves and pseudopotentials.

GPU Version

The NVIDIA GPU Cloud (NGC) hosts a Quantum Espresso container produced by SISSA. The container was built to run on the A100, V100 and P100 GPUs of della-gpu, adroit and tigergpu, and it includes optimizations such as CUDA-aware MPI. On our clusters, containers must be run with Singularity, as illustrated below.

Below is a set of sample commands showing how to run the AUSURF112 benchmark:

$ ssh <YourNetID>@della-gpu.princeton.edu
$ mkdir -p software/quantum_espresso  # or another location
$ cd software/quantum_espresso
$ singularity pull docker://nvcr.io/hpc/quantum_espresso:v6.7  # check for newer version on NGC
$ cd /scratch/gpfs/<YourNetID>
$ mkdir qe_test && cd qe_test
$ wget https://repository.prace-ri.eu/git/UEABS/ueabs/-/raw/master/quantum_espresso/test_cases/small/Au.pbe-nd-van.UPF
$ wget https://repository.prace-ri.eu/git/UEABS/ueabs/-/raw/master/quantum_espresso/test_cases/small/ausurf.in

Below is a sample Slurm script (job.slurm):

#!/bin/bash
#SBATCH --job-name=qe-test       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=8      # number of tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=32G                # total memory per node
#SBATCH --gres=gpu:2             # number of gpus per node
#SBATCH --time=00:15:00          # total run time limit (HH:MM:SS)
#SBATCH --gpu-mps                # enable cuda multi-process service
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun --mpi=pmi2 \
     singularity run --nv \
     $HOME/software/quantum_espresso/quantum_espresso_v6.7.sif \
     pw.x -input ausurf.in -npool 2

Note that CUDA Multi-Process Service is only available on della-gpu and adroit.
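
If you are running on tigergpu, where MPS is not available, the --gpu-mps line should simply be left out of the script. One way to do this, shown here as an illustrative tweak rather than part of the original script, is to comment it out with a second # so that Slurm ignores the directive:

##SBATCH --gpu-mps               # MPS is not available on tigergpu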

Submit the job:

$ sbatch job.slurm
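
After submitting, you can monitor the job with standard Slurm commands. The job id and output file name below are placeholders; by default Slurm writes the output to slurm-<jobid>.out in the directory where the job was submitted:

$ squeue -u <YourNetID>      # check the state of your jobs
$ cat slurm-<jobid>.out      # inspect the output once the job has started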

The following benchmark data was generated for the case above on della-gpu in June 2021:

nodes  ntasks-per-node  cpus-per-task  GPUs  execution time (s)
  1           4              1           2          228
  1           8              1           2          184
  1           8              2           2          164
  1          16              1           2          175
  1           8              4           2          156
  2          16              1           4          140

The following was generated on tigergpu:

nodes  ntasks-per-node  cpus-per-task  GPUs  execution time (s)
  1           8              1           4          344
  1          16              1           4          353
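
pw.x reports the total CPU and wall time of a run on a line near the end of its output that begins with "PWSCF". One way to extract it from the Slurm output file, with the file name as a placeholder, is:

$ grep "PWSCF" slurm-<jobid>.out | tail -1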

The AUSURF112 benchmark requires a large amount of GPU memory; a single GPU does not provide enough memory to run it. The code was also found to fail for large values of ntasks-per-node.
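
To see how much memory the GPUs are using during a run, one option (assuming your cluster allows ssh to nodes where you have a running job, and with the node name below as a placeholder) is:

$ squeue -u <YourNetID>      # the NODELIST column gives the assigned node
$ ssh <node-name>            # connect to that node while the job is running
$ nvidia-smi                 # reports per-GPU memory usage and utilization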


CPU Version

The directions below can be used to build QE on Della for the CPU nodes. Users may need to modify the directions to build a custom version of the software.

$ ssh <YourNetID>@della.princeton.edu
$ mkdir -p software && cd software
$ module load git/2.18
$ git clone https://github.com/QEF/q-e.git
$ cd q-e
$ git checkout qe-6.7.0
$ mkdir build && cd build
$ module load rh/devtoolset/9 openmpi/gcc/2.0.2/64 intel-mkl/2018.3/3/64
# copy and paste the next three lines
$ cmake3 -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_C_COMPILER=mpicc \
-DCMAKE_INSTALL_PREFIX=$HOME/.local -DCMAKE_C_FLAGS_RELEASE="-Ofast -march=native -DNDEBUG" \
-DCMAKE_Fortran_FLAGS_RELEASE="-Ofast -march=native -DNDEBUG" ..
$ make -j 8
$ make install

The resulting executables will be available in ~/.local/bin. To enable OpenMP, add the following option to the cmake3 line above:

-DQE_ENABLE_OPENMP=yes
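
For reference, the full configure step with OpenMP enabled would then look something like this (identical to the command above apart from the added option):

$ cmake3 -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_C_COMPILER=mpicc \
-DCMAKE_INSTALL_PREFIX=$HOME/.local -DCMAKE_C_FLAGS_RELEASE="-Ofast -march=native -DNDEBUG" \
-DCMAKE_Fortran_FLAGS_RELEASE="-Ofast -march=native -DNDEBUG" -DQE_ENABLE_OPENMP=yes ..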

It is normal to see the following line when compiling:

remark #15009: *** has been targeted for automatic cpu dispatch

The above arises because the code is being built to run optimally on multiple Intel CPU generations.

Below is a sample Slurm script:

#!/bin/bash
#SBATCH --job-name=qe-cpu        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=8      # number of tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:15:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load openmpi/gcc/2.0.2/64
module load intel-mkl/2018.3/3/64
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun $HOME/.local/bin/pw.x -input ausurf.in -npool 2 

Make sure you perform a scaling analysis to find the optimal number of nodes and CPU-cores to use.
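
One simple way to carry out such an analysis is to submit the same input several times with different core counts and compare the resulting wall times. The sketch below, with arbitrary job names, overrides the ntasks-per-node value in job.slurm from the sbatch command line:

$ for n in 4 8 16 32; do sbatch --ntasks-per-node=$n --job-name=qe-scale-$n job.slurm; done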

The directions above are for Della. If you would like directions for another cluster then please write to cses@princeton.edu.