Quantum Espresso is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves and pseudopotentials.
GPU Version
The NVIDIA GPU Cloud (NGC) hosts a Quantum Espresso container that is produced by SISSA. The container was created to run on the A100, V100 and P100 GPUs of della-gpu, adroit and tigergpu, and it also provides optimizations such as CUDA-aware MPI. Singularity must be used when working with containers on our clusters, as illustrated below.
Below are a set of sample commands showing how to run the AUSURF112 benchmark:
$ ssh <YourNetID>@della-gpu.princeton.edu
$ mkdir -p software/quantum_espresso  # or another location
$ cd software/quantum_espresso
$ singularity pull docker://nvcr.io/hpc/quantum_espresso:v6.7  # check for newer version on NGC
$ cd /scratch/gpfs/<YourNetID>
$ mkdir qe_test && cd qe_test
$ wget https://repository.prace-ri.eu/git/UEABS/ueabs/-/raw/master/quantum_espresso/test_cases/small/Au.pbe-nd-van.UPF
$ wget https://repository.prace-ri.eu/git/UEABS/ueabs/-/raw/master/quantum_espresso/test_cases/small/ausurf.in
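To confirm that the image pulled correctly before submitting a batch job, you can list the .sif file and check that pw.x is available inside the container (a quick sanity check; it assumes the v6.7 tag shown above and that pw.x is on the container's PATH):

$ ls -lh $HOME/software/quantum_espresso/quantum_espresso_v6.7.sif
$ singularity exec $HOME/software/quantum_espresso/quantum_espresso_v6.7.sif which pw.x  # should print the path to pw.x inside the container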
Below is a sample Slurm script (job.slurm):
#!/bin/bash
#SBATCH --job-name=qe-test       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=8      # number of tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=32G                # total memory per node
#SBATCH --gres=gpu:2             # number of gpus per node
#SBATCH --time=00:15:00          # total run time limit (HH:MM:SS)
#SBATCH --gpu-mps                # enable cuda multi-process service
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun --mpi=pmi2 \
     singularity run --nv \
     $HOME/software/quantum_espresso/quantum_espresso_v6.7.sif \
     pw.x -input ausurf.in -npool 2
Note that CUDA Multi-Process Service is only available on della-gpu and adroit.
Submit the job:
$ sbatch job.slurm
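You can monitor the job and, once it finishes, pull the timing out of the output file. The commands below are a minimal sketch; they assume the default Slurm output file slurm-<jobid>.out and rely on pw.x printing a final PWSCF line with CPU and wall times:

$ squeue -u $USER                # check the state of your job
$ grep PWSCF slurm-<jobid>.out   # the last matching line reports the CPU and wall time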
The following benchmark data was generated for the case above on Della (GPU) in June 2021:
nodes | ntasks-per-node | cpus-per-task | GPUs | Execution time (s) |
---|---|---|---|---|
1 | 4 | 1 | 2 | 228 |
1 | 8 | 1 | 2 | 184 |
1 | 8 | 2 | 2 | 164 |
1 | 16 | 1 | 2 | 175 |
1 | 8 | 4 | 2 | 156 |
2 | 16 | 1 | 4 | 140 |
The following was generated on TigerGPU:
nodes | ntasks-per-node | cpus-per-task | GPUs | Execution time (s) |
---|---|---|---|---|
1 | 8 | 1 | 4 | 344 |
1 | 16 | 1 | 4 | 353 |
The AUSURF112 benchmark requires a large amount of GPU memory; a single GPU does not provide enough to run it. The code was also found to fail for large values of ntasks-per-node.
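One way to watch the GPU memory usage of a running job is to connect to the compute node and run nvidia-smi (a rough sketch; replace the node name with the one reported by squeue):

$ squeue -u $USER     # note the node in the NODELIST column
$ ssh <nodename>      # you can ssh to a node where you have a running job
$ nvidia-smi          # reports per-GPU memory usage and utilization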
CPU Version
Della
The directions below can be used to build QE on Della for the CPU nodes. Users may need to modify the directions to build a custom version of the software.
$ ssh <YourNetID>@della.princeton.edu
$ mkdir -p software && cd software
$ wget https://github.com/QEF/q-e/releases/download/qe-6.8/qe-6.8-ReleasePack.tgz
$ tar zvxf qe-6.8-ReleasePack.tgz
$ cd qe-6.8
$ mkdir build && cd build
$ module purge
$ module load openmpi/gcc/4.1.2
$ module load fftw/gcc/3.3.9
$ OPTFLAGS="-O3 -march=native -DNDEBUG"
$ # copy and paste the next 9 lines
$ cmake3 -DCMAKE_INSTALL_PREFIX=$HOME/.local \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_Fortran_COMPILER=mpif90 \
  -DCMAKE_Fortran_FLAGS_RELEASE="$OPTFLAGS" \
  -DCMAKE_C_COMPILER=mpicc \
  -DCMAKE_C_FLAGS_RELEASE="$OPTFLAGS" \
  -DQE_ENABLE_OPENMP=ON \
  -DQE_FFTW_VENDOR=FFTW3 \
  -DBLA_VENDOR=OpenBLAS ..
$ make
$ make install
The resulting executables will be available in ~/.local/bin.
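As a quick sanity check on the build (a sketch, assuming the installation above succeeded), confirm that the binary exists and inspect which libraries it links against:

$ ls $HOME/.local/bin/pw.x
$ module load openmpi/gcc/4.1.2 fftw/gcc/3.3.9
$ ldd $HOME/.local/bin/pw.x | grep -Ei "mpi|fftw|blas"   # shows which MPI, FFTW and BLAS libraries the binary uses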
Below is a sample Slurm script:
#!/bin/bash
#SBATCH --job-name=qe-cpu        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=8      # number of tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=64G                # total memory per node (4G per cpu-core is default)
#SBATCH --time=00:15:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu
#SBATCH --constraint=skylake     # exclude broadwell nodes

module purge
module load openmpi/gcc/4.1.2
module load fftw/gcc/3.3.9
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun $HOME/.local/bin/pw.x -input ausurf.in -npool 2
Make sure you perform a scaling analysis to find the optimal values of nodes, ntasks-per-node and cpus-per-task. You should also experiment with the value of npool.
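One way to carry out such a study is to submit a set of short jobs that sweep over these parameters and compare the resulting wall times. Below is a rough sketch: the script name sweep.sh and the NPOOL variable are hypothetical, and it assumes the CPU Slurm script above is saved as job.slurm with its last line changed to use -npool ${NPOOL:-2}. Note that the number of MPI tasks should normally be divisible by the value of npool.

#!/bin/bash
# sweep.sh (hypothetical helper): submit one short job per (ntasks, npool) combination
for ntasks in 8 16 32; do
  for npool in 1 2 4; do
    sbatch --ntasks-per-node=$ntasks \
           --job-name=qe-n${ntasks}-p${npool} \
           --export=ALL,NPOOL=$npool \
           job.slurm
  done
done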
The following was generated on Della in March 2022:
nodes | ntasks-per-node | cpus-per-task | Execution time (s) | CPU efficiency |
---|---|---|---|---|
1 | 32 | 1 | 1019 | 96% |
1 | 8 | 4 | 1595 | 38% |
The CPU efficiency was obtained from the "jobstats" command.
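For reference, these values can be checked after a job completes by passing the job ID to jobstats (the job ID below is a placeholder); the standard Slurm seff command gives a similar summary:

$ jobstats 1234567   # CPU, memory and (where applicable) GPU utilization
$ seff 1234567       # Slurm's own efficiency report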
TigerCPU
Run the commands below to install version 7.0:
$ ssh <YourNetID>@tigercpu.princeton.edu
$ mkdir -p software && cd software  # or another location
$ wget https://github.com/QEF/q-e/archive/refs/tags/qe-7.0.tar.gz
$ tar zvxf qe-7.0.tar.gz
$ cd q-e-qe-7.0
$ module purge
$ module load rh/devtoolset/9
$ module load fftw/gcc/3.3.4
$ module load openmpi/gcc/3.1.5/64
$ OPTFLAGS="-O3 -march=native -DNDEBUG"
$ ./configure FFLAGS="$OPTFLAGS" CFLAGS="$OPTFLAGS" --prefix=$HOME/.local --enable-parallel
$ make pw
$ make install
Do not load the rh/devtoolset/9 module in your Slurm script.
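A Slurm script for TigerCPU can mirror the Della example above, loading the same fftw and openmpi modules that were used for the build (this is a sketch; adjust the resource requests and the npool value for your own case):

#!/bin/bash
#SBATCH --job-name=qe-cpu        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=8      # number of tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=64G                # total memory per node
#SBATCH --time=00:15:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load fftw/gcc/3.3.4
module load openmpi/gcc/3.1.5/64
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun $HOME/.local/bin/pw.x -input ausurf.in -npool 2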