AMD MI210 GPU Testing

Overview

The della-milan node features AMD EPYC 7763 CPUs (128 cores in total), 1 TB of RAM, and two AMD MI210 GPUs. The Frontier supercomputer, which is expected to come online in 2022 and be the fastest machine in the US, will feature the MI250X GPU. The MI250X and the MI210 are both based on AMD's CDNA 2 architecture (GPU target gfx90a).

Connecting

If you have an account on the Della cluster and have written to cses@princeton.edu to request access to della-milan (you must be added to the video group), you can connect to the node with:

$ ssh <YourNetID>@della-milan.princeton.edu

The examples below will not work if you are not in the video group.
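A quick way to confirm your group membership before trying the examples is sketched below. It is a minimal check using only the Python standard library ("grp" and "os"); the helper name "in_group" is our own and is not part of any ROCm tool. A newly added group only appears in sessions started after the change, so log out and back in if the check fails right after you were added.

```python
import grp
import os


def in_group(group_name):
    """Return True if the current process is a member of group_name.

    Hypothetical helper: compares the process's real and supplementary
    group IDs with the GID of the named group.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        # The group is not defined on this machine.
        return False
    return gid == os.getgid() or gid in os.getgroups()


if __name__ == "__main__":
    if in_group("video"):
        print("You are in the video group.")
    else:
        print("You are NOT in the video group; the examples below will fail.")
```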

Getting Started

The software stack for AMD GPUs is called ROCm (Radeon Open Compute platforM or Radeon Open ECosystem). There is no environment module for ROCm, so set the paths yourself, for example by adding the following lines to your ~/.bashrc file:

export PATH=/opt/rocm-5.1.0/bin:$PATH
export LD_LIBRARY_PATH=/opt/rocm-5.1.0/lib:$LD_LIBRARY_PATH
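After reloading your shell (for example with "source ~/.bashrc"), you can check that the ROCm tools are found. The sketch below assumes the /opt/rocm-5.1.0 install path used above; it reports whether the ROCm bin and lib directories are on PATH and LD_LIBRARY_PATH, and which "hipcc" would be run. The helper name "dir_on_path" is our own.

```python
import os
import shutil

ROCM_ROOT = "/opt/rocm-5.1.0"  # assumption: the version exported in ~/.bashrc above


def dir_on_path(directory, path=None):
    """Return True if directory appears as an entry of a PATH-style string."""
    if path is None:
        path = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    return any(os.path.normpath(entry) == target
               for entry in path.split(os.pathsep) if entry)


if __name__ == "__main__":
    rocm_bin = os.path.join(ROCM_ROOT, "bin")
    rocm_lib = os.path.join(ROCM_ROOT, "lib")
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print(f"{rocm_bin} on PATH: {dir_on_path(rocm_bin)}")
    print(f"{rocm_lib} on LD_LIBRARY_PATH: {dir_on_path(rocm_lib, ld_path)}")
    # shutil.which returns None when hipcc is not on PATH.
    print(f"hipcc resolves to: {shutil.which('hipcc')}")
```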

For reference, here are the contents of the bin directory of the ROCm 5.0.0 installation:

$ ls -lL /opt/rocm-5.0.0/bin/
total 226488
-rwxr-xr-x. 1 root root    531616 Feb  1 18:25 amdclang
-rwxr-xr-x. 1 root root    531616 Feb  1 18:25 amdclang++
-rwxr-xr-x. 1 root root    531616 Feb  1 18:25 amdclang-cl
-rwxr-xr-x. 1 root root    531616 Feb  1 18:25 amdclang-cpp
-rwxr-xr-x. 1 root root    531616 Feb  1 18:25 amdflang
-rwxr-xr-x. 1 root root    531616 Feb  1 18:25 amdlld
-rwxr-xr-x. 1 root root     11258 Feb  1 18:43 aompcc
-rwxr-xr-x. 1 root root      2276 Feb  1 18:28 clang-ocl
-rwxr-xr-x. 1 root root       515 Jan 31 22:28 findcode.sh
-rwxr-xr-x. 1 root root       322 Jan 31 22:28 finduncodep.sh
-rwxr-xr-x. 1 root root    177030 Feb  1 18:43 gputable.txt
-rwxrwxr-x. 1 root root     27082 Jan 31 22:28 hipcc
-rwxrwxr-x. 1 root root      1508 Jan 31 22:28 hipcc_cmake_linker_helper
-rwxrwxr-x. 1 root root      8244 Jan 31 22:28 hipconfig
-rwxr-xr-x. 1 root root       713 Jan 31 22:28 hipconvertinplace-perl.sh
-rwxr-xr-x. 1 root root       602 Jan 31 22:28 hipconvertinplace.sh
-rwxrwxr-x. 1 root root      1857 Jan 31 22:28 hipdemangleatp
-rwxrwxr-x. 1 root root      6169 Jan 31 22:28 hip_embed_pch.sh
-rwxr-xr-x. 1 root root       335 Jan 31 22:28 hipexamine-perl.sh
-rwxr-xr-x. 1 root root       485 Jan 31 22:28 hipexamine.sh
-rwxr-xr-x. 1 root root  33454072 Feb  1 18:38 hipify-clang
-rwxr-xr-x. 1 root root    402158 Jan 31 22:28 hipify-perl
-rw-rw-r--. 1 root root      5796 Jan 31 22:28 hipvars.pm
-rwxr-xr-x. 1 root root      9384 Feb  1 18:43 mygpu
-rwxr-xr-x. 1 root root      9384 Feb  1 18:43 mymcpu
drwxr-xr-x. 2 root root        41 Mar  8 12:31 __pycache__
-rwxr-xr-x. 1 root root     18216 Feb  1 19:16 rocfft_rtc_helper
-rwxr-xr-x. 1 root root 194114448 Feb  1 18:38 rocgdb
-r-xr-xr-x. 1 root root      5462 Feb  1 18:30 rocm_agent_enumerator
-r-xr-xr-x. 1 root root     55584 Feb  1 18:30 rocminfo
-rwxrwxr-x. 1 root root    136587 Jan 31 22:28 rocm-smi
-rwxrwxr-x. 1 root root    136587 Jan 31 22:28 rocm_smi.py
-rwxrwxr-x. 1 root root     10047 Jan 31 22:28 roc-obj
-rwxrwxr-x. 1 root root      8248 Jan 31 22:28 roc-obj-extract
-rwxrwxr-x. 1 root root      7042 Jan 31 22:28 roc-obj-ls
-r-xr-xr-x. 1 root root     18716 Jan 31 22:28 rocprof
-rw-r--r--. 1 root root     16907 Jan 31 22:28 rsmiBindings.py

Common Tools

"rocm-smi" is analogous to nvidia-smi. It is the AMD ROCm System Management Interface.

$ rocm-smi


======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%  
0    33.0c  33.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%    
1    31.0c  36.0W   800Mhz  1600Mhz  0%   auto  300.0W    0%   0%    
================================================================================
============================= End of ROCm SMI Log ==============================
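To log GPU status from a script, the concise table can be parsed as sketched below. This is a minimal example written against the output format shown above; the column layout may differ in other ROCm releases, and the function name "parse_concise_info" is our own.

```python
import shutil
import subprocess


def parse_concise_info(text):
    """Turn the 'Concise Info' table printed by rocm-smi into a list of dicts.

    Assumes a header row starting with 'GPU', followed by one
    whitespace-separated row per GPU, ended by a line of '=' characters.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if line.startswith("="):
            if header is not None:
                break  # separator after the table: done
            continue  # banner lines before the table
        tokens = line.split()
        if not tokens:
            continue
        if header is None:
            if tokens[0] == "GPU":
                header = tokens
        elif len(tokens) == len(header):
            rows.append(dict(zip(header, tokens)))
    return rows


if __name__ == "__main__":
    if shutil.which("rocm-smi") is None:
        print("rocm-smi not found on PATH")
    else:
        out = subprocess.run(["rocm-smi"], capture_output=True, text=True,
                             check=True).stdout
        for gpu in parse_concise_info(out):
            print(f"GPU {gpu['GPU']}: {gpu['Temp']}, {gpu['AvgPwr']}, busy {gpu['GPU%']}")
```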

HIP

HIP, or "Heterogeneous-Compute Interface for Portability", is a C++ runtime API and kernel language closely modeled on CUDA; the same HIP source can be compiled for AMD GPUs and, through CUDA, for NVIDIA GPUs. "hipcc" is the compiler driver:

$ hipcc --version
HIP version: 5.1.20531-cacfa990
AMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.1.0 22114 5cba46feb6af367b1cafaa183ec42dbfb8207b14)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.1.0/llvm/bin

Hello World Example using HIP

$ mkdir test && cd test
$ wget https://raw.githubusercontent.com/ROCm-Developer-Tools/HIP-Examples/master/HIP-Examples-Applications/HelloWorld/Makefile
$ wget https://raw.githubusercontent.com/ROCm-Developer-Tools/HIP-Examples/master/HIP-Examples-Applications/HelloWorld/HelloWorld.cpp
$ make
$ ./HelloWorld

Instead of running "make", you can compile the example directly:

$ hipcc HelloWorld.cpp -o HelloWorld
$ ./HelloWorld
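You can also tell hipcc explicitly which GPU architecture to build for; for the MI210 this is gfx90a, passed with the "--offload-arch" option (the shell equivalent is "hipcc --offload-arch=gfx90a HelloWorld.cpp -o HelloWorld"). The sketch below wraps the compile-and-run step in Python; the helper names "hipcc_command" and "build_and_run" are illustrative only.

```python
import shutil
import subprocess

MI210_ARCH = "gfx90a"  # GPU architecture name of the AMD MI210


def hipcc_command(source, output, arch=MI210_ARCH):
    """Build the hipcc command line that compiles source for one GPU target."""
    return ["hipcc", f"--offload-arch={arch}", source, "-o", output]


def build_and_run(source, output):
    """Compile source with hipcc, then run the resulting executable."""
    subprocess.run(hipcc_command(source, output), check=True)
    subprocess.run([f"./{output}"], check=True)


if __name__ == "__main__":
    if shutil.which("hipcc") is None:
        print("hipcc not found; see Getting Started for the PATH settings")
    else:
        build_and_run("HelloWorld.cpp", "HelloWorld")
```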

Another example, the "square" sample from the HIP source repository:

$ git clone https://github.com/ROCm-Developer-Tools/HIP.git
$ cd HIP/samples/0_Intro/square
$ make
$ ./square.out

Building LAMMPS from Source

#!/bin/bash

VERSION=31Aug2021
wget https://github.com/lammps/lammps/archive/refs/tags/patch_${VERSION}.tar.gz
tar zvxf patch_${VERSION}.tar.gz
cd lammps-patch_${VERSION}
mkdir build && cd build

module purge
export PATH=/opt/rocm-4.3.0/bin:$PATH
export LD_LIBRARY_PATH=/opt/rocm-4.3.0/lib:$LD_LIBRARY_PATH
export HIP_PLATFORM=amd

# The MI210 is gfx90a (gfx908 is the MI100)
cmake3 -D BUILD_MPI=no -D BUILD_OMP=yes \
-D PKG_OPENMP=yes -D PKG_MOLECULE=yes -D PKG_RIGID=yes \
-D CMAKE_CXX_COMPILER=hipcc \
-D PKG_GPU=on -D GPU_API=HIP -D HIP_ARCH=gfx90a ../cmake

make -j 16
make install
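Rather than hard-coding HIP_ARCH, you can query the node. "rocm_agent_enumerator" (in the ROCm bin directory listed above) prints one target name per line, with gfx000 standing for the CPU. The sketch below picks out the GPU targets; on della-milan it should report gfx90a for the MI210s. The function name "gpu_targets" is our own.

```python
import shutil
import subprocess


def gpu_targets(enumerator_output):
    """Return the distinct GPU targets from rocm_agent_enumerator output.

    'gfx000' denotes the host CPU and is skipped; the order of first
    appearance is preserved.
    """
    targets = []
    for name in enumerator_output.split():
        if name != "gfx000" and name not in targets:
            targets.append(name)
    return targets


if __name__ == "__main__":
    if shutil.which("rocm_agent_enumerator") is None:
        print("rocm_agent_enumerator not found on PATH")
    else:
        out = subprocess.run(["rocm_agent_enumerator"], capture_output=True,
                             text=True, check=True).stdout
        targets = gpu_targets(out)
        if targets:
            # Print a flag for the cmake3 command in the script above.
            print(f"-D HIP_ARCH={targets[0]}")
        else:
            print("No GPU targets found")
```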

Useful Links

AMD ROCm: HIP Programming Guide

rocBLAS

Software Containers

Containers designed for AMD GPUs are available on the AMD Infinity Hub: https://www.amd.com/en/technologies/infinity-hub. Containers can also be found on Docker Hub. Applications include TensorFlow, PyTorch, LAMMPS, GROMACS, NAMD, CP2K and SPECFEM3D.

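For details on running these containers, see our Singularity page and the ROCm section of the Singularity user manual. For example, to pull a TensorFlow image and list the devices it can see: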

$ singularity pull docker://amdih/tensorflow:rocm4.2-tf2.5-dev
$ singularity run --rocm tensorflow_rocm4.2-tf2.5-dev.sif
bash: /root//.bash_profile: Permission denied
Singularity> ipython
In [1]: from tensorflow.python.client import device_lib
In [2]: print(device_lib.list_local_devices())

To run the MNIST example (the commands below assume the image file pulled above was moved to $HOME/software):

$ git clone https://github.com/PrincetonUniversity/slurm_mnist
$ cd slurm_mnist
$ singularity exec --rocm $HOME/software/tensorflow_rocm4.2-tf2.5-dev.sif python3 download_mnist.py
$ singularity exec --rocm $HOME/software/tensorflow_rocm4.2-tf2.5-dev.sif python3 mnist_classify.py

TensorFlow with Conda/pip

The commands below can be used to install TensorFlow for ROCm:

$ module load anaconda3/2020.11
$ conda create --name tf-rocm python=3.8
$ conda activate tf-rocm
$ pip install tensorflow-rocm
$ export LD_LIBRARY_PATH=/opt/rocm-4.3.0/lib:$LD_LIBRARY_PATH
$ python
>>> import tensorflow as tf
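Once TensorFlow imports, a short script can confirm that it sees the MI210s and can run an operation on one of them. This sketch uses "tf.config.list_physical_devices" (available in TensorFlow 2.1 and later) and a small matrix multiplication pinned to the first GPU; it still loads if TensorFlow is missing. The helper names "gpu_names" and "matmul_on_gpu" are our own.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the helpers load without TensorFlow
    tf = None


def gpu_names():
    """Return the names of GPUs visible to TensorFlow (empty if none)."""
    if tf is None:
        return []
    return [d.name for d in tf.config.list_physical_devices("GPU")]


def matmul_on_gpu(n=1024):
    """Multiply two random n x n matrices on the first GPU; return the result shape."""
    with tf.device("/GPU:0"):
        a = tf.random.uniform((n, n))
        b = tf.random.uniform((n, n))
        c = tf.matmul(a, b)
    return tuple(c.shape)


if __name__ == "__main__":
    names = gpu_names()
    print(f"GPUs visible to TensorFlow: {names}")
    if names:
        print(f"matmul result shape: {matmul_on_gpu()}")
```

The same script can be run inside the container from the previous section, e.g. "singularity exec --rocm tensorflow_rocm4.2-tf2.5-dev.sif python3 check_tf.py", where check_tf.py is a hypothetical file name for the code above.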