Performance profiler for parallel and GPU codes

Linaro MAP is a graphical and command-line profiler for serial, multithreaded, parallel and GPU-enabled applications written in C, C++ and Fortran. It also works with Python. MAP has an easy-to-use, low-overhead interface. See the documentation for MAP.

Follow these steps to use MAP:

1. Create a "Graphical Desktop" using Open OnDemand.
2. Build your application as you normally would, but also turn on the compiler debug symbols. This is typically done by adding the -g option to the icc, gcc, mpicc, ifort, etc., command, which enables source-level profiling. It is recommended to keep the release build optimization flags (e.g., -O3, -xHost, -march=native) so that your effort is spent optimizing regions not already addressed by the compiler.

Tiger, Della, Adroit and Stellar

To see the available versions, run this command:

$ module avail map

To load a MAP module:

$ module load map/24.1

Non-MPI jobs (serial or OpenMP)

Prepare your Slurm script as you normally would. That is, request the appropriate resources for the job (nodes, tasks, CPUs, walltime, etc.). The addition of MAP should have a negligible impact on the wall-clock time.

Precede your executable with the map executable and the --profile flag. For example, if your executable is a.out and you need to give it the command-line argument input.file:

$ map --profile ./a.out input.file

MPI jobs (including hybrid MPI/OpenMP)

Before profiling the code, you need to either choose or generate an MPI wrapper library.

To use a wrapper library that already exists, set the FORGE_MPI_WRAPPER environment variable to the path of a precompiled wrapper file found in <MAP-installation-directory>/map/wrapper/precompiled before running.
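The precompiled wrappers are named after the MPI implementation and compiler family (e.g., openmpi40-gnu-64). To see what is available on a given cluster, list the directory; the installation path below is the one used later on this page for Della and is an assumption for other systems:

```shell
# list the precompiled MPI wrapper libraries shipped with this MAP install
$ ls /usr/licensed/linaro/forge/24.1/map/wrapper/precompiled
```

Each subdirectory contains a wrapper/libmap-sampler-pmpi-precompiled.so.1 file that FORGE_MPI_WRAPPER can point to.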
For instance, for Open MPI 4, we recommend using the openmpi40-gnu-64 wrapper:

$ export FORGE_MPI_WRAPPER=<MAP-installation-directory>/map/wrapper/precompiled/openmpi40-gnu-64/wrapper/libmap-sampler-pmpi-precompiled.so.1

Alternatively, you can compile a custom MPI wrapper library on the login node and specify that library via the FORGE_MPI_WRAPPER environment variable. To do so, set MPICC, create a new directory and then run <MAP-installation-directory>/bin/make-profiler-libraries. Here are the steps:

$ export MPICC=/usr/local/openmpi/4.1.2/gcc/bin/mpicc
$ mkdir ~/precompiler-wrapper-openmpi-4-gcc
$ cd ~/precompiler-wrapper-openmpi-4-gcc
$ make-profiler-libraries

This will generate the wrapper library libmap-sampler-pmpi.so with symlinks. Then you can set the FORGE_MPI_WRAPPER variable before running:

$ export FORGE_MPI_WRAPPER=~/precompiler-wrapper-openmpi-4-gcc/libmap-sampler-pmpi.so.1

Here is an example for Intel MPI with the intel-mpi/intel/2021.7.0 module:

$ export MPICC="/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc -cc=icx"
$ make-profiler-libraries

Note: These libraries must be on the same NFS/GPFS filesystem as your program.

Below is a sample Slurm script for an Open MPI code:

#!/bin/bash
#SBATCH --job-name=cxx_mpi       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=4      # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:10:00          # total run time limit (HH:MM:SS)

module purge
module load openmpi/gcc/4.1.2
module load map/24.1

export FORGE_MPI_WRAPPER=/usr/licensed/linaro/forge/24.1/map/wrapper/precompiled/openmpi40-gnu-64/wrapper/libmap-sampler-pmpi-precompiled.so.1

map --profile srun ./hello_world_mpi

Using the GUI

Here is an example for a specific code that uses MPI and GPUs:

$ ssh -X <YourNetID>@della.princeton.edu  # or use a graphical desktop via OnDemand
$ module load openmpi/gcc/4.1.2
$ module load map/24.1
$ map

Instead of "ssh -X", one can use Open OnDemand. Once the GUI opens, click on "Profile". A window with the title "Run" will appear. Fill in the needed information and then click on "Run". Your code will run and the profiling information will then appear. Choose "Stop and Analyze" if the code runs for too long.

GPU Codes

According to the MAP user guide, when compiling CUDA kernels, do not generate debug information for device code (the -G or --device-debug flag) as this can significantly impair runtime performance. Use -lineinfo instead, for example:

$ nvcc device.cu -c -o device.o -g -lineinfo -O3

Thread Affinity Advisor

MAP provides an advisor as part of the GUI that can give valuable information about thread affinities. For instance, it can point out when multiple threads are assigned to the same CPU core. Many factors can affect thread affinities, such as the MPI runtime, the OpenMP runtime and the job scheduler. One can modify the thread affinities using environment variables such as OMP_PLACES and the Slurm srun option --cpu-bind.

There are No Compilers on the Compute Nodes

If you fail to use a precompiled wrapper library for an MPI code, you will encounter:

getsebool: SELinux is disabled
Warning: unrecognised style "CDE"
Linaro Forge 24.0.2 - Linaro MAP
MAP: Unable to automatically generate and compile a MPI wrapper for your system. Please start Linaro Forge with the MPICC environment variable set to the C MPI compiler for the MPI version in use with your program.
MAP:
MAP: /usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 433: [: argument expected
MAP: No mpicc command found (tried mpixlc_r mpxlc_r mpixlc mpxlc mpiicc mpcc mpicc mpigcc mpgcc mpc_cc)
MAP:
MAP: Unable to compile MPI wrapper library (needed by the Linaro Forge sampler). Please set the environment variable MPICC to your MPI compiler command and try again.
Or with Intel MPI:

Warning: unrecognised style "CDE"
Linaro Forge 24.0.2 - Linaro MAP
MAP: Unable to automatically generate and compile a MPI wrapper for your system. Please start Linaro Forge with the MPICC environment variable set to the C MPI compiler for the MPI version in use with your program.
MAP:
MAP: Attempting to generate MPI wrapper using $MPICC ('/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc')...
/usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 237: /opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc: No such file or directory
MAP:
MAP: /bin/sh: /opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc: No such file or directory
MAP: Error: Couldn't run '/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc -E /tmp/tmpl9wuc5yw.c' for parsing mpi.h.
MAP: Process exited with code 127.
MAP: fail
MAP: /usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 433: [: argument expected
MAP:
MAP: Unable to compile MPI wrapper library (needed by the Linaro Forge sampler). Please set the environment variable MPICC to your MPI compiler command and try again.
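Both error dumps above have the same root cause: the compute nodes have no compilers, so MAP cannot build a wrapper library on the fly. The fix is to set the wrapper before the map command in your Slurm script. A sketch, reusing the Della paths shown earlier on this page (the versions and paths are assumptions to adapt to your system):

```shell
# Option 1: point at a precompiled wrapper (no compiler needed at run time)
export FORGE_MPI_WRAPPER=/usr/licensed/linaro/forge/24.1/map/wrapper/precompiled/openmpi40-gnu-64/wrapper/libmap-sampler-pmpi-precompiled.so.1

# Option 2: point at a wrapper built beforehand on the login node
# with make-profiler-libraries, as described above
# export FORGE_MPI_WRAPPER=$HOME/precompiler-wrapper-openmpi-4-gcc/libmap-sampler-pmpi.so.1

map --profile srun ./hello_world_mpi
```

If you instead want MAP to build the wrapper itself, set MPICC to an MPI compiler that actually exists on the node where map runs.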