Performance profiler for parallel and GPU codes

Linaro MAP is a graphical and command-line profiler for serial, multithreaded, parallel and GPU-enabled applications written in C, C++ and Fortran. It also works with Python. MAP has an easy-to-use, low-overhead interface. See the documentation for MAP.

Follow these steps to use MAP:

  1. Create a “Graphical Desktop” using Open OnDemand.
  2. Build your application as you normally would, but also turn on the compiler debug symbols. This is typically done by adding the -g option to the icc, gcc, mpicc, ifort, etc., command, which enables source-level profiling. It is recommended to also use release-build optimization flags (e.g., -O3, -xHost, -march=native) so that your effort goes into optimizing regions not already addressed by the compiler. A sample compile line follows this list.
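
For example, a typical compile line for a C code (the file and executable names here are illustrative):

gcc -g -O3 -march=native -o a.out my_code.c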

Tiger, Della, Adroit, and Stellar

To see the available versions, run this command:

$ module avail map

To load a MAP module:

module load map/24.1

Non-MPI jobs (serial or OpenMP)

  • Prepare your Slurm script as you normally would. That is, request the appropriate resources for the job (nodes, tasks, CPUs, walltime, etc.). The addition of MAP should have a negligible impact on the wall-clock time.
  • Precede your executable with the map executable and the --profile flag (a complete sample script follows this list). For example, if your executable is a.out and you need to give it the command-line argument input.file:

    map --profile ./a.out input.file
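
Putting this together, a minimal Slurm script for a serial run might look like the following (job name, resources, and module version are illustrative):

#!/bin/bash
#SBATCH --job-name=serial-map    # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=00:10:00          # total run time limit (HH:MM:SS)

module purge
module load map/24.1

map --profile ./a.out input.file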

MPI jobs (including hybrid MPI/OpenMP)

Before profiling the code, you need to either choose or generate an MPI wrapper library.

To use a wrapper library that already exists, set the FORGE_MPI_WRAPPER environment variable to the path of a precompiled wrapper file found in <MAP-installation-directory>/map/wrapper/precompiled before running. For instance, for Open MPI 4, we would recommend using the openmpi40-gnu-64 wrapper:

export FORGE_MPI_WRAPPER=<MAP-installation-directory>/map/wrapper/precompiled/openmpi40-gnu-64/wrapper/libmap-sampler-pmpi-precompiled.so.1
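
On the clusters above, the installation directory can be found by inspecting the module. For example (the path below is the one used in the sample Slurm script later on this page):

$ module show map/24.1
$ ls /usr/licensed/linaro/forge/24.1/map/wrapper/precompiled/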

Alternatively, you can compile a custom MPI wrapper library manually on the login node and specify it via the FORGE_MPI_WRAPPER environment variable. To do so, set MPICC, create a new directory, and then run

<MAP-installation-directory>/bin/make-profiler-libraries

Here are the steps:

export MPICC=/usr/local/openmpi/4.1.2/gcc/bin/mpicc
mkdir ~/precompiler-wrapper-openmpi-4-gcc
cd ~/precompiler-wrapper-openmpi-4-gcc
make-profiler-libraries

This will generate the wrapper library libmap-sampler-pmpi.so with symlinks. Then you can set the FORGE_MPI_WRAPPER variable before running:

export FORGE_MPI_WRAPPER=~/precompiler-wrapper-openmpi-4-gcc/libmap-sampler-pmpi.so.1

Here is an example for Intel MPI with the intel-mpi/intel/2021.7.0 module:

export MPICC="/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc -cc=icx"
make-profiler-libraries
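
As with Open MPI, point FORGE_MPI_WRAPPER at the generated library before running (the directory name below is illustrative; use the directory in which you ran make-profiler-libraries):

export FORGE_MPI_WRAPPER=~/precompiled-wrapper-intel-mpi/libmap-sampler-pmpi.so.1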

Note: These libraries must be on the same NFS/GPFS filesystem as your program.

Below is a sample Slurm script for an Open MPI code:

#!/bin/bash
#SBATCH --job-name=cxx_mpi       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=4      # number of MPI tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:10:00          # total run time limit (HH:MM:SS)

module purge
module load openmpi/gcc/4.1.2
module load map/24.1

export FORGE_MPI_WRAPPER=/usr/licensed/linaro/forge/24.1/map/wrapper/precompiled/openmpi40-gnu-64/wrapper/libmap-sampler-pmpi-precompiled.so.1
map --profile srun ./hello_world_mpi
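
When the job completes, MAP writes the profile to a .map file in the working directory (the name encodes the executable, process count, node count, and a timestamp). Open it with the GUI, for example (filename illustrative):

map hello_world_mpi_4p_1n_2024-01-01_12-00.map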

Using the GUI

Here is an example of launching the GUI for a code that uses MPI and GPUs:

$ ssh -X <YourNetID>@della.princeton.edu  # or use graphical desktop via OnDemand
$ module load openmpi/gcc/4.1.2 map/24.1
$ map

Instead of "ssh -X", one can use Open OnDemand. Once the GUI opens, click on "Profile". A window with the title "Run" will appear. Fill in the needed information and then click on "Run". Your code will run and then the profiling information will appear. Choose "Stop and Analyze" if the code runs for too long.

GPU Codes

According to the MAP user guide, when compiling CUDA kernels, do not generate debug information for device code (the -G or --device-debug flag), as this can significantly impair runtime performance. Use -lineinfo instead, for example:

nvcc device.cu -c -o device.o -g -lineinfo -O3
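
Putting it together, one might then link and profile as follows (file names are illustrative):

nvcc -g -lineinfo -O3 main.cu device.o -o gpu_app
map --profile ./gpu_app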

Thread Affinity Advisor

MAP provides an advisor as part of the GUI that can provide valuable information about thread affinities. For instance, it can point out when multiple threads are assigned to the same CPU core. Many factors can affect thread affinities, such as the MPI runtime, the OpenMP runtime, and the job scheduler. Thread affinities can be modified using environment variables such as OMP_PLACES and the Slurm srun option --cpu-bind, as in the sketch below.
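
Below is a minimal sketch of pinning OpenMP threads in a hybrid MPI/OpenMP job (the particular settings are illustrative; tune them for your code):

export OMP_PLACES=cores        # one place per physical core
export OMP_PROC_BIND=close     # keep threads close to the parent thread
srun --cpu-bind=cores ./hello_world_mpi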

There are No Compilers on the Compute Nodes

If you do not provide a wrapper library for an MPI code (e.g., by forgetting to set FORGE_MPI_WRAPPER), MAP will try to generate and compile one on the compute node, which fails because no compilers are available there:

getsebool:  SELinux is disabled
Warning: unrecognised style "CDE"
Linaro Forge 24.0.2 - Linaro MAP
MAP: Unable to automatically generate and compile a MPI wrapper for your system. Please start Linaro Forge with the MPICC environment variable set to the C MPI compiler for the MPI version in use with your program.
MAP: 
MAP: /usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 433: [: argument expected
MAP: No mpicc command found (tried mpixlc_r mpxlc_r mpixlc mpxlc mpiicc mpcc mpicc mpigcc mpgcc mpc_cc)
MAP: 
MAP: Unable to compile MPI wrapper library (needed by the Linaro Forge sampler). Please set the environment variable MPICC to your MPI compiler command and try again.

Or with Intel MPI:

Warning: unrecognised style "CDE"
Linaro Forge 24.0.2 - Linaro MAP
MAP: Unable to automatically generate and compile a MPI wrapper for your system. Please start Linaro Forge with the MPICC environment variable set to the C MPI compiler for the MPI version in use with your program.
MAP: 
MAP: Attempting to generate MPI wrapper using $MPICC ('/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc').../usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 237: /opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc: No such file or directory
MAP: 
MAP: /bin/sh: /opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc: No such file or directory
MAP: Error: Couldn't run '/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc -E /tmp/tmpl9wuc5yw.c' for parsing mpi.h.
MAP:        Process exited with code 127.
MAP: fail
MAP: /usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 433: [: argument expected
MAP: 
MAP: Unable to compile MPI wrapper library (needed by the Linaro Forge sampler). Please set the environment variable MPICC to your MPI compiler command and try again.
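
The remedy in both cases is to set FORGE_MPI_WRAPPER to a precompiled wrapper (or to one built on the login node) before launching MAP, as in the sample Slurm script above:

export FORGE_MPI_WRAPPER=/usr/licensed/linaro/forge/24.1/map/wrapper/precompiled/openmpi40-gnu-64/wrapper/libmap-sampler-pmpi-precompiled.so.1
map --profile srun ./hello_world_mpi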