Performance profiler for parallel and GPU codes

Arm MAP is a graphical and command-line profiler for serial, multithreaded, parallel and GPU-enabled applications written in C, C++ and Fortran. It has an easy-to-use, low-overhead interface. See the documentation for MAP.

Follow these steps to use MAP:

1. Connect to the login node of the cluster with X11 forwarding enabled (e.g., ssh -X). You may also consider using TurboVNC.

2. Build your application as you normally would, but also turn on compiler debug symbols. This is typically done by adding the -g option to the icc, gcc, mpicc, ifort, etc., command. This enables source-level profiling. It is recommended to keep your release-build optimization flags (e.g., -O3, -xHost, -march=native) so that your tuning effort is spent on regions not already addressed by compiler optimizations.

Latest Version

The latest version of MAP can be made available by running this command:

module load map/24.0

Tiger, Della, Adroit and Stellar

MAP and DDT are part of the Arm Forge package. To see the available versions, use this command: module avail map. To load the latest module, run: module load map/24.0

Non-MPI jobs (serial or OpenMP)

1. Prepare your Slurm script as you normally would. That is, request the appropriate resources for the job (nodes, tasks, CPUs, walltime, etc.). The addition of MAP should have a negligible impact on the wall clock time.

2. Precede your executable with the map executable along with the flag --profile. For example, if your executable is a.out and you need to give it the command-line argument input.file:

/usr/licensed/bin/map --profile ./a.out input.file
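Below is a minimal sketch of such a Slurm script for a multithreaded (OpenMP) run. The executable a.out, the argument input.file, and the resource numbers are placeholders; adjust them for your job.

#!/bin/bash
#SBATCH --job-name=map-omp       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --cpus-per-task=4        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=00:10:00          # total run time limit (HH:MM:SS)

module purge
module load map/24.0

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# precede the executable with map --profile; a .map file is written when the run finishes
/usr/licensed/bin/map --profile ./a.out input.file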
Then select "Load Profile Data File" and choose the new .map file.Below is a sample Slurm script for an MPI code that uses a GPU:#!/bin/bash #SBATCH --job-name=myjob # create a short name for your job #SBATCH --nodes=1 # node count #SBATCH --ntasks=4 # total number of tasks across all nodes #SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks) #SBATCH --mem-per-cpu=4G # memory per cpu-core (4G per cpu-core is default) #SBATCH --gres=gpu:1 # number of gpus per node #SBATCH --time=00:02:00 # total run time limit (HH:MM:SS) module purge module load intel/18.0/64/18.0.3.222 module load intel-mpi/intel/2018.3/64 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK export ALLINEA_MPI_WRAPPER=$HOME/.allinea/wrapper/libmap-sampler-pmpi-tigergpu.princeton.edu.so export ALLINEA_LICENSE_FILE=/usr/licensed/ddt/ddt18.0.2/rhel7/x86_64/Licence.default /usr/licensed/bin/map --profile srun $HOME/.local/bin/lmp_tigerGpu -sf gpu -sf intel -sf omp -in in.melt.gpuTraverseHere is an example for a specific code that uses MPI and GPUs:$ ssh -X <YourNetID>@traverse.princeton.edu $ module load map/20.0.1 openmpi/gcc/3.1.4/64 cudatoolkit/10.2 $ export MPICC=$(which mpicc) $ mapOnce the GUI opens click on "Profile". A window with the title "Run (on traverse.princeton.edu)" will appear. Fill in the needed information and then click on "Run". Your code will run and then the profiling information will appear. Choose "Stop and Analyze" if the code is running for too long.GPU CodesAccording the MAP user guide, when compiling CUDA kernels do not generate debug information for device code (the -G or --device-debug flag) as this can significantly impair runtime performance. Use -lineinfo instead, for example:nvcc device.cu -c -o device.o -g -lineinfo -O3There are No Compilers on the Compute Nodes$ module load openmpi/gcc/4.1.2 $ mpicxx --showme g++ -I/usr/local/openmpi/4.1.2/gcc/include -pthread -L/usr/local/openmpi/4.1.2/gcc/lib64 -L/usr/lib64 -Wl,-rpath -Wl,/usr/local/openmpi/4.1.2/gcc/lib64 -Wl,-rpath -Wl,/usr/lib64 -Wl,--enable-new-dtags -lmpi_cxx -lmpi $ ompi_info Package: Open MPI mockbuild@42bbe3ce599c42a79674836eda18c320 Distribution Open MPI: 4.1.2 Open MPI repo revision: v4.1.2 Open MPI release date: Nov 24, 2021 Open RTE: 4.1.2 Open RTE repo revision: v4.1.2 Open RTE release date: Nov 24, 2021 OPAL: 4.1.2 OPAL repo revision: v4.1.2 OPAL release date: Nov 24, 2021 MPI API: 3.1.0 Ident string: 4.1.2 Prefix: /usr/local/openmpi/4.1.2/gcc Configured architecture: x86_64-redhat-linux-gnu Configure host: 42bbe3ce599c42a79674836eda18c320 Configured by: mockbuild Configured on: Thu Mar 3 17:50:13 UTC 2022 Configure host: 42bbe3ce599c42a79674836eda18c320 Configure command line: '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr/local/openmpi/4.1.2/gcc' '--exec-prefix=/usr/local/openmpi/4.1.2/gcc' '--bindir=/usr/local/openmpi/4.1.2/gcc/bin' '--sbindir=/usr/local/openmpi/4.1.2/gcc/sbin' '--sysconfdir=/etc' '--datadir=/usr/local/openmpi/4.1.2/gcc/share' '--includedir=/usr/local/openmpi/4.1.2/gcc/include' '--libdir=/usr/local/openmpi/4.1.2/gcc/lib64' '--libexecdir=/usr/local/openmpi/4.1.2/gcc/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/local/openmpi/4.1.2/gcc/man' '--infodir=/usr/local/openmpi/4.1.2/gcc/share/info' '--disable-static' '--enable-shared' '--with-sge' '--enable-mpi_thread_multiple' '--enable-mpi-cxx' '--with-cma' '--sysconfdir=/etc/openmpi/4.1.2/gcc' '--with-esmtp' '--with-slurm' 
'--with-pmix=/usr' '--with-libevent=/usr' '--with-libevent-libdir=/usr/lib64' '--with-hwloc=/usr' '--with-ucx=/usr' '--with-hcoll=/opt/mellanox/hcoll' '--without-verbs' 'LDFLAGS=-Wl,-rpath,/usr/local/openmpi/4.1.2/gcc/lib64 -Wl,-z,noexecstack' Built by: mockbuild Built on: Thu Mar 3 18:01:18 UTC 2022 Built host: 42bbe3ce599c42a79674836eda18c320 C bindings: yes C++ bindings: yes Fort mpif.h: yes (all) Fort use mpi: yes (full: ignore TKR) Fort use mpi size: deprecated-ompi-info-value Fort use mpi_f08: yes Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the gfortran compiler and/or Open MPI, does not support the following: array subsections, direct passthru (where possible) to underlying Open MPI's C functionality Fort mpi_f08 subarrays: no Java bindings: no Wrapper compiler rpath: runpath C compiler: gcc C compiler absolute: /usr/bin/gcc C compiler family name: GNU C compiler version: 8.5.0 C++ compiler: g++ C++ compiler absolute: /usr/bin/g++ Fort compiler: gfortran Fort compiler abs: /usr/bin/gfortran Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::) Fort 08 assumed shape: yes Fort optional args: yes Fort INTERFACE: yes Fort ISO_FORTRAN_ENV: yes Fort STORAGE_SIZE: yes Fort BIND(C) (all): yes Fort ISO_C_BINDING: yes Fort SUBROUTINE BIND(C): yes Fort TYPE,BIND(C): yes Fort T,BIND(C,name="a"): yes Fort PRIVATE: yes Fort PROTECTED: yes Fort ABSTRACT: yes Fort ASYNCHRONOUS: yes Fort PROCEDURE: yes Fort USE...ONLY: yes Fort C_FUNLOC: yes Fort f08 using wrappers: yes Fort MPI_SIZEOF: yes C profiling: yes C++ profiling: yes Fort mpif.h profiling: yes Fort use mpi profiling: yes Fort use mpi_f08 prof: yes C++ exceptions: no Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes) Sparse Groups: no Internal debug support: no MPI interface warnings: yes MPI parameter check: runtime Memory profiling support: no Memory debugging support: no dl support: yes Heterogeneous support: no mpirun default --prefix: no MPI_WTIME support: native Symbol vis. 
support: yes Host topology support: yes IPv6 support: no MPI1 compatibility: no MPI extensions: affinity, cuda, pcollreq FT Checkpoint support: no (checkpoint thread: no) C/R Enabled Debugging: no MPI_MAX_PROCESSOR_NAME: 256 MPI_MAX_ERROR_STRING: 256 MPI_MAX_OBJECT_NAME: 64 MPI_MAX_INFO_KEY: 36 MPI_MAX_INFO_VAL: 256 MPI_MAX_PORT_NAME: 1024 MPI_MAX_DATAREP_STRING: 128 MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.1.2) MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.1.2) MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.1.2) MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA event: external (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA hwloc: external (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA pmix: ext3x (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.1.2) MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA ras: 
simulator (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA schizo: jsm (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: adapt (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: han (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: hcoll (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA op: avx (MCA v2.1.0, API v1.0.0, Component v4.1.2) MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.1.2) MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component 
v4.1.2) MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.1.2) MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v4.1.2) MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.1.2) MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v4.1.2)

The Slurm script is:

#!/bin/bash
#SBATCH --job-name=cxx_mpi       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks-per-node=4      # number of tasks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=1G         # memory per cpu-core (4G is default)
#SBATCH --time=00:20:10          # total run time limit (HH:MM:SS)

module purge
module load map/24.0
module load openmpi/gcc/4.1.2

map --profile srun ./hello_world_mpi

When the code is run:

getsebool: SELinux is disabled
Warning: unrecognised style "CDE"
Linaro Forge 24.0.2 - Linaro MAP
MAP: Unable to automatically generate and compile a MPI wrapper for your system. Please start Linaro Forge with the MPICC environment variable set to the C MPI compiler for the MPI version in use with your program.
MAP:
MAP: /usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 433: [: argument expected
MAP: No mpicc command found (tried mpixlc_r mpxlc_r mpixlc mpxlc mpiicc mpcc mpicc mpigcc mpgcc mpc_cc)
MAP:
MAP: Unable to compile MPI wrapper library (needed by the Linaro Forge sampler). Please set the environment variable MPICC to your MPI compiler command and try again.

Or with Intel MPI:

Warning: unrecognised style "CDE"
Linaro Forge 24.0.2 - Linaro MAP
MAP: Unable to automatically generate and compile a MPI wrapper for your system. Please start Linaro Forge with the MPICC environment variable set to the C MPI compiler for the MPI version in use with your program.
MAP:
MAP: Attempting to generate MPI wrapper using $MPICC ('/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc').../usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 237: /opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc: No such file or directory
MAP:
MAP: /bin/sh: /opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc: No such file or directory
MAP: Error: Couldn't run '/opt/intel/oneapi/mpi/2021.7.0/bin/mpiicc -E /tmp/tmpl9wuc5yw.c' for parsing mpi.h.
MAP: Process exited with code 127.
MAP: fail
MAP: /usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper: line 433: [: argument expected
MAP:
MAP: Unable to compile MPI wrapper library (needed by the Linaro Forge sampler). Please set the environment variable MPICC to your MPI compiler command and try again.

When running on the head node:

Warning: unrecognised style "CDE"
Linaro Forge 24.0.2 - Linaro MAP
MAP: (Message repeated 2 times.)
MAP:
MAP: No debug symbols were loaded for the glibc library.
MAP: It is recommended you install the glibc debug symbols.
Profiling              : mpirun /home/aturing/.local/bin/lmp_intel -sf intel -in in.melt
Linaro Forge sampler   : preload (Express Launch)
MPI implementation     : Auto-Detect (Intel MPI (MPMD))
* number of processes  : 32
* number of nodes      : 1
* MPI wrapper          : preload (precompiled mpich3-gnu-64) (Express Launch)
Linaro Forge sampler: CUPTI failed to enable kernel activity monitoring - error code 15
Linaro Forge sampler: CUPTI failed to enable kernel activity monitoring - error code 15
…
Linaro Forge sampler: CUPTI failed to enable memcpy activity monitoring - error code 15
MAP: Processes 0-31:
MAP:
MAP: The Linaro Forge sampler failed to initialize.
MAP:
MAP: [Detaching after vfork from child process 3727988]
MAP: [Detaching after vfork from child process 3728168]
MAP: [Detaching after vfork from child process 3728256]
MAP: [Detaching after vfork from child process 3728577]
MAP: 0
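These errors all come down to MAP being unable to find a working MPI compiler when the job starts on a compute node. One possible workaround, sketched below from the steps given earlier on this page, is to set MPICC and pre-build the sampler wrapper library once on the login node, where the compilers are available. The wrapper location ($HOME/.allinea/wrapper) and the ALLINEA_MPI_WRAPPER variable follow the older Arm Forge instructions above and may differ for newer Linaro Forge releases.

# On the login node, where the compilers are available:
module load openmpi/gcc/4.1.2
export MPICC=$(which mpicc)      # tell build_wrapper which MPI compiler to use
/usr/licensed/linaro/forge/24.0.2/map/wrapper/build_wrapper   # build the wrapper library once

# Then, in the Slurm script, point MAP at the pre-built wrapper
# (<machine-name> is the head node name, as described above):
export ALLINEA_MPI_WRAPPER=$HOME/.allinea/wrapper/libmap-sampler-pmpi-<machine-name>.princeton.edu.so
map --profile srun ./hello_world_mpi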
Nobel and Tigressdata

On a cluster (Tiger, Della, Adroit, Stellar), use this option only if your job will likely schedule and complete quickly, as you will have to wait for it to finish before you can analyze any results. The MAP GUI will build and submit your job to the scheduler for you. If the job will not run quickly, it is best to follow the directions above to use the scheduler manually.

1. Start MAP: /usr/licensed/bin/map
   If this is the first time you are running MAP on a given machine, MAP will be configured with the default values for that system. You may need to change these for your application, as described below.
2. In the opening window select the "Profile" button.
3. Select your Application, Arguments, Input File, and Working Directory as appropriate.
4. If this is an MPI code, check the MPI box. Adjust the Number of Processes (the total number of processes for the entire job), the number of nodes, and the number of processes per node. The number of processes should equal the number of nodes multiplied by the number of processes per node.
   For Tiger, Della, Adroit, and Stellar the implementation should be "SLURM (generic)". Typically there is no need to change the implementation, nor is there a need for any srun arguments. If the implementation is something else, click Change, then select SLURM (generic) from the MPI/UPC Implementation drop-down menu. Then click OK.
   For Nobel and Tigressdata the implementation should be "OpenMPI". If the implementation is something else, click Change, then select OpenMPI from the MPI/UPC Implementation drop-down menu. Then click OK.
5. If this is an OpenMP job, check the OpenMP box and adjust the Number of OpenMP threads as appropriate.
   OpenMP applications require an additional change. Click on the Options button (near the bottom left). This will open another window. Select "Job Submission" from the left-hand menu, then change the "Submission template file:" field to /usr/licensed/ddt/templates/slurm-openmp.qtf. This should be reset to slurm-default.qtf for all non-OpenMP applications.
6. If running on a cluster, check "Submit to Queue" and click on Parameters.
   Choose a Wall Clock Limit. The addition of MAP should have a negligible impact on the wall clock time of your application.
   If you wish to receive an email notification at the beginning (begin) or end (end) of a job, or when the job aborts (fail), change the default to suit your preference (e.g., all). There is no need to change the Email address unless you do not have a Princeton address; if you don't, please specify your email address here.
   Click OK.
7. On Nobel and Tigressdata, click on Run; otherwise, click on Submit.

MAP will submit the job to the scheduler (on a cluster) and run it when ready. Profiling statistics will not be available until the job is finished. Click the "Stop and Analyze" button at the top right to end the job immediately.
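Once a .map file exists, it can also be passed directly to the map command instead of going through "Load Profile Data File". The file name below is hypothetical; MAP chooses the name automatically from the executable, process count, and date.

# open an existing profile directly in the MAP GUI (hypothetical file name)
/usr/licensed/bin/map ./a.out_4p_1n_2024-01-15_10-30.map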