Profiling MPI code with Intel Trace Analyzer and Collector

Intel Trace Analyzer and Collector (ITAC) is a graphical tool for profiling and understanding the MPI behavior of your code.  ITAC lets you visualize the performance of MPI communication and identify hotspots and the causes of poor scaling.

Initial Setup

  1. Set up your environment to include ITAC (“module load intel-tac”).  Currently ITAC supports only Intel MPI, so your code must be built with it rather than Open MPI.  The Intel MPI compilers are available via the module command (“module load intel-mpi/intel”).
  2. Compile your MPI application as you normally would, with two additional flags: -g and -trace.  The first, -g, turns on compiler debug symbols, which allow source-code-level profiling.  The second, -trace, turns on the ITAC trace collectors.  It is recommended to use release-build optimization flags (e.g. -O3, -xHost) so that tuning effort is spent on regions the compiler optimizations do not already address.
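Putting the two steps above together, a build might look like the following sketch.  The source and output names are placeholders, and the compiler wrapper is an assumption: classic Intel toolchains provide mpiicc for C (mpiifort for Fortran); check which wrappers your intel-mpi/intel module supplies.

```shell
# Load the same modules described above (names follow this guide's examples)
module load intel-mpi/intel intel-tac

# Build with debug symbols (-g), the ITAC collectors (-trace),
# and release optimizations (-O3 -xHost).
# "my_mpi.c" and "my_mpi.out" are illustrative names.
mpiicc -g -trace -O3 -xHost my_mpi.c -o my_mpi.out
```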

Submitting/Running a Job

  1. Construct your Slurm submission script as you normally would.  Request the desired nodes, tasks, and walltime.  Make sure to load the same modules you used to build the code, such as intel-tac and intel-mpi/intel.
  2. To enable source-code profiling, add “export VT_PCTRACE=1” to the script.
  3. Run the code as you would normally.  Ex. “srun ./my_mpi.out”.
  4. Note that ITAC samples very frequently and can generate huge trace files.  It may be wise to write to a scratch directory and/or start with a short run.  Another option is to use the ITAC API to collect traces for only a portion of the code.  See the Intel documentation for details.
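The submission steps above can be sketched as a batch script like the one below.  The node/task counts, walltime, and executable name are placeholders, and the $SCRATCH path is an assumption about your site; VT_LOGFILE_PREFIX is the ITAC environment variable that redirects trace output to a chosen directory.

```shell
#!/bin/bash
#SBATCH --nodes=2                 # placeholder resource request
#SBATCH --ntasks-per-node=16      # adjust for your job
#SBATCH --time=00:30:00           # start with a short run

# Same modules used to build the code
module load intel-mpi/intel intel-tac

# Enable source-code-level profiling
export VT_PCTRACE=1

# Optionally write the (potentially large) trace files to scratch;
# "$SCRATCH" is a site-specific assumption
export VT_LOGFILE_PREFIX=$SCRATCH/itac_traces

# Run the code as normal
srun ./my_mpi.out
```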

Analyzing Results

  1. After the collector has completed, a series of <executable>.stf.* files will be created in the working directory.  
  2. Open the file <executable>.stf with “traceanalyzer <executable>.stf”.  This will open a summary page showing the breakdown of MPI calls vs other code and the top 5 MPI hotspots.
  3. Click the continue button at the upper right to move to a more detailed view.
  4. There are a number of useful timeline views in the Charts menu.  Ex. Charts -> Event Timeline shows a timeline for each process and maps each MPI call between processes.
  5. Clicking on a performance issue in the Performance Assistant pane populates three tabs — Description, Affected Processes, and Source Locations — providing more information about the issue.

Additional Information

  1.  Intel Trace Analyzer and Collector documentation: