Python Profiling

line_profiler

The line_profiler tool shows how much time is spent on each line of each profiled function in your code. It is a popular tool and a good first choice for almost any Python project.

Installation

If you are installing into an existing Conda environment then use, for example:

$ conda activate torch-env
$ conda install line_profiler --channel conda-forge

One can also install the package with pip, where it is named line-profiler. For general instructions on installing packages see our Python page.
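With pip, the command would be:

```shell
pip install line-profiler
```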

Profiling Procedure

Consider the code below (myscript.py) which computes a measure of the maximum distance between all pairs of elements in a 1-dimensional array using two different functions:

import numpy as np
from scipy.spatial.distance import pdist

def naive_maximum_distance(x):
  dist_max = 0
  for i in range(x.size - 1):
    for j in range(i + 1, x.size):
      dist = abs(x[i] - x[j])
      if (dist > dist_max): dist_max = dist
  return dist_max

def scipy_maximum_distance(x):
  X = x.reshape(-1, 1)
  d = pdist(X, metric='minkowski', p=1)
  dist_max = d.max()
  return dist_max

def main():
  N = 3000
  x = np.random.randn(N)
  dn = naive_maximum_distance(x)
  ds = scipy_maximum_distance(x)

if __name__ == '__main__':
  main()
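As an aside, for this particular measure the maximum pairwise absolute difference of a 1-D array is simply the difference between its largest and smallest elements, so an O(N) alternative exists (numpy_maximum_distance is our own name for this sketch, not part of the script above):

```python
import numpy as np

def numpy_maximum_distance(x):
  # max |x[i] - x[j]| over all pairs is attained by the largest
  # and smallest elements, so no pairwise loop is needed
  return x.max() - x.min()

x = np.random.randn(3000)
print(numpy_maximum_distance(x))
```

The two profiled functions are nonetheless useful here precisely because they are slow enough to produce interesting profiles.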

There are three steps to profiling a Python script with line_profiler: (1) add the @profile decorator to each function to be profiled, (2) run the script under kernprof and (3) view the results by running Python with the line_profiler module on the output file from step 2. Here is an example:

Step 1: Add @profile decorators to the functions to be profiled

import numpy as np
from scipy.spatial.distance import pdist

@profile
def naive_maximum_distance(x):
  dist_max = 0
  for i in range(x.size - 1):
    for j in range(i + 1, x.size):
      dist = abs(x[i] - x[j])
      if (dist > dist_max): dist_max = dist
  return dist_max

@profile
def scipy_maximum_distance(x):
  X = x.reshape(-1, 1)
  d = pdist(X, metric='minkowski', p=1)
  dist_max = d.max()
  return dist_max

@profile
def main():
  N = 3000
  x = np.random.randn(N)
  dn = naive_maximum_distance(x)
  ds = scipy_maximum_distance(x)

if __name__ == '__main__':
  main()


Step 2: Run the code under kernprof instead of python

Submit your job to the cluster but replace python with kernprof -l. Note that a short run time is typically sufficient since you only need to run long enough to properly sample the functions you wish to profile.

#!/bin/bash
#SBATCH --job-name=profiling     # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G                 # total memory per node
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)

module purge
module load anaconda3/2023.9
conda activate torch-env

kernprof -l myscript.py

This will generate a file called myscript.py.lprof.


Step 3: Run python under the line_profiler module on the .lprof file

When the job finishes, to see the profiling results, run the following command on the output file generated in step 2:

$ python -m line_profiler myscript.py.lprof
Timer unit: 1e-06 s

Total time: 6.74651 s
File: myscript.py
Function: naive_maximum_distance at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def naive_maximum_distance(x):
     6         1          0.0      0.0      0.0    dist_max = 0
     7      3000       1248.0      0.4      0.0    for i in range(x.size - 1):
     8   4501499    1535522.0      0.3     22.8      for j in range(i + 1, x.size):
     9   4498500    3600558.0      0.8     53.4        dist = abs(x[i] - x[j])
    10   4498500    1609178.0      0.4     23.9        if (dist > dist_max): dist_max = dist
    11         1          1.0      1.0      0.0    return dist_max

Total time: 0.084347 s
File: myscript.py
Function: scipy_maximum_distance at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           @profile
    14                                           def scipy_maximum_distance(x):
    15         1         21.0     21.0      0.0    X = x.reshape(-1, 1)
    16         1      80094.0  80094.0     95.0    d = pdist(X, metric='minkowski', p=1)
    17         1       4230.0   4230.0      5.0    dist_max = d.max()
    18         1          2.0      2.0      0.0    return dist_max

Total time: 10.4194 s
File: myscript.py
Function: main at line 20

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           @profile
    21                                           def main():
    22         1          3.0      3.0      0.0    N = 3000
    23         1        181.0    181.0      0.0    x = np.random.randn(N)
    24         1   10334420.0 10334420.0   99.2    dn = naive_maximum_distance(x)
    25         1      84819.0  84819.0      0.8    ds = scipy_maximum_distance(x)

We see that the naive implementation of the function took 6.7 s, with 53.4% of the time spent on the line computing the absolute value of the difference. The SciPy implementation was much faster, with the most expensive line being the pairwise distance calculation, which accounted for 95% of the time.


cProfile

The built-in cProfile module provides a simple way to profile your code:

$ python -m cProfile -s tottime myscript.py > profile.txt
$ head -n 20 profile.txt 
         4637512 function calls (4634022 primitive calls) in 2.598 seconds

 Ordered by: internal time

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      1    2.066    2.066    2.329    2.329 myscript.py:4(naive_maximum_distance)
4498600    0.262    0.000    0.262    0.000 {built-in method builtins.abs}
      1    0.070    0.070    0.070    0.070 scipy.spatial._distance_wrap.pdist_minkowski_double_wrap
  67/65    0.021    0.000    0.022    0.000 {built-in method _imp.create_dynamic}
    215    0.020    0.000    0.020    0.000 {built-in method io.open_code}
   1101    0.018    0.000    0.018    0.000 {built-in method posix.stat}
    211    0.016    0.000    0.016    0.000 {built-in method marshal.loads}
    324    0.014    0.000    0.014    0.000 {built-in method builtins.compile}
      1    0.007    0.007    0.007    0.007 {built-in method mkl._py_mkl_service.get_version}
368/364    0.005    0.000    0.010    0.000 {built-in method builtins.__build_class__}
  67/51    0.005    0.000    0.016    0.000 {built-in method _imp.exec_dynamic}
    215    0.003    0.000    0.003    0.000 {method 'read' of '_io.BufferedReader' objects}
    472    0.003    0.000    0.027    0.000 :1431(find_spec)
      1    0.003    0.003    0.003    0.003 {method 'reduce' of 'numpy.ufunc' objects}
   9146    0.002    0.000    0.003    0.000 {built-in method builtins.getattr}

However, for most scripts cProfile produces output that is too fine-grained, burying your own functions among hundreds of internal and import-time calls.
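The standard pstats module can help tame this output by sorting it and limiting it to the top entries. A minimal sketch, where work is a stand-in for your own code:

```python
import cProfile
import io
import pstats

def work():
  # stand-in for the code you want to profile
  return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
# sort by internal time and show only the top 5 entries
stats.sort_stats("tottime").print_stats(5)
print(stream.getvalue())
```

print_stats also accepts a regular expression, so stats.print_stats("myscript") would restrict the report to functions defined in files matching that pattern.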

PyCharm can also be used for profiling; by default it uses cProfile. If you are working with multithreaded code then you should install and use yappi.


Misc

Within Jupyter notebooks, the %time and %timeit magic commands can be used for quick timing measurements.
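Outside of a notebook, the standard timeit module provides similar functionality:

```python
import timeit

# time a statement by running it repeatedly and reporting the total elapsed time
elapsed = timeit.timeit("sum(range(1000))", number=10000)
print(f"{elapsed:.4f} s for 10000 runs")
```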

Arm MAP may be used to profile some Python scripts that call compiled code. See this page for specific instructions.