line_profiler
The line_profiler tool shows how much time is spent on each line of the functions you choose to profile. It is a popular tool and should be your first choice for almost any Python project.
Installation
If you are installing into an existing Conda environment then use, for example:
$ conda activate torch-env
$ conda install line_profiler --channel conda-forge
One can also install the package with pip, where its name is line-profiler (i.e., pip install line-profiler). For general instructions on installing packages see our Python page.
Profiling Procedure
Consider the code below (myscript.py) which computes a measure of the maximum distance between all pairs of elements in a 1-dimensional array using two different functions:
import numpy as np
from scipy.spatial.distance import pdist

def naive_maximum_distance(x):
    dist_max = 0
    for i in range(x.size - 1):
        for j in range(i + 1, x.size):
            dist = abs(x[i] - x[j])
            if (dist > dist_max): dist_max = dist
    return dist_max

def scipy_maximum_distance(x):
    X = x.reshape(-1, 1)
    d = pdist(X, metric='minkowski', p=1)
    dist_max = d.max()
    return dist_max

def main():
    N = 3000
    x = np.random.randn(N)
    dn = naive_maximum_distance(x)
    ds = scipy_maximum_distance(x)

if __name__ == '__main__':
    main()
There are three steps to profiling a Python script with line_profiler: (1) insert @profile decorators above each function to be profiled, (2) run the script under kernprof and (3) view the results by running Python under the line_profiler module on the output file from step 2. Here is an example:
Step 1: Add @profile decorators to the functions to be profiled
import numpy as np
from scipy.spatial.distance import pdist

@profile
def naive_maximum_distance(x):
    dist_max = 0
    for i in range(x.size - 1):
        for j in range(i + 1, x.size):
            dist = abs(x[i] - x[j])
            if (dist > dist_max): dist_max = dist
    return dist_max

@profile
def scipy_maximum_distance(x):
    X = x.reshape(-1, 1)
    d = pdist(X, metric='minkowski', p=1)
    dist_max = d.max()
    return dist_max

@profile
def main():
    N = 3000
    x = np.random.randn(N)
    dn = naive_maximum_distance(x)
    ds = scipy_maximum_distance(x)

if __name__ == '__main__':
    main()
Step 2: Run the code under kernprof instead of python
Submit your job to the cluster but replace python with kernprof -l. Note that a short run time typically suffices, since you only need to run long enough to adequately sample the functions you wish to profile.
#!/bin/bash
#SBATCH --job-name=profiling     # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G                 # total memory per node
#SBATCH --time=00:05:00          # total run time limit (HH:MM:SS)

module purge
module load anaconda3/2020.11
conda activate torch-env

kernprof -l myscript.py
This will generate a file called myscript.py.lprof.
Step 3: Run python under the line_profiler module on the .lprof file
When the job finishes, to see the profiling results, run the following command on the output file generated in step 2:
$ python -m line_profiler myscript.py.lprof
Timer unit: 1e-06 s

Total time: 6.74651 s
File: myscript.py
Function: naive_maximum_distance at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def naive_maximum_distance(x):
     6         1          0.0      0.0      0.0      dist_max = 0
     7      3000       1248.0      0.4      0.0      for i in range(x.size - 1):
     8   4501499    1535522.0      0.3     22.8          for j in range(i + 1, x.size):
     9   4498500    3600558.0      0.8     53.4              dist = abs(x[i] - x[j])
    10   4498500    1609178.0      0.4     23.9              if (dist > dist_max): dist_max = dist
    11         1          1.0      1.0      0.0      return dist_max

Total time: 0.084347 s
File: myscript.py
Function: scipy_maximum_distance at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           @profile
    14                                           def scipy_maximum_distance(x):
    15         1         21.0     21.0      0.0      X = x.reshape(-1, 1)
    16         1      80094.0  80094.0     95.0      d = pdist(X, metric='minkowski', p=1)
    17         1       4230.0   4230.0      5.0      dist_max = d.max()
    18         1          2.0      2.0      0.0      return dist_max

Total time: 10.4194 s
File: myscript.py
Function: main at line 20

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           @profile
    21                                           def main():
    22         1          3.0      3.0      0.0      N = 3000
    23         1        181.0    181.0      0.0      x = np.random.randn(N)
    24         1   10334420.0 10334420.0   99.2      dn = naive_maximum_distance(x)
    25         1      84819.0  84819.0      0.8      ds = scipy_maximum_distance(x)
We see that the naive implementation of the function took 6.7 s, with 53.4% of the time spent on the line computing the absolute value of the difference. The SciPy implementation was much faster; its most expensive line was the pairwise distance calculation, which accounted for 95% of the time.
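Line-level results like these often point directly at a better algorithm. As an aside (not part of myscript.py), once the profiler has flagged the abs(x[i] - x[j]) double loop as the hot spot, one can observe that for 1-D data the largest absolute difference is always attained by the minimum and maximum elements, so a single O(n) pass suffices. A minimal sketch using plain lists for illustration:

```python
import random

def naive_maximum_distance(xs):
    # O(n^2) pairwise scan, mirroring the naive function in myscript.py
    dist_max = 0
    for i in range(len(xs) - 1):
        for j in range(i + 1, len(xs)):
            dist = abs(xs[i] - xs[j])
            if dist > dist_max:
                dist_max = dist
    return dist_max

def minmax_maximum_distance(xs):
    # For 1-D data, max |x_i - x_j| is reached at the extremes: O(n)
    return max(xs) - min(xs)

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(500)]
assert abs(naive_maximum_distance(xs) - minmax_maximum_distance(xs)) < 1e-12
```

The point is not this particular trick but the workflow: profile first, then optimize only the lines that dominate the runtime.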
cProfile
The built-in cProfile module provides a simple way to profile your code:
$ python -m cProfile -s tottime myscript.py > profile.txt
$ head -n 20 profile.txt
         4637512 function calls (4634022 primitive calls) in 2.598 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.066    2.066    2.329    2.329 myscript.py:4(naive_maximum_distance)
  4498600    0.262    0.000    0.262    0.000 {built-in method builtins.abs}
        1    0.070    0.070    0.070    0.070 scipy.spatial._distance_wrap.pdist_minkowski_double_wrap
    67/65    0.021    0.000    0.022    0.000 {built-in method _imp.create_dynamic}
      215    0.020    0.000    0.020    0.000 {built-in method io.open_code}
     1101    0.018    0.000    0.018    0.000 {built-in method posix.stat}
      211    0.016    0.000    0.016    0.000 {built-in method marshal.loads}
      324    0.014    0.000    0.014    0.000 {built-in method builtins.compile}
        1    0.007    0.007    0.007    0.007 {built-in method mkl._py_mkl_service.get_version}
  368/364    0.005    0.000    0.010    0.000 {built-in method builtins.__build_class__}
    67/51    0.005    0.000    0.016    0.000 {built-in method _imp.exec_dynamic}
      215    0.003    0.000    0.003    0.000 {method 'read' of '_io.BufferedReader' objects}
      472    0.003    0.000    0.027    0.000 <frozen importlib._bootstrap>:1431(find_spec)
        1    0.003    0.003    0.003    0.003 {method 'reduce' of 'numpy.ufunc' objects}
     9146    0.002    0.000    0.003    0.000 {built-in method builtins.getattr}
However, for most scripts cProfile provides information that is too fine-grained: the output is dominated by thousands of low-level and internal calls, which makes it harder to act on than the line-by-line view of line_profiler.
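The output can be tamed with the standard pstats module, which lets you sort the collected statistics and print only the top entries. A minimal sketch, where work() is a hypothetical stand-in for your own code:

```python
import cProfile
import io
import pstats

def work():
    # stand-in workload; replace with the code you want to profile
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by internal time and keep only the top 5 entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('tottime').print_stats(5)
print(stream.getvalue())
```

This keeps the function-call accounting of cProfile while hiding the long tail of import machinery and built-in calls.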
PyCharm can also be used for profiling; by default it uses cProfile. If you are working with multithreaded code then you should install and use yappi instead.
Misc
Within Jupyter notebooks one may use the %time and %timeit magic commands for quick timing measurements.
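The %timeit magic is a thin wrapper around the standard timeit module, which can be used the same way outside of Jupyter. A minimal sketch (the sum(range(1000)) statement is just a placeholder workload):

```python
import timeit

# Time a snippet from a plain script, much as %timeit does in a notebook.
# number controls how many times the statement runs per measurement.
elapsed = timeit.timeit('sum(range(1000))', number=10_000)
print(f'10,000 runs took {elapsed:.3f} s '
      f'({elapsed / 10_000 * 1e6:.2f} us per run)')

# repeat() returns one total per repetition; the minimum is the
# least noise-contaminated estimate.
best = min(timeit.repeat('sum(range(1000))', number=10_000, repeat=3))
```

Like %timeit, the timeit module disables garbage collection during timing, so results for very short snippets are more reproducible than hand-rolled time.time() loops.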
Arm MAP may be used to profile some Python scripts that call compiled code. See this page for specific instructions.