The most popular scripting language in the computational sciences

This guide presents an overview of installing Python packages and running Python scripts on the HPC clusters. Angle brackets < > denote command line options that you should replace with a value specific to your work. Commands preceded by the $ character are to be run on the command line.

Quick Start

Try the following procedure to install your package(s):

$ module load anaconda3/2024.2
$ conda create --name myenv <package-1> <package-2> ... <package-N> [--channel <name>]
$ conda activate myenv

Here is a specific example:

$ module load anaconda3/2024.2
$ conda create --name ml-env scikit-learn pandas matplotlib --channel conda-forge
$ conda activate ml-env

Each package and its dependencies will be installed locally in ~/.conda. Consider replacing myenv with an environment name that is specific to your work. On the command line, use conda deactivate to leave the active environment and return to the base environment.

Below is a sample Slurm script (job.slurm):

#!/bin/bash
#SBATCH --job-name=py-job        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu
module purge
module load anaconda3/2024.2
conda activate myenv
python myscript.py

If the installation was successful then your job can be submitted to the cluster with:

$ sbatch job.slurm

On Tiger, if for some reason you are trying to install a Python 2 package then use module load anaconda/<version> instead of anaconda3/<version> in the directions above. There are no anaconda modules for Python 2 on the other clusters. Python 2 has been unsupported since January 1, 2020.

See step-by-step directions for uploading files and running a Python script. Watch a PICSciE workshop video about Conda environments and Python.

Introduction

When you first log in to one of the clusters, the system Python is available but this is almost always not what you want. To learn about the system Python, run these commands:

$ python2 --version
Python 2.7.18
$ which python2
/usr/bin/python2
$ python3 --version
Python 3.6.8
$ which python3
/usr/bin/python3

We see that both python2 and python3 are installed in a system directory.

On the Princeton HPC clusters we offer the Anaconda Python distribution as a replacement for the system Python. In addition to Python's vast built-in library, Anaconda provides hundreds of additional packages which are ideal for scientific computing. In fact, many of these packages are optimized for our hardware. To make Anaconda Python available, run the following command:

$ module load anaconda3/2024.2

Let's inspect our newly loaded Python by using the same commands as above:

(base) $ python --version
Python 3.11.7
(base) $ which python
/usr/licensed/anaconda3/2024.2/bin/python
(base) $ python3 --version
Python 3.11.7
(base) $ which python3
/usr/licensed/anaconda3/2024.2/bin/python3

We now have an updated version of Python and related tools. The new python and python3 commands are identical since both are symbolic links to python3.11. The command prompt is preceded by "(base)", which indicates that conda is operating on the base environment. One cannot make any changes to the base environment.
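The same check can be made from inside Python, which is useful in a batch job where the shell prompt is not visible. The following is a minimal sketch; the function name describe_interpreter is made up for this illustration.

```python
import sys


def describe_interpreter():
    """Return the path, version and install prefix of the running interpreter."""
    version = ".".join(str(n) for n in sys.version_info[:3])
    return {"executable": sys.executable, "version": version, "prefix": sys.prefix}


if __name__ == "__main__":
    # Print one line per item, e.g. to the top of a Slurm output file.
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the reported executable lives under /usr/bin then the system Python is in use, not the Anaconda one.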

To see all the pre-installed Anaconda packages and their versions use the conda list command:

(base) $ conda list
# packages in environment at /usr/licensed/anaconda3/2024.2:
#
# Name                    Version                   Build  Channel
_anaconda_depends         2024.02             py311_mkl_1  
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
abseil-cpp                20211102.0           hd4dd3e8_0  
aiobotocore               2.7.0           py311h06a4308_0  
aiohttp                   3.9.3           py311h5eee18b_0  
aioitertools              0.7.1              pyhd3eb1b0_0  
aiosignal                 1.2.0              pyhd3eb1b0_0  
alabaster                 0.7.12             pyhd3eb1b0_0  
altair                    5.0.1           py311h06a4308_0  
anaconda-anon-usage       0.4.3           py311hfc0e8ea_100  
anaconda-catalogs         0.2.0           py311h06a4308_0  
anaconda-client           1.12.3          py311h06a4308_0  
anaconda-cloud-auth       0.1.4           py311h06a4308_0  
anaconda-navigator        2.5.2           py311h06a4308_0  
anaconda-project          0.11.1          py311h06a4308_0  
anyio                     4.2.0           py311h06a4308_0  
aom                       3.6.0                h6a678d5_0  
appdirs                   1.4.4              pyhd3eb1b0_0
…

There are hundreds of packages pre-installed and ready to be used with a simple import statement. If the packages you need are on the list or are found in the Python standard library then you can begin your work. Otherwise, keep reading to learn how to install packages into custom environments (not the base environment).
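To check from within Python whether a particular package is already available, something like the sketch below may help. It uses only the standard library (importlib); the helper name package_status is hypothetical, and "bigfoot" stands in for a package that is not installed.

```python
from importlib import metadata, util


def package_status(import_name, dist_name=None):
    """Report whether a module can be imported and, if known, its version.

    import_name is the name used in an import statement; dist_name is the
    name of the distribution, which can differ (e.g. "sklearn" versus
    "scikit-learn").
    """
    found = util.find_spec(import_name) is not None
    version = None
    if found:
        try:
            version = metadata.version(dist_name or import_name)
        except metadata.PackageNotFoundError:
            version = None  # e.g. standard-library modules have no distribution
    return found, version


if __name__ == "__main__":
    for name in ["json", "numpy", "bigfoot"]:
        print(name, package_status(name))
```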

The Anaconda Python distribution is system software. This means that you can use any of its packages but you cannot make any modifications to them (such as an upgrade) and you cannot install new ones in their location. You can, however, install whatever packages you want in your home directory in custom environments. The two most popular package managers for installing Python packages are conda and pip.

checkquota

Python packages can require many gigabytes of storage. By default they are installed in your /home directory which is typically around 10-50 GB. Be sure to run the checkquota command before installing to make sure that you have space.
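checkquota remains the authoritative tool. As a rough complement, the sketch below estimates how much space individual directories occupy by summing file sizes. The function name directory_size_gb is made up for this example, and files that are hard-linked into several places (which conda does) may be counted more than once, so treat the numbers as an upper bound.

```python
import os


def directory_size_gb(path):
    """Sum the sizes of regular files under path, in gigabytes (10**9 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # skip symbolic links
                try:
                    total += os.path.getsize(full)
                except OSError:
                    pass  # file vanished or is unreadable
    return total / 1e9


if __name__ == "__main__":
    for d in ["~/.conda", "~/.local", "~/.cache"]:
        p = os.path.expanduser(d)
        if os.path.isdir(p):
            print(f"{d}: {directory_size_gb(p):.2f} GB")
```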

Package and Environment Managers

conda

Unlike pip, conda is both a package manager and an environment manager. It is also language-agnostic, which means that in addition to Python packages it can install software written in other languages such as R and Fortran. Conda looks to the main channel of Anaconda Cloud to handle installation requests but there are numerous other channels that can be searched such as bioconda, intel, r and conda-forge. Conda always installs pre-built binary files. The software it provides often has performance advantages over other managers due to leveraging Intel MKL, for instance. Below is a typical session where an environment is created and one or more packages are installed into it:

$ module load anaconda3/2024.2
$ conda create --name myenv <package-1> <package-2> ... <package-N>
$ conda activate myenv

Note that you should specify all the packages that you need in one line so that the dependencies can be satisfied simultaneously. Installing packages into the environment at a later time is possible. To exit a conda environment, run this command: conda deactivate. If you try to install into the base environment using conda install <package> it will fail with: EnvironmentNotWritableError: The current user does not have write permissions to the target environment. The solution is to create an environment and do the install in the same command (as shown above).

Common conda commands

View the help menu:

$ conda -h

To view the help menu for the install command:

$ conda install --help

Search the "conda-forge" channel for the fenics package:

$ conda search fenics --channel conda-forge

Create the "myenv" environment and install pairtools into the environment:

$ conda create --name myenv pairtools

Create an environment called myenv and install Python version 3.7 and beaver:

$ conda create --name myenv python=3.7 beaver

Create an environment called biowork-env and install blast from the bioconda channel:

$ conda create --name biowork-env blast --channel bioconda

Install bazel on Traverse:

$ conda create --name baz-env bazel --channel powerai

List the installed packages for the present environment (consider adding --explicit):

(myenv) $ conda list

Install the "pandas" package into an environment that was previously created:

$ conda activate biowork-env
(biowork-env)$ conda install pandas

List the available environments:

$ conda env list

Remove the "bigdata-env" environment:

$ conda remove --name bigdata-env --all

Much more can be done with conda as a package manager or environment manager.

To see examples of installation scripts for various commonly used packages (e.g., TensorFlow, mpi4py, PyTorch, JAX), see the Common Package Installation Examples section below.

pip

pip stands for "pip installs packages". It is a package manager for Python packages only. pip installs packages that are hosted on the Python Package Index or PyPI.

You will typically want to use pip within a Conda environment after installing packages via conda to get packages that are not available on Anaconda Cloud. For example:

$ module load anaconda3/2024.2
$ conda create --name sklearn-env scikit-learn pandas matplotlib
$ conda activate sklearn-env
(sklearn-env)$ pip install multiregex

You should avoid installing conda packages after doing pip installs within a Conda environment.

Do not use the pip3 command even if the directions you are following tell you to do so (use pip instead). pip will search for a pre-compiled version of the package you want called a wheel. If it fails to find one for your platform then it will attempt to build the package from source. It can take pip several minutes to build a large package from source. One often needs to load various environment modules in addition to anaconda3 before doing a pip install. For instance, if your package uses GPUs then you will probably need to do module load cudatoolkit/<version> or if it uses the message-passing interface (MPI) for parallelization then module load openmpi/<version>. To see all available software modules, run module avail.

Common pip commands

View the help menu:

$ pip -h

The help menu for the install command:

$ pip install --help

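Search the Python Package Index (PyPI) for a given package (e.g., jax). Note that PyPI has disabled the server-side interface that this command relies on, so it is likely to fail with an error; searching on the PyPI website is the dependable alternative: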

$ pip search jax

List all installed packages:

$ pip list

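Install pairtools and pyblast. pip cannot install or change Python itself, so the Python version is set by the active environment (e.g., one created with conda create --name myenv python=3.8):

$ pip install pairtools pyblast

Install a set of packages listed in a text file: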

$ pip install -r requirements.txt

To see detailed information about an installed package such as sphinx:

$ pip show sphinx

Upgrade the sphinx package:

$ pip install --upgrade sphinx

Uninstall the pairtools package:

$ pip uninstall pairtools

See the pip documentation for more.

To see examples of installation scripts for various commonly used packages (e.g., TensorFlow, mpi4py, PyTorch), see the Common Package Installation Examples section below.

Isolated Python Environments with virtualenv

Often you will want to create isolated Python environments. This is useful, for instance, when you have two packages that require different versions of a third package. The use of environments saves one the trouble of repeatedly upgrading or downgrading the third package in this case. While we recommend using conda as explained above, one can also use virtualenv to create isolated Python environments. To get started with virtualenv it must first be installed:

$ module load anaconda3/2024.2
$ pip install --user virtualenv

Defaulting to user installation because normal site-packages is not writeable

Note that like pip, virtualenv is an executable, not a library. To create an isolated environment do:

$ mkdir myenv
$ virtualenv myenv
$ source myenv/bin/activate

Consider replacing myenv with a more suitable name for your work. Now you can install Python packages in isolation from other Python environments:

$ pip install slingshot bell
$ deactivate

Note the --user option is omitted since the packages will be installed locally in the virtual environment. At the command line, to leave the environment run deactivate.

Make sure you source the environment in your Slurm script as in this example:

#!/bin/bash
#SBATCH --job-name=py-job        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email when job begins, ends and fails
#SBATCH --mail-user=<YourNetID>@princeton.edu
module purge
module load anaconda3/2024.2
source </path/to>/myenv/bin/activate
python myscript.py

As an alternative to virtualenv, you may consider using the built-in Python 3 module venv. pip in combination with virtualenv serves as a useful package and environment manager but, in general, we recommend conda. There are also combined managers such as pipenv and pyenv that you may consider.
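For the built-in venv module, an environment is usually created on the command line with python -m venv myenv. The same thing can be done from Python, as in the sketch below. The helper name make_venv is hypothetical; venv.EnvBuilder is part of the standard library.

```python
import venv
from pathlib import Path


def make_venv(path, with_pip=True):
    """Create a virtual environment at path using the standard-library venv module."""
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(path)
    return Path(path) / "pyvenv.cfg"  # marker file written into every venv


if __name__ == "__main__":
    cfg = make_venv("myenv")
    print("created", cfg.parent.resolve())
```

The resulting environment is activated in the same way as one made by virtualenv, i.e., with source myenv/bin/activate.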

pip vs. conda

If your package exists on PyPI and Anaconda Cloud then how do you decide which to install from? You should almost always favor conda over pip. This is because conda packages are pre-compiled and their dependencies are automatically handled. While pip installs will often download a binary wheel (pre-compiled), the user frequently needs to take action to satisfy the dependencies. Furthermore, many scientific conda packages are linked against the Intel Math Kernel Library which leads to improved performance over pip installs on our systems. One disadvantage of conda packages is that they tend to lag behind pip packages in terms of versioning. In many cases, the decision of conda versus pip will be answered by reading the installation instructions for the software you would like to use. Write to [email protected] for a recommendation on the installation procedure or if you encounter problems while trying to run your Python script.

Install Packages Using pip into Conda Environments

Some Python software is only available on the Python Package Index (PyPI). When installing such software using pip there is a correct (or strongly recommended) and incorrect way of doing it. We first illustrate the incorrect way:

$ module load anaconda3/2024.2
(base) $ pip install cosmopower  # INCORRECT
Defaulting to user installation because normal site-packages is not writeable

The problem with the approach above is that the Python software will be installed in /home/<YourNetID>/.local/lib/python3.XX where it can create conflicts with other software including Jupyter OnDemand.

The correct way to install such software is to create a conda environment containing "python", activate the environment, and then use pip to install packages into the environment:

$ module load anaconda3/2024.2
(base) $ conda create --name cosmo-env python -y
(base) $ conda activate cosmo-env
(cosmo-env) $ pip install cosmopower  # CORRECT

This approach installs "cosmopower" and its dependencies into an isolated software environment called "cosmo-env". If you need a specific version of "python" then explicitly specify the version such as "python=3.9" in the "conda create" line.

Take the time now to see if you have any incorrect pip installations by running these commands:

$ cd ~/.local/lib
$ ls -l
drwxr-xr-x. 3 aturing math 35 Mar 14 09:40 python3.10

If the output of the commands above includes directories such as python3.XX where XX is 7, 8, 9, 10 and so on then you should disable these directories by running a command such as the following for each directory:

$ mv python3.10 python3.10-disabled

If you are sure that you do not need the software in those directories (e.g., python3.10/site-packages/) then you can remove them instead of renaming them with a command such as:

$ rm -rf python3.10

After disabling or removing a python3.XX directory, it may be necessary to install that software into the appropriate conda environment if your workflow was in fact using it.
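To see whether the per-user directory under ~/.local is being searched by the Python you are currently running, a check along these lines can be used. It is a sketch built on the standard-library site module; the function name user_site_report is made up for this example.

```python
import site
import sys


def user_site_report():
    """Return the per-user site-packages path and whether Python is using it."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        "enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_report().items():
        print(f"{key}: {value}")
```

If on_sys_path is True while a conda environment is active, packages from ~/.local can shadow or conflict with those in the environment.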

mamba

Mamba is a drop-in replacement for Conda and is almost always faster at dependency resolution. As of anaconda3/2024.2, mamba is now your default solver! If you need to use mamba for an older version of anaconda, read on.

Using the anaconda3/2023.9 module, you can set mamba to be the default solver by running the following command:

$ conda config --set solver libmamba

(Note that this will change the solver for all of your environments)

For anaconda3/2023.9 and later, the commands below illustrate how to install mamba and then use it to create a Conda environment from an environment file:

$ module load anaconda3/2023.9
$ conda create --name mamba-env mamba -c conda-forge
$ conda activate mamba-env
(mamba-env) $ mamba env create -f environment.yml

This should not be done for earlier versions of the anaconda3 module. Instead of installing from an environment file, you could also simply list the packages:

(mamba-env) $ mamba install pandas scikit-learn matplotlib

Python 2

If you need to make a Python 2 environment then simply specify the Python version when making the environment:

$ module load anaconda3/2024.2
$ conda create --name py27 python=2.7 ipykernel numpy matplotlib
$ conda activate py27

Python 2 has been unsupported since January 1, 2020.

Installing Python Packages from Source

In some cases you will be provided with the source code for your package. To install from source do:

$ python setup.py install --prefix=</path/to/install/location>

For the help menu, use python setup.py --help-commands. Be sure to update the appropriate environment variables in your ~/.bashrc file:

export PATH=</path/to/install/location>/bin:$PATH
export PYTHONPATH=</path/to/install/location>/lib/python<version>/site-packages:$PYTHONPATH

ModuleNotFoundError

Python users can encounter the situation where a module is not found. For example:

$ python
>>> import bigfoot
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'bigfoot'

To see where Python is looking for modules, run the following commands:

$ python
>>> import sys
>>> print(sys.path)
['',
 '/home/aturing/.conda/envs/myenv/lib/python39.zip',
 '/home/aturing/.conda/envs/myenv/lib/python3.9',
 '/home/aturing/.conda/envs/myenv/lib/python3.9/lib-dynload',
 '/home/aturing/.conda/envs/myenv/lib/python3.9/site-packages']

The search paths are stored in sys.path. If your module is not found in those paths then you will encounter a ModuleNotFoundError. The first entry in the list above (i.e., the empty string '') corresponds to the current directory. The most relevant path usually ends with "site-packages".

Note that sys.path is a Python list that can be modified to include an additional path:

>>> sys.path.append("/home/aturing/mymodules")

One can also remove paths from sys.path if conflicting modules are being found in different directories.

The PYTHONPATH environment variable can also be used to modify the Python search paths.
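The effect of adding a directory to sys.path can be illustrated with the sketch below. It writes a tiny module into a temporary directory and then imports it. The names import_from_directory and mymodule_demo are invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(directory, module_name):
    """Append directory to sys.path (if absent) and import module_name from it."""
    directory = str(directory)
    if directory not in sys.path:
        sys.path.append(directory)
    importlib.invalidate_caches()  # make sure newly created files are seen
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Write a tiny module, then import it by way of sys.path.
        Path(tmp, "mymodule_demo.py").write_text("ANSWER = 42\n")
        mod = import_from_directory(tmp, "mymodule_demo")
        print(mod.ANSWER)
```

Because the directory is appended to the end of sys.path, modules with the same name found in earlier entries take precedence.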

Packaging and Distributing Your Own Python Package

Both PyPI and Anaconda allow registered users to store their packages on their platforms. You must follow the instructions for doing so but once done someone can do a pip install or a conda install of your package. This makes it very easy to enable someone else to use your research software. See this guide for practical examples of the process.

Jupyter Notebooks on the HPC Clusters

Please see our page for Jupyter on the HPC Clusters.


Multiprocessing

The multiprocessing module enables single-node parallelism for Python scripts by distributing work across multiple worker processes. The script below uses multiprocessing to execute an embarrassingly parallel mapping of a short list:

import os
from multiprocessing import Pool
def f(x):
    return x * x
if __name__ == '__main__':
    # use the core count allocated by Slurm (falls back to 1 outside of Slurm)
    num_cores = int(os.getenv('SLURM_CPUS_PER_TASK', '1'))
    with Pool(num_cores) as p:
        print(p.map(f, [1, 2, 3, 4, 5, 6, 7, 8]))

The approach above can also be used to parallelize a for loop. Below is an appropriate Slurm script for this code:

#!/bin/bash
#SBATCH --job-name=multipro      # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=4        # number of processes
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu
module purge
module load anaconda3/2024.2
srun python myscript.py

The output of the Python script is:

[1, 4, 9, 16, 25, 36, 49, 64]

The Python script extracts the number of cores from the Slurm environment variable. This eliminates the potential problems that could arise if the two values were set independently in the Slurm script and Python script.

Often the best way to carry out a large number of independent Python jobs is to use a job array rather than the multiprocessing module.
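With a job array, each task typically reads the SLURM_ARRAY_TASK_ID environment variable to decide which piece of work to do. The sketch below shows one way this might look. The function name select_input and the list of temperatures are hypothetical, and the script assumes the array was submitted with a directive such as #SBATCH --array=0-3.

```python
import os


def select_input(inputs, task_id=None):
    """Pick the input for this array task based on SLURM_ARRAY_TASK_ID."""
    if task_id is None:
        # Fall back to 0 when run outside of a job array (e.g. for testing).
        task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return inputs[task_id]


if __name__ == "__main__":
    # Hypothetical parameter list with one entry per array task.
    temperatures = [250, 300, 350, 400]
    t = select_input(temperatures)
    print(f"running with temperature {t}")
```

Each array task runs as its own single-core job, so no changes to the Python code are needed when the number of tasks grows.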

Debugging Python

Learn more about debugging Python code on the Princeton HPC clusters.

This video explains how to run the PyCharm debugger on a TigerGPU node. The same procedure can be used for the other clusters. PyCharm for Linux is available on jetbrains.com. While the video uses the Community Edition, you can get the professional edition for free by supplying your "dot edu" email address.

While debugging you may benefit from using unbuffered output of print statements. This can be achieved by modifying the Slurm script as follows:

python -u myscript.py

If the above proves to be insufficient then try the following:

print("made it here", flush=True)

If the above is still insufficient then try writing a line to file:

with open("debug.log", "a") as fp:  # append so that earlier lines are kept
    fp.write("made it here\n")

After debugging is complete you should return to buffered output to avoid potential performance costs associated with unbuffered output.
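When many such checkpoints are needed, the standard-library logging module can take the place of hand-written file writes. The following is a minimal sketch; the helper name make_debug_logger and the logger name debug_sketch are invented for this example. A FileHandler flushes after every record, so lines reach the file promptly.

```python
import logging


def make_debug_logger(path="debug.log"):
    """Return a logger that appends timestamped lines to path."""
    logger = logging.getLogger("debug_sketch")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.FileHandler(path, mode="a")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_debug_logger()
    log.debug("made it here")
```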

Profiling Python

The most highly recommended tool for profiling Python is line_profiler which makes it easy to see how much time is spent on each line within a function as well as the number of calls.

The built-in cProfile module provides a simple way to profile your code:

python -m cProfile -s tottime myscript.py

However, most users find that the cProfile module provides information that is too fine-grained.
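One way to make cProfile output more manageable is to call it from within Python and print only the most expensive entries using the pstats module. The sketch below illustrates the idea; profile_top and slow_sum are made-up names, and slow_sum is a toy workload.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=5, **kwargs):
    """Run func under cProfile and return (result, report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(limit)  # keep only the top rows
    return result, stream.getvalue()


def slow_sum(n):
    """Toy workload used only for this illustration."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_top(slow_sum, 100_000)
    print(value)
    print(report)
```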

PyCharm can be used for profiling. By default it uses cProfile. If you are working with multithreaded code then you should install and use yappi.

Within Jupyter notebooks one may use %time and %timeit for doing measurements.

Arm MAP may be used to profile some Python scripts that call compiled code. See our MAP guide for specific instructions.

Building Python from Source

The procedure below shows how to build Python from source:

$ cd $HOME/software  # or another location
$ wget https://www.python.org/ftp/python/3.8.5/Python-3.8.5.tgz
$ tar zxf Python-3.8.5.tgz
$ cd Python-3.8.5
$ module load rh/devtoolset/8  # needed on tiger cluster only
$ ./configure --help
$ ./configure --enable-optimizations --prefix=$HOME/software/python385
$ make -j 10
$ make test  # some tests fail
$ make install
$ cd $HOME/software/python385/bin
$ ./python3

Common Package Installation Examples

FEniCS

FEniCS is an open-source computing platform for solving partial differential equations. To install:

$ module load anaconda3/2024.2
$ conda create --name fenics-env -c conda-forge fenics
$ conda activate fenics-env

Make sure you include conda activate fenics-env in your Slurm script. For better performance one may consider installing from source.

CuPy on Traverse

CuPy is available via Anaconda Cloud on all our clusters. For Traverse use the IBM WML channel:

$ module load anaconda3/2024.2
$ CHNL="https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda"
$ conda create --name cupy-env --channel ${CHNL} cupy

Be sure to include module load anaconda3/2024.2 in your Slurm script.

JAX

JAX is Autograd and XLA, brought together for high-performance machine learning research. See the Intro to ML Libraries repo for build directions.

PyStan

Here are the directions for installing PyStan:

$ module load anaconda3/2024.2
$ conda create --name stan-env pystan
$ conda activate stan-env

To compile models, your Slurm script will need to include the rh module, which provides a newer compiler suite:

#!/bin/bash
#SBATCH --job-name=myjob         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email when job begins, ends and fails
#SBATCH --mail-user=<YourNetID>@princeton.edu
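module purge
module load anaconda3/2024.2
module load rh/devtoolset/8      # newer compiler suite (version as used elsewhere in this guide; adjust for your cluster)
conda activate stan-env
python myscript.py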

Try varying the value of cpus-per-task to see if you get a speed-up. Note that the more resources you request, the longer the queue time.

Deeplabcut

See the deeplabcut website. Consider using the Docker image via Singularity:

$ singularity pull docker://deeplabcut/deeplabcut:2.2.1.1-jupyter-cuda11.0.3-runtime-ubuntu18.04

The Slurm script could appear as follows:

export SINGULARITYENV_LD_LIBRARY_PATH=/my-cusolver:/my-cudnn:${LD_LIBRARY_PATH}
export SINGULARITYENV_APPEND_PATH=/my-ptxas
singularity exec --nv -B /usr/local/cuda-11.7/lib64:/my-cusolver \
-B /usr/local/cudnn/cuda-11.3/8.2.0/lib64:/my-cudnn \
-B /usr/local/cuda-11.7/bin:/my-ptxas \
</path/to>/deeplabcut_2.2.1.1-jupyter-cuda11.0.3-runtime-ubuntu18.04.sif python3 myscript.py

The instructions below no longer work as of January 2023. They are kept here for reference. Please use the container directions above.

$ ssh -X <YourNetID>@della-gpu.princeton.edu
$ module load anaconda3/2024.2
$ conda create --name dlc-env python=3.8 wxPython=4.0.7 jupyter nb_conda ffmpeg cudnn=8 \
cudatoolkit=11 -c conda-forge -y
$ conda activate dlc-env
$ pip install deeplabcut[gui]

Make sure you understand "ssh -X" and the other options for using a GUI. If you fail to set up a software environment that can handle graphics then you will encounter: "ImportError: Cannot load backend 'WXAgg' which requires the 'wx' interactive framework, as 'headless' is currently running". Be sure to use salloc for interactive work since running on the login node is not allowed.

Lenstools

$ module load anaconda3/2024.2
$ conda create --name lenstools-env numpy scipy pandas matplotlib astropy
$ conda activate lenstools-env
$ module load rh/devtoolset/8 openmpi/gcc/3.1.5/64 gsl/2.4 
$ export MPICC=$(which mpicc)
$ pip install mpi4py
$ pip install emcee==2.2.1
$ pip install lenstools

Note that you will receive warnings when lenstools is imported in Python.

SMC++

SMC++ infers population history from whole-genome sequence data. There is a Docker image that can be used with Singularity:

$ singularity pull docker://terhorst/smcpp:latest
$ singularity exec smcpp_latest.sif smc++ --help

One can also perform the installation with Conda:

$ module load anaconda3/2024.2
$ conda create --name smcpp-env cython numpy
$ conda activate smcpp-env
$ pip install git+https://github.com/popgenmethods/smcpp
$ smc++ --help

Dedalus

Dedalus can be used to solve differential equations using spectral methods.

$ module load anaconda3/2024.2
$ conda create --name dedalus-env python=3.6
$ conda activate dedalus-env
$ conda config --add channels conda-forge
$ conda install nomkl cython docopt matplotlib pathlib scipy
$ module load openmpi/gcc/1.10.2/64 fftw/gcc/openmpi-1.10.2/3.3.4 hdf5/gcc/openmpi-1.10.2/1.10.0
$ export FFTW_PATH=$FFTW3DIR
$ export HDF5_DIR=$HDF5DIR
$ export MPI_PATH=/usr/local/openmpi/1.10.2/gcc/x86_64
$ export MPICC=$(which mpicc)
$ pip install mpi4py
$ CC=mpicc pip install --upgrade --no-binary :all: h5py
$ hg clone https://bitbucket.org/dedalus-project/dedalus
$ cd dedalus
$ pip install -r requirements.txt
$ python setup.py build
$ python setup.py install

TensorFlow

See our guide for TensorFlow on the HPC clusters.

PyTorch

See our guide for PyTorch on the HPC clusters.

mpi4py

MPI for Python (mpi4py) provides bindings of the Message Passing Interface (MPI) standard for the Python programming language. It can be used to parallelize Python scripts. See our guide for installing mpi4py on the HPC clusters.

Workshops and Learn Resources

There are various Princeton Research Computing workshops such as Introduction to Programming Using Python and Python for Poets. For more advanced material see Level Up Your Python, High Performance Python, High Performance Python for GPUs, and Mixing Compiled Code and Python by Henry Schreiner.

See our recommended books, videos and websites for learning Python.

FAQ

1. Why does pip install <package> fail with an error mentioning a Read-only file system?

After loading the anaconda3 module, pip will be available as part of Anaconda Python which is a system package. By default pip will try to install the files in the same locations as the Anaconda packages. Because you don't have write access to this directory the install will fail. The recommended solution is to create a conda environment and run pip install inside it, as discussed above in the section on installing packages using pip into Conda environments.

2. What should I do if I try to install a Python package and the install fails with: error: Disk quota exceeded?

You have three options. First, consider removing files within your home directory to make space available. Second, run the checkquota command and follow the link at the bottom to request more space. Lastly, for pip installations see the question toward the bottom of this FAQ about installing into a custom location, i.e., setting --target to a directory under /scratch/gpfs/<YourNetID>. For conda installs, learn about the --prefix option.

3. Why do I get the following error message when I try to run pip on Della: -bash: pip: command not found?

You need to do module load anaconda3 before using pip or any of the Anaconda packages. You also need to load this module before using Python itself.

4. I read that it is a good idea to update conda before installing a package. Why do I get an error message when I try to perform the update?

conda is a system executable. You do not have permission to update it. If you try to update it you will get this error: EnvironmentNotWritableError: The current user does not have write permissions to the target environment. The current version is sufficient to install any package.

5. When I run conda list on the base environment I see the package that I need but it is not the right version. How can I get the right version?

One solution is to create a conda environment and install the version you need there. The version of NumPy on Tiger is 1.16.2. If you need version 1.16.5 for your work then do: conda create --name myenv numpy=1.16.5.

6. Is it okay if I combine virtualenv and conda?

This is highly discouraged. While in principle it can work, most users find it just causes problems. Try to stay within one environment manager. Note that if you create a conda environment you can use pip to install packages.

7. Can I combine conda and pip?

Yes, and this tends to work well. A typical session may look like this:

$ module load anaconda3/2024.2
$ conda create --name myenv python=3.8
$ conda activate myenv
$ pip install scitools

Note that --user is omitted when using pip within a conda environment. See the bullet points at the bottom of this page for tips on using this approach.

8. How do I install a Python package in a custom location using pip or conda?

For pip, first do pip install --target=</path/to/install/location> <package> then update the PYTHONPATH environment variable in your ~/.bashrc file with export PYTHONPATH=$PYTHONPATH:/path/to/install/location. For conda, you use the --prefix option. For instance, to install cupy on /scratch/gpfs/<YourNetID>:

$ module load anaconda3/2024.2
$ conda create --prefix /scratch/gpfs/$USER/py-gpu cupy

Be sure to have these two lines in your Slurm script: module load anaconda3/2024.2 and conda activate /scratch/gpfs/$USER/py-gpu. Note that 1) if you are on Adroit, use /scratch/network in place of /scratch/gpfs, and 2) remember /scratch/gpfs and /scratch/network are not backed up. 

9. I tried to install some packages but now none of my Python tools are working. Is it possible to delete all my Python packages and start over?

Yes. Packages installed by pip are in ~/.local/lib while conda packages and environments are in ~/.conda. If you made any environments with virtualenv you should remove those as well. Removing these directories will give you a clean start. Be sure to examine the contents first. It may be wise to selectively remove sub-directories instead. You may also need to remove the ~/.cache directory and you may need to make modifications to your .bashrc file if you added or changed environment variables.

10. How are my pip packages built? Which optimization flags are used? Do I have to be careful with vectorization on Della where several different CPUs are in play?

After loading the anaconda3 module, run this command: python3-config --cflags. To force a package to be built from source with certain optimization flags do, for example: CFLAGS="-O1" pip install numpy -vvv --no-binary=numpy
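The flags recorded when the interpreter itself was built can also be read from within Python using the standard-library sysconfig module. The sketch below is one way to do so; the function name build_flags is made up, and on some platforms individual values may be None.

```python
import sysconfig


def build_flags():
    """Return the compiler and flags recorded when this Python was built."""
    return {
        "CC": sysconfig.get_config_var("CC"),
        "CFLAGS": sysconfig.get_config_var("CFLAGS"),
        "OPT": sysconfig.get_config_var("OPT"),
    }


if __name__ == "__main__":
    for key, value in build_flags().items():
        print(f"{key}: {value}")
```

These are the defaults that a source build started by pip tends to inherit unless they are overridden, for example through the CFLAGS environment variable as shown above.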

11. What is the Intel Python distribution and how do I get started with it?

Intel provides their own implementation of Python as well as numerous packages optimized for Intel hardware. You may find significant performance benefits from these packages. To create a conda environment with Intel Python and a number of Intel-optimized numerics packages:

$ module load anaconda3/2024.2
$ conda create --name intelpy-env python scikit-learn scipy --channel intel

12. The installation directions that I am following say to use pip3. Is this okay?

Do not use pip3 for installing Python packages. pip3 is a component of the system Python and it will not work properly with Anaconda. Always do module load anaconda3/<version> and then use pip for installing packages.

13. How do I resolve this error?

conda: error: argument COMMAND: invalid choice: 'activate'

Try creating a new shell. This will reset your software environment. If you were working on a compute node then you will need to create a new interactive session in the new shell.

Getting Help

If you encounter any difficulties while using Python on one of our HPC clusters then please send an email to [email protected] or attend a help session.