Software

A large number of scientific applications and software tools are installed on the HPC clusters. Users can install custom software into their /home directories. Software containers can be used via Singularity.

Environment Modules

Some software is made available via environment modules. Read through the environment modules page and then return here. After reading about environment modules you should be able to:

  • use "module avail" to list available modules
  • load and unload modules
  • use "module show" to see how loading a module changes your environment
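As a rough illustration of those three skills, a typical session might look like the sketch below. The module name anaconda3/2020.11 is just the version used elsewhere on this page and may not exist on every cluster; the guard around the commands is only there so the snippet does nothing harmful on a machine without environment modules.

```shell
# Sketch of a typical module session (module names are examples only).
if command -v module >/dev/null 2>&1; then
    module avail anaconda3            # list modules whose name matches
    module load anaconda3/2020.11     # add the module to your environment
    module list                       # show what is currently loaded
    module show anaconda3/2020.11     # display the changes the module makes
    module unload anaconda3/2020.11   # remove that one module again
    module purge                      # unload everything
    status="module session finished"
else
    status="module command not found (not on a cluster login node?)"
fi
echo "$status"
```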

A Word on Python Modules

To get an Anaconda Python implementation (the recommended way to use Python), simply load one of the available anaconda3 modules. For example:

$ module load anaconda3/2020.11
$ python --version
Python 3.8.5

$ which python
/usr/licensed/anaconda3/2020.11/bin/python

To see all the packages that are included in Anaconda Python run this command:

$ conda list

For more on Anaconda Python, conda and conda environments, see Python on the HPC Clusters.

A Word on R Modules

The available R modules can be seen by running:

$ module avail R

To start the latest version of R, simply type:

$ R

Note that RStudio is available through the MyAdroit and MyDella web portals. For a comprehensive guide on R and RStudio see R on the HPC Clusters.

Final Tips on Modules

Remember that no modules are loaded when you connect to a cluster. Do not put module commands in your .bashrc file; best practice is to load the modules you need explicitly each time. By all means set up an alias for your interactive use, but note that .bashrc is not implicitly loaded for a Slurm job. Loading modules from .bashrc tends to leave you with a tangle of modules and no clear idea why your job is failing to behave as expected.
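One way to follow this advice is to load modules inside the job script itself, so the job's environment does not depend on .bashrc. The sketch below writes a minimal Slurm script to a file. The file name job.slurm, the script myscript.py, and the resource requests are placeholders, and anaconda3/2020.11 is simply the module version used earlier on this page.

```shell
# Write a minimal Slurm job script (job.slurm is a placeholder name).
cat > job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=py-test       # short name for the job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --time=00:05:00          # run time limit (HH:MM:SS)

module purge                     # start from a clean environment
module load anaconda3/2020.11    # load exactly what the job needs

python myscript.py               # myscript.py is a placeholder
EOF

# Submit from a login node with:
#   sbatch job.slurm
```

Because the module commands live in the script, anyone reading it can see exactly which software versions the job used.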

If you need software that is not installed or made available through modules, you will most likely have to install the software yourself. The following section provides the needed guidelines.

 

Installing Software Not Available on the Clusters

In general, to install software not available as a module, we recommend that you create a directory such as /home/<YourNetID>/software to build and store software. (As a reminder, your home directory is backed up.)

One exception to this general recommendation is when installing Python and R packages. Python and R packages are installed by default in your home directory, and therefore don't require that you set up a special folder for them. See more about installing Python or R packages below.

Two notes:

  • Be sure to run the checkquota command regularly to make sure you have enough space. Errors during package installation are often caused by running out of quota.
  • Commands like sudo yum install or sudo apt-get install will not work.
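For software other than Python and R packages, the recommendation above might be followed with something like the sketch below. It assumes the package ships as a tarball with an autotools-style ./configure script, which is common but by no means universal; the function name install_from_source and the package name mytool are hypothetical.

```shell
# Build a package from a source tarball and install it under
# $HOME/software/<name>. Assumes a ./configure-based build.
install_from_source() {
    tarball=$1                              # e.g. mytool-1.0.tar.gz (hypothetical)
    name=$2                                 # e.g. mytool
    prefix="$HOME/software/$name"           # install location
    build_dir="$HOME/software/build/$name"  # where the source is unpacked

    mkdir -p "$build_dir" || return 1
    # Unpack, dropping the top-level directory inside the tarball.
    tar -xf "$tarball" -C "$build_dir" --strip-components=1 || return 1
    (
        cd "$build_dir" || exit 1
        ./configure --prefix="$prefix" && make && make install
    ) || return 1

    echo "Installed to $prefix; add $prefix/bin to your PATH to use it."
}

# Usage (not run here):
#   install_from_source mytool-1.0.tar.gz mytool
#   export PATH="$HOME/software/mytool/bin:$PATH"
```

Installing with --prefix keeps everything inside your home directory, so no sudo is required.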

Installing Python Packages on the HPC Clusters

See this guide to installing Python packages with conda or pip on Princeton Research Computing's Python resource page.

Installing R Packages on the HPC Clusters

See this guide to installing R packages on Princeton Research Computing's R resource page.

Certain R packages require a newer compiler than the default, so you may need to load a newer version of GCC before installing them. This is covered in the Compiling Software, GNU Compiler Collection (GCC) section below and described in more detail in the linked guide to installing R packages.

Using Software Containers

Software containers are useful when you need software with complicated dependencies. You can pull and run an image (essentially a large file) that contains the software and everything it needs.

Docker is not allowed on the clusters, but Singularity can be used instead. You can still search for and use images built for Docker; you just need to run them with Singularity commands. For example:

$ singularity pull docker://hello-world
$ singularity run hello-world_latest.sif
...
Hello from Docker!
This message shows that your installation appears to be working correctly.
...

For more information see Containers on the HPC Clusters.

Compiling Software, GNU Compiler Collection (GCC)

Software that comes in source form must be compiled before it can be installed in your /home directory.

One popular tool suite for doing this is the GNU Compiler Collection (GCC), which is composed of compilers, a linker, libraries, and tools.

To provide a stable environment for building software on our HPC clusters, the default version of GCC is kept the same for years at a time. To see the current version of the GNU C++ compiler, namely g++, run the following command on one of the HPC clusters (e.g., Della):

$ g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

While most R packages will compile with the older, long-term default version of GCC, some require a newer version. Della and Tiger both use this older default, and a newer version is made available by loading one of the latest Red Hat Developer Toolset (rh/devtoolset) modules:

$ module load rh/devtoolset/8
$ g++ --version
g++ (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)

Common errors when the rh module is not loaded include:

  • g++: error: unrecognized command line option '-std=c++17'
  • gcc: error: unrecognized command line option '-std=c++14'
  • 'for' loop initial declarations are only allowed in C99 mode.

Note that Adroit has a newer version of GCC, and the rh/devtoolset module is therefore not needed on this cluster.
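A build script can check the compiler version before deciding whether to load the newer toolset. The sketch below is one possible way to do that; the helper gcc_major and the threshold of 5 are hypothetical placeholders, since the version a given package needs will vary.

```shell
# Extract the major version from a GCC version string such as "4.8.5" or "8".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Placeholder threshold: treat anything older than GCC 5 as too old.
min_major=5

if command -v g++ >/dev/null 2>&1; then
    current=$(gcc_major "$(g++ -dumpversion)")
    if [ "$current" -lt "$min_major" ] && command -v module >/dev/null 2>&1; then
        # Same module as in the example above.
        module load rh/devtoolset/8
    fi
    g++ --version | head -n 1    # confirm which compiler is now active
fi
```

On a cluster whose default GCC is already new enough, the module load step is simply skipped.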

When compiling a parallel code that uses the message-passing interface (MPI), you will need to load an MPI module. You can load the Intel compilers and Intel MPI library with:

$ module load intel/19.1.1.217 intel-mpi/intel/2019.7 
Loading intel/19.1.1.217
  Loading requirement: intel-mkl/2020.1

Loading intel-mpi/intel/2019.7
  Loading requirement: ucx/1.9.0
$ mpicc --version
icc (ICC) 19.1.1.217 20200306
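With an MPI module loaded, a small test program is a quick way to confirm that the compiler wrapper works. The sketch below writes a minimal MPI program to hello_mpi.c (a hypothetical file name) and compiles it only if mpicc is on the PATH.

```shell
# Write a minimal MPI test program (hello_mpi.c is a placeholder name).
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile only when an MPI compiler wrapper is available.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -o hello_mpi hello_mpi.c
fi

# Inside a Slurm job, the program would typically be launched with:
#   srun ./hello_mpi
```

The same modules used to compile the program should also be loaded in the job script that runs it.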

Note that loading an rh/devtoolset module also updates the C and Fortran compilers and related tools, which is important for some software. The relevant tools are gcc, g++, gfortran, make, ld, ar, as, gdb, gprof, gcov and more.

Vectorization

Modern CPUs can perform more than one operation per cycle using vector execution units. A common example is elementwise vector addition.

Vectorized code generated for one processor will run on another processor only if that processor supports the same vector instructions. Otherwise the program will fail with an illegal instruction error.
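On Linux, the instruction sets a node supports are listed on the flags line of /proc/cpuinfo, so a quick check is possible before running a binary. The sketch below is one way to do it; the helper has_flag is hypothetical, and avx512f is the flag name for the AVX-512 foundation instructions.

```shell
# Succeed if the space-separated flag list in $1 contains the flag $2.
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if [ -r /proc/cpuinfo ]; then
    # Take the flags of the first CPU listed.
    flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    for isa in avx2 avx512f; do
        if has_flag "$flags" "$isa"; then
            echo "$isa: supported"
        else
            echo "$isa: not supported"
        fi
    done
fi
```

Run this on a compute node rather than the head node, since the two can have different processors.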

Della

Della is composed of three different Intel Xeon microarchitectures:

  • Broadwell (AVX2)
  • Skylake (AVX-512)
  • Cascade Lake (AVX-512)

The head node della5 is Broadwell. Code compiled on the head node will not take advantage of the AVX-512 instructions available on the Skylake and Cascade Lake nodes unless you add the appropriate flags (e.g., -xCORE-AVX2 -axCORE-AVX512 for the Intel compilers).
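As a sketch of how those flags might be used, the snippet below compiles a hypothetical source file myprog.c with the Intel C compiler, assuming an Intel compiler module such as intel/19.1.1.217 is already loaded. With the Intel compilers, -x sets the baseline instruction set for the whole program, while -ax adds an alternate code path that is selected at run time on processors that support it.

```shell
# Flags quoted in the text above: AVX2 baseline plus an AVX-512 code path.
VEC_FLAGS="-xCORE-AVX2 -axCORE-AVX512"

if command -v icc >/dev/null 2>&1 && [ -f myprog.c ]; then
    # Word splitting of $VEC_FLAGS is intentional here.
    icc $VEC_FLAGS -o myprog myprog.c
else
    # myprog.c is a placeholder, so just show the command otherwise.
    echo "Would run: icc $VEC_FLAGS -o myprog myprog.c"
fi
```

A binary built this way should run on all three node types, using AVX-512 only where it is available.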

TigerCPU vs. TigerGPU

The processors on tigercpu support AVX-512 instructions, while those on tigergpu support only AVX2.

Be sure to compile codes for tigercpu by ssh-ing to tigercpu.princeton.edu and compile codes for tigergpu by ssh-ing to tigergpu.princeton.edu.

If you ssh to tiger.princeton.edu then you will land on tigercpu.princeton.edu.