Introducing SLURM

Complete HPC Guide

See a comprehensive guide on SLURM and the Princeton HPC systems: Getting Started with the HPC Clusters

 

SLURM scheduler

On all of the cluster systems, you run programs by storing the necessary commands in a script file and requesting that the job scheduling program SLURM execute the script file.

A SLURM script file begins with a line identifying the Unix shell to be used by the script. This is usually #!/bin/bash. Next come directives to SLURM, each beginning with #SBATCH. Every SLURM script should include the --nodes, --ntasks-per-node (or --ntasks), and --time directives. The --nodes directive tells SLURM how many nodes to assign to the job. The --ntasks-per-node directive tells SLURM how many simultaneous processes will run on each node. The --time directive tells SLURM the maximum time the job is allowed to run.

In the example below, the job asks for one node, one task, and one minute of run time.

The SLURM directives are followed by the Unix commands needed to run your program. If your program is named my_app and it is stored in your home directory, the command would be ./my_app

#!/bin/bash
#SBATCH --job-name=slurm-test    # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)

./my_app

If the SLURM script file is named my_job.slurm, then you would submit it with the command:

sbatch ./my_job.slurm

 Download a command summary: PDF

 

More SLURM Examples

 See more example Slurm scripts here.

 

Getting Notifications from a Job

You can request that SLURM send you e-mail when a job begins and ends using the --mail-type and --mail-user directives. Just add the following lines to your job script, with "YourNetID" replaced by your own NetID.

#SBATCH --mail-type=begin        # send mail when process begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=YourNetID@princeton.edu

 

Serial and Parallel Jobs

Serial jobs use only a single processor. The previous example shows a typical serial SLURM job: it runs a single task on a single node. For information about running multiple serial tasks in a single job, see Running Serial Jobs.
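As a minimal sketch of that idea (the program name serial_app and its input and output file names below are placeholders), several independent serial programs can be started in the background inside one job, with the script waiting until all of them finish:

#!/bin/bash
#SBATCH --job-name=multi-serial  # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # one task (the batch script itself)
#SBATCH --cpus-per-task=4        # one cpu-core per background process
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=00:30:00          # total run time limit (HH:MM:SS)

# run four independent serial programs at the same time (placeholder names)
./serial_app input1 > output1.log 2>&1 &
./serial_app input2 > output2.log 2>&1 &
./serial_app input3 > output3.log 2>&1 &
./serial_app input4 > output4.log 2>&1 &
wait                             # keep the job alive until all four finish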

Parallel jobs use more than one processor at the same time.  Two common types of parallel jobs are MPI and OpenMP.  MPI jobs run many copies of the same program across many nodes and use the Message Passing Interface (MPI) to coordinate among the copies.  More information about running MPI jobs is in Compiling and Running MPI Jobs.
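As a rough sketch, an MPI job script requests the number of tasks (MPI ranks) and launches the executable with srun; the module names and the executable name my_mpi_app below are placeholders for whatever is available on your cluster and used to build your code:

#!/bin/bash
#SBATCH --job-name=mpi-test      # create a short name for your job
#SBATCH --nodes=2                # node count
#SBATCH --ntasks-per-node=16     # number of MPI ranks per node
#SBATCH --cpus-per-task=1        # cpu-cores per task
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)

module purge
module load intel-mpi intel      # placeholder modules; load the MPI stack used to build the code

srun ./my_mpi_app                # srun starts one copy of the program per task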

OpenMP parallelizes the loops within a program.  OpenMP programs run as multiple “threads” on a single node with each thread using one core.  Information about how to run OpenMP in SLURM is in Running OpenMP jobs.
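For example, a minimal OpenMP job script (assuming a multi-threaded executable named my_omp_app, a placeholder) requests one task with several cpu-cores and sets the number of threads to match:

#!/bin/bash
#SBATCH --job-name=omp-test      # create a short name for your job
#SBATCH --nodes=1                # OpenMP threads must share a single node
#SBATCH --ntasks=1               # one task (one multi-threaded process)
#SBATCH --cpus-per-task=8        # number of OpenMP threads
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)

# use one OpenMP thread per allocated cpu-core
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./my_omp_app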

MATLAB loops that use the parfor statement will operate in a parallel fashion much like OpenMP. See Running Parallel Matlab Jobs for more information.
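As a sketch only (the module name and the script name my_parfor_script.m are placeholders), a MATLAB parfor job also runs on a single node, with one cpu-core per worker:

#!/bin/bash
#SBATCH --job-name=parfor-test   # create a short name for your job
#SBATCH --nodes=1                # parfor workers run on a single node here
#SBATCH --ntasks=1               # one MATLAB process
#SBATCH --cpus-per-task=8        # one cpu-core per parfor worker
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)

module purge
module load matlab               # placeholder; load the MATLAB version available on your cluster

# run the script without a display and exit when it finishes
matlab -nodisplay -nosplash -r "my_parfor_script; exit"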

 

Using GPUs

GPU nodes are available on Tiger, Traverse, and Adroit. To use GPUs in a job, you will need an #SBATCH directive with the --gres option to request that the job run on GPU nodes and to specify the number of GPUs to allocate. There are four GPUs on each GPU-enabled node.

#!/bin/bash
#SBATCH --job-name=poisson       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=4               # total number of tasks across all nodes
#SBATCH --cpus-per-task=7        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --gres=gpu:4             # number of gpus per node
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fail
#SBATCH --mail-user=YourNetID@princeton.edu

module purge
module load anaconda3
conda activate myenv

srun python myscript.py

Note that your code will be able to use a GPU only if it has been explicitly written to do so. Likewise, it will be able to use multiple GPUs only if it has been written to take advantage of them.
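If you want to confirm which GPUs a job actually received, one simple check (optional, not required) is to add these lines to the job script:

nvidia-smi                                            # list the GPUs visible to this job
echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"   # GPU indices assigned by SLURM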

 

Useful SLURM Commands

Command                  Description
sbatch <slurm_script>    Submit a job (e.g., sbatch calc.cmd)
squeue                   Show jobs in the queue
squeue -u <username>     Show jobs in the queue for a specific user (e.g., squeue -u ceisgruber)
squeue --start           Report the expected start time for pending jobs
squeue -j <jobid>        Show the nodes allocated to a running job
scancel <jobid>          Cancel a job (e.g., scancel 2534640)
snodes                   Show properties of the nodes on a cluster (e.g., maximum memory)
sinfo                    Show how nodes are being used
sshare / sprio           Show the priority assigned to jobs
smap / sview             Graphical display of the queues
slurmtop                 Text-based view of cluster nodes
scontrol show config     View default parameter settings
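As a quick illustration of a typical workflow with these commands (the job ID 2534640 is just the example number from the table above):

sbatch my_job.slurm    # submit the job; SLURM replies with the new job ID
squeue -u $USER        # check where your jobs are in the queue
scancel 2534640        # cancel a job that is no longer needed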

 

An Advanced SLURM Script

There are many ways to configure a SLURM job. Here is an advanced script:

#!/bin/bash
#SBATCH --job-name=poisson       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=4               # total number of tasks across all nodes
#SBATCH --cpus-per-task=7        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --gres=gpu:4             # number of gpus per node
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fail
#SBATCH --mail-user=YourNetID@princeton.edu

pwd; hostname; date              # record the working directory, node name and start time
env | grep SLURM | sort          # print the SLURM environment variables for this job

ulimit -s unlimited              # remove the limit on the stack size
ulimit -c unlimited              # allow core dump files of any size

taskset -p $$                    # show the CPU affinity of this shell

module purge                     # start from a clean module environment
module load intel-mpi intel
module list                      # record the exact module versions used

srun ./a.out                     # launch the executable

date                             # record the end time

The ulimit -s unlimited line makes the stack size the maximum possible. This is important if your code dynamically allocates a large amount of memory. Purging the modules ensures that nothing has been loaded unintentionally, and the module list statement is useful because it writes out the explicit module versions, which matters if you later need to know exactly which modules you used. Lastly, all of the SLURM environment variables are printed so that you can check whether their values are as expected.

The default values for SLURM for a cluster are found here: /etc/slurm/slurm.conf

To see the run time limits for a cluster, look at: /etc/slurm/job_submit.lua