To reiterate some quick background: to run a program on the clusters, you submit a job to the scheduler (Slurm). A job consists of the following files:

  1. your code that runs your program
  2. a separate script, known as a Slurm script, that requests the resources your job requires (the amount of memory, the number of cores, the number of nodes, etc.)

Once your files are submitted, the scheduler (Slurm) determines whether the resources you requested are available on the compute nodes; if not, it reserves those resources for you as they free up. Once the resources are available, the scheduler runs your program on the compute nodes.

Below we provide two exercises for running your first job on the clusters using a Slurm script: one runs a Python program, the other an R program.

Before working through the exercises, however, we strongly suggest that you first spend a few minutes learning about Slurm. At a minimum we suggest reading the following sections on this Slurm page:

  • Introduction
  • Useful Slurm Commands
  • Time to Solution
  • Considerations

Finally, please note that the material below assumes that you have some experience with the Linux command line. If this is not the case, see the material on our Learning Resources page for the Linux command line, or attend an upcoming Intro to the Linux Command Line workshop (or watch a previous recording from the training archives page). You could also try using the OnDemand interface.

Running Your First Slurm Job on the Cluster

This page provides a demonstration of how to transfer files from your personal machine to Adroit and run a job on Adroit using the Slurm scheduler. There are examples for both Python and R.

1. Disconnect from Adroit

Before beginning the tutorial, make sure that you are disconnected from Adroit, since all of the commands in the first part are run on your local machine. If you need to disconnect, run the following command:

[aturing@adroit5 ~]$ exit

2. Obtain the Files

The first step is to store the files on the hard drive of your personal machine. There are two options for obtaining the files:

Option 1: Use git on your local machine

Run these commands in a terminal on your local machine (e.g., laptop):

# local machine (NOT ADROIT)
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop.git
$ cd hpc_beginning_workshop

Option 2: Download the zip file from GitHub

  1. Browse to https://github.com/PrincetonUniversity/hpc_beginning_workshop.git
  2. Click on the green "Code" button
  3. Choose "Download ZIP"
  4. Run the following commands (probably from your Downloads directory):

    # local machine (NOT ADROIT)
    $ unzip hpc_beginning_workshop-main.zip
    $ cd hpc_beginning_workshop-main

Python Script Example

I. Steps Taken On Your Local Machine

After storing the files on your local hard drive, examine them in a terminal:

$ cd python/cpu
$ cat matrix_inverse.py
$ cat job.slurm

Here are the contents of the Python script:

import numpy as np
N = 3
X = np.random.randn(N, N)
print("X =\n", X)
print("Inverse(X) =\n", np.linalg.inv(X))

Below is the Slurm script which accomplishes the following:

  1. Prescribes the resource requirements for the job (lines that start with #SBATCH)
  2. Defines the software environment (in this case, lines that start with module)
  3. Specifies the work to be carried out (which in this case is to run a Python script)

#!/bin/bash
#SBATCH --job-name=py-matinv     # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load anaconda3/2023.9

python matrix_inverse.py

Use a text editor (e.g., nano, vim, emacs, Notepad, TextEdit) to replace <YourNetID> in the job.slurm file with your actual NetID. Next, while still on your laptop (and using a VPN if off-campus), run the following SSH command to create a directory on Adroit (you need to replace <YourNetID> twice):

$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/python_test"

Note: If you are doing this exercise on Della, Stellar, or Tiger then replace /scratch/network/ with /scratch/gpfs/.

Transfer the Python and Slurm scripts from your laptop to Adroit using the scp (secure copy) command (or use the OnDemand interface):

$ scp matrix_inverse.py job.slurm <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/python_test

Now everything is in place on Adroit. Let's connect to the Adroit login node and submit the job to the Slurm scheduler, which will run it on a compute node.

II. Steps Taken On Adroit

SSH to Adroit (VPN required if off-campus):

$ ssh <YourNetID>@adroit.princeton.edu

Change the working directory:

$ cd /scratch/network/<YourNetID>/python_test

List the files in the current directory to check that you see the Slurm script and Python script:

$ ls -l

Submit the job by running the following command:

# use a text editor like nano to replace <YourNetID> in job.slurm with your actual NetID
$ sbatch job.slurm

This will place your job in the queue. You can monitor its status with "squeue -u <YourNetID>". If the ST field is PD (pending) then your job is waiting for resources to become available; if it is R then the job is running. If the job no longer appears in the list then it has finished. You will receive an email when the job has finished if you entered your email address in the Slurm script.

After the job runs you can view the output with the following command:

$ cat slurm-*.out

The output should be similar to the following:

X =
 [[-0.70101861  0.20261191  0.10836766]
 [ 0.86684552 -0.75347296 -0.52716024]
 [-0.02477092  0.21738458 -0.11216934]]
Inverse(X) =
 [[-2.01455049 -0.46828701  0.25452735]
 [-1.11588991 -0.82273617  2.78852862]
 [-1.71771528 -1.49105147 -3.56712226]]

To transfer the output file to your local machine (e.g., laptop), run the following command in a new terminal on your local machine:

$ scp <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/python_test/slurm-\*.out .

Be sure to include the period character at the very end of the command above. This character corresponds to the current working directory on your local machine.

Tired of Duo? You can suppress Duo in a variety of ways. You should not try to do this during an in-person workshop.

R Script Example

I. Steps Taken On Your Local Machine

After obtaining the files (see above), examine the scripts in a terminal window:

$ cd serial_R
$ cat data_analysis.R
$ cat job.slurm
$ head cdc.csv

Here is the R script:

health = read.csv("cdc.csv")
print(summary(health))

Below is the Slurm script which accomplishes the following:

  1. Prescribes the resource requirements for the job (lines that start with #SBATCH)
  2. Defines the software environment (in this case, the lines that start with module)
  3. Specifies the work to be carried out (which in this case is to run an R script)

#!/bin/bash
#SBATCH --job-name=R-test        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when process begins
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load R/4.3.0  # use 4.3.1 on della

Rscript data_analysis.R

Use a text editor (e.g., nano, vim, emacs, Notepad, TextEdit) to replace <YourNetID> in the job.slurm file with your actual NetID. Here are the first few lines of the data file (cdc.csv):

genhlth,exerany,hlthplan,smoke100,height,weight,wtdesire,age,gender
good,0,1,0,70,175,175,77,m
good,0,1,1,64,125,115,33,f
good,1,1,1,60,105,105,49,f
good,1,1,0,66,132,124,42,f
very good,0,1,0,61,150,130,55,f
very good,1,1,0,64,114,114,55,f
very good,1,1,0,71,194,185,31,m
very good,0,1,0,67,170,160,45,m
good,0,1,1,65,150,130,27,f
good,1,1,0,70,180,170,44,m
...
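Before submitting the job, you can also preview data of this shape on your laptop. Here is a rough sketch (in Python rather than R, using only the standard library, and using only the sample rows shown above rather than the full cdc.csv):

```python
import csv
import io
import statistics

# A few rows copied from the head of cdc.csv shown above
sample = """genhlth,exerany,hlthplan,smoke100,height,weight,wtdesire,age,gender
good,0,1,0,70,175,175,77,m
good,0,1,1,64,125,115,33,f
good,1,1,1,60,105,105,49,f
good,1,1,0,66,132,124,42,f
"""

rows = list(csv.DictReader(io.StringIO(sample)))
weights = [int(row["weight"]) for row in rows]
print("rows:", len(rows))                        # -> rows: 4
print("mean weight:", statistics.mean(weights))  # -> mean weight: 134.25
```

The full dataset is summarized by R's summary() on the cluster; this snippet only illustrates reading the same CSV layout locally.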

Next, while still on your laptop (and using a VPN if off-campus), run the following SSH command to create a directory on Adroit (you need to replace <YourNetID> twice):

$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/R_test"

Note: If you are doing this exercise on Della, Stellar, or Tiger then replace /scratch/network/ with /scratch/gpfs/.

Transfer the R script, Slurm script, and data file from your laptop to Adroit using the scp (secure copy) command (or use the OnDemand interface):

$ scp data_analysis.R job.slurm cdc.csv <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/R_test

Now everything is in place on Adroit. Let's connect to the Adroit login node and submit the job to the Slurm scheduler, which will run it on a compute node.

II. Steps Taken On Adroit

SSH to Adroit (VPN required if off-campus):

$ ssh <YourNetID>@adroit.princeton.edu

Change the working directory:

$ cd /scratch/network/<YourNetID>/R_test

List the files in the current directory (should see three files):

$ ls -l

Submit the job by running the following command:

# use a text editor like nano to replace <YourNetID> in job.slurm with your actual NetID
$ sbatch job.slurm

This will place your job in the queue. You can monitor its status with "squeue -u <YourNetID>". If the ST field is PD (pending) then your job is waiting for resources to become available; if it is R then the job is running. If the job no longer appears in the list then it has finished. You will receive an email when the job has finished if you entered your email address in the Slurm script. After the job runs you can view the output with the following command:

$ cat slurm-*.out

Here is the expected output:

      genhlth        exerany          hlthplan         smoke100     
 excellent:4657   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 fair     :2019   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000  
 good     :5675   Median :1.0000   Median :1.0000   Median :0.0000  
 poor     : 677   Mean   :0.7457   Mean   :0.8738   Mean   :0.4721  
 very good:6972   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
                  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
     height          weight         wtdesire          age        gender   
 Min.   :48.00   Min.   : 68.0   Min.   : 68.0   Min.   :18.00   f:10431  
 1st Qu.:64.00   1st Qu.:140.0   1st Qu.:130.0   1st Qu.:31.00   m: 9569  
 Median :67.00   Median :165.0   Median :150.0   Median :43.00            
 Mean   :67.18   Mean   :169.7   Mean   :155.1   Mean   :45.07            
 3rd Qu.:70.00   3rd Qu.:190.0   3rd Qu.:175.0   3rd Qu.:57.00            
 Max.   :93.00   Max.   :500.0   Max.   :680.0   Max.   :99.00

To transfer the output file to your local machine (e.g., laptop), run the following command in a new terminal on your local machine:

$ scp <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/R_test/slurm-\*.out .

Be sure to include the period character at the very end of the command above. This character corresponds to the current working directory on your local machine.

Tired of Duo? You can suppress Duo in a variety of ways. You should not try to do this during an in-person workshop.