First Slurm Job

Before working through the exercise on this page, we suggest that you spend a few minutes learning about Slurm. Start with the Introduction, Useful Slurm Commands, Time to Solution and Considerations sections. The material below assumes that you have some experience on the Linux command line. If this is not the case then see Intro to the Linux Command Line.

Running Your First Slurm Job on the Cluster

This page provides a demonstration of how to transfer files to Adroit from your personal machine and run a job on Adroit using the Slurm scheduler. There are examples for both Python and R. To obtain the example materials, run this command in a terminal on your local machine (e.g., laptop):

# on your laptop
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop.git

 

Python Script Example

I. Steps Taken On Your Local Machine

Aftering cloning the repo, examine the files in a terminal on your laptop:

$ cd hpc_beginning_workshop/09_slurm/python_example
$ cat matrix_inverse.py
$ cat job.slurm

Here are the contents of the Python script:

import numpy as np
N = 3
X = np.random.randn(N, N)
print("X =\n", X)
print("Inverse(X) =\n", np.linalg.inv(X))

Below is the Slurm script which accomplishes the following:

  1. Prescribes the resource requirements for the job (lines that start with #SBATCH)
  2. Defines the software environment (in this case, lines that start with module)
  3. Specifies the work to be carried out (which in this case is to run a Python script)
#!/bin/bash
#SBATCH --job-name=py-matinv     # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load anaconda3/2020.11

python matrix_inverse.py

Use a text editor to replace <YourNetID> in the job.slurm file with your actual NetID. Next, while still on your laptop and using a VPN if off-campus, run the following SSH command to create a directory on Adroit (you need to replace <YourNetID> twice):

$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/python_test"

Note: If you are doing this exercise on Della, Stellar or Tiger then replace /scratch/network/ with /scratch/gpfs/. Transfer the Python and Slurm scripts from your laptop to Adroit using the scp (secure copy) command:

$ scp matrix_inverse.py job.slurm <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/python_test

Now everything is in place on Adroit. Let's connect to the Adroit login node and submit the job.

II. Steps Taken On Adroit

SSH to Adroit (VPN required if off-campus):

$ ssh <YourNetID>@adroit.princeton.edu

Change the working directory:

$ cd /scratch/network/<YourNetID>/python_test

List the files in the current directory to check that you see the Slurm script and Python script:

$ ls -l

Submit the job by running the following command:

# use a text editor to replace <YourNetID> in job.slurm with your actual NetID
$ sbatch job.slurm

This will place your job in the queue. You can monitor the status of your job with "squeue -u <YourNetID>". If the ST field is PD (pending) then your job is waiting for other jobs to finish. If you do not see it in the list then it has finished. You will receive an email when the job has finished if you entered your email address in the Slurm script.

After the job runs you can view the output with the following command:

$ cat slurm-*.out

The output should be similar to the following:

X =
 [[-0.70101861  0.20261191  0.10836766]
 [ 0.86684552 -0.75347296 -0.52716024]
 [-0.02477092  0.21738458 -0.11216934]]
Inverse(X) =
 [[-2.01455049 -0.46828701  0.25452735]
 [-1.11588991 -0.82273617  2.78852862]
 [-1.71771528 -1.49105147 -3.56712226]]

If you happen to want to transfer the output file to your local machine (e.g., laptop) then run the following command in a new terminal on your local machine:

$ scp <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/python_test/slurm-*.out .

Be sure to include the period character at the very end of the command above. This character corresponds to the current working directory on your local machine.

Tired of Duo? You can suppress Duo in a variety of ways. You should not try to do this during an in-person workshop.

 

R Script Example

I. Steps Taken On Your Local Machine

After cloning the repo (see "git clone" above), examine the scripts in a terminal window:

$ cd hpc_beginning_workshop/09_slurm/R_example
$ cat data_analysis.R
$ cat job.slurm
$ head cdc.csv

Here is the R script:

health = read.csv("cdc.csv")
print(summary(health))

Below is the Slurm script which accomplishes the following:

  1. Prescribes the resource requirements for the job (lines that start with #SBATCH)
  2. Defines the software environment (in this case, the line that start with module)
  3. Specifies the work to be carried out (which in this case is to run an R script)
#!/bin/bash
#SBATCH --job-name=R-test        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multithread tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when process begins
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge

Rscript data_analysis.R

Use a text editor to replace <YourNetID> in the job.slurm file with your actual NetID. Here are the first few lines of the data file (cdc.csv):

genhlth,exerany,hlthplan,smoke100,height,weight,wtdesire,age,gender
good,0,1,0,70,175,175,77,m
good,0,1,1,64,125,115,33,f
good,1,1,1,60,105,105,49,f
good,1,1,0,66,132,124,42,f
very good,0,1,0,61,150,130,55,f
very good,1,1,0,64,114,114,55,f
very good,1,1,0,71,194,185,31,m
very good,0,1,0,67,170,160,45,m
good,0,1,1,65,150,130,27,f
good,1,1,0,70,180,170,44,m
...

Next, while still on your laptop and using a VPN if off-campus, run the following SSH command to create a directory on Adroit (you need to replace <YourNetID> twice):

$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/R_test"

Note: If you are doing this exercise on Della, Stellar or Tiger then replace /scratch/network/ with /scratch/gpfs/. Transfer the R script, Slurm script and data file from your laptop to Adroit using the scp (secure copy) command:

$ scp data_analysis.R job.slurm cdc.csv <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/R_test

Now everything is in place on Adroit. Let's connect to the Adroit login node and submit the job.

II. Steps Taken on Adroit

SSH to Adroit (VPN required if off-campus):

$ ssh <YourNetID>@adroit.princeton.edu

Change the working directory:

$ cd /scratch/network/<YourNetID>/R_test

List the files in the current directory (should see three files):

$ ls -l

Submit the job by running the following command:

$ sbatch job.slurm

This will place your job in the queue. You can monitor the status of your job with "squeue -u <YourNetID>". If the ST field is PD (pending) then your job is waiting for other jobs to finish. If you do not see it in the list then it has finished. You will receive an email when the job is finished if you entered your email address in the Slurm script. After the job runs you can view the output with the following command:

$ cat slurm-*.out

Here is the expected output:

      genhlth        exerany          hlthplan         smoke100     
 excellent:4657   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 fair     :2019   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000  
 good     :5675   Median :1.0000   Median :1.0000   Median :0.0000  
 poor     : 677   Mean   :0.7457   Mean   :0.8738   Mean   :0.4721  
 very good:6972   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
                  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
     height          weight         wtdesire          age        gender   
 Min.   :48.00   Min.   : 68.0   Min.   : 68.0   Min.   :18.00   f:10431  
 1st Qu.:64.00   1st Qu.:140.0   1st Qu.:130.0   1st Qu.:31.00   m: 9569  
 Median :67.00   Median :165.0   Median :150.0   Median :43.00            
 Mean   :67.18   Mean   :169.7   Mean   :155.1   Mean   :45.07            
 3rd Qu.:70.00   3rd Qu.:190.0   3rd Qu.:175.0   3rd Qu.:57.00            
 Max.   :93.00   Max.   :500.0   Max.   :680.0   Max.   :99.00

If you happen to want to transfer the output file to your local machine (e.g., laptop) then run the following command in a new terminal on your local machine:

$ scp <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/R_test/slurm-*.out .

Be sure to include the period character at the very end of the command above. This character corresponds to the current working directory on your local machine.

Tired of Duo? You can suppress Duo in a variety of ways. You should not try to do this during an in-person workshop.