To reiterate some quick background: to run a program on the clusters, you submit a job to the scheduler (Slurm). A job consists of the following files:

- your code that runs your program
- a separate script, known as a Slurm script, that requests the resources your job requires in terms of the amount of memory, the number of cores, the number of nodes, etc.

Once your files are submitted, the scheduler (Slurm) figures out whether the resources you requested are available on the compute nodes, and if not, it starts reserving those resources for you. Once resources become available, the scheduler runs your program on the compute nodes.

Below we provide two exercises for running your first job on the clusters with a Slurm script. One exercise runs a Python program; the other runs an R program.

Before working through the exercises, however, we strongly suggest that you first spend a few minutes learning about Slurm. At a minimum, we suggest reading the following sections on this Slurm page:

- Introduction
- Useful Slurm Commands
- Time to Solution
- Considerations

Finally, please note that the material below assumes that you have some experience on the Linux command line. If this is not the case, see the material on our Learning Resources: the Linux Command Line page, or attend an upcoming Intro to the Linux Command Line workshop (or watch a previous recording on the training archives page). You could also try using the OnDemand interface.

Running Your First Slurm Job on the Cluster

This page demonstrates how to transfer files from your personal machine to Adroit and run a job on Adroit using the Slurm scheduler. There are examples for both Python and R.

1. Disconnect from Adroit

Before beginning the tutorial, make sure that you are disconnected from Adroit, since all the commands in the first part are run on your local machine. If you need to disconnect, run the following command:

```
[aturing@adroit5 ~]$ exit
```

2. Obtain the Files

The first step is to store the files on the hard drive of your personal machine. There are two options for obtaining the files:

Option 1: Use git on your local machine

Run these commands in a terminal on your local machine (e.g., laptop):

```
# local machine (NOT ADROIT)
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop.git
$ cd hpc_beginning_workshop
```

Option 2: Download the zip file from GitHub

- Browse to https://github.com/PrincetonUniversity/hpc_beginning_workshop.git
- Click on the green "Code" button
- Choose "Download ZIP"
- Run the following commands (probably from your Downloads directory):

```
# local machine (NOT ADROIT)
$ unzip hpc_beginning_workshop-main.zip
$ cd hpc_beginning_workshop-main
```

Python Script Example

I. Steps Taken On Your Local Machine

After storing the files on your local hard drive, examine them in a terminal:

```
$ cd python/cpu
$ cat matrix_inverse.py
$ cat job.slurm
```

Here are the contents of the Python script:

```python
import numpy as np
N = 3
X = np.random.randn(N, N)
print("X =\n", X)
print("Inverse(X) =\n", np.linalg.inv(X))
```

Below is the Slurm script, which accomplishes the following:

- Prescribes the resource requirements for the job (lines that start with #SBATCH)
- Defines the software environment (in this case, lines that start with module)
- Specifies the work to be carried out (in this case, running a Python script)

```bash
#!/bin/bash
#SBATCH --job-name=py-matinv     # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load anaconda3/2023.9
python matrix_inverse.py
```

Use a text editor (e.g., nano, vim, emacs, Notepad, TextEdit) to replace <YourNetID> in the job.slurm file with your actual NetID.

Next, while still on your laptop and using a VPN if off-campus, run the following SSH command to create a directory on Adroit (you need to replace <YourNetID> twice):

```
$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/python_test"
```

Note: If you are doing this exercise on Della, Stellar or Tiger, then replace /scratch/network/ with /scratch/gpfs/.

Transfer the Python and Slurm scripts from your laptop to Adroit using the scp (secure copy) command (or use the OnDemand interface):

```
$ scp matrix_inverse.py job.slurm <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/python_test
```

Now everything is in place on Adroit. Let's connect to the Adroit login node and submit the job to the Slurm scheduler, which will run it on a compute node.

II. Steps Taken On Adroit

SSH to Adroit (VPN required if off-campus):

```
$ ssh <YourNetID>@adroit.princeton.edu
```

Change the working directory:

```
$ cd /scratch/network/<YourNetID>/python_test
```

List the files in the current directory to check that you see the Slurm script and the Python script:

```
$ ls -l
```

Submit the job by running the following command:

```
# use a text editor like nano to replace <YourNetID> in job.slurm with your actual NetID
$ sbatch job.slurm
```

This will place your job in the queue. You can monitor the status of your job with squeue -u <YourNetID>. If the ST field is PD (pending), then your job is waiting for resources to become available, for example for other jobs to finish. If you do not see your job in the list, then it has finished.
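Reading squeue output by eye works well for a single job. To make the ST column concrete, here is a minimal Python sketch that parses text in squeue's default column layout and maps the common state codes (PD pending, R running, CG completing) to words. The function names and the sample output, including the job IDs, partition name and node name, are hypothetical. A job ID that no longer appears is treated as having left the queue, following the rule of thumb above.

```python
# Hypothetical helper: interpret text in squeue's default column layout.
STATE_NAMES = {"PD": "pending", "R": "running", "CG": "completing"}


def job_states(squeue_text):
    """Return {job_id: state} parsed from squeue-style output."""
    lines = squeue_text.strip().splitlines()
    header = lines[0].split()
    id_col, st_col = header.index("JOBID"), header.index("ST")
    states = {}
    for line in lines[1:]:
        # Assumes the columns before NODELIST(REASON) contain no spaces,
        # which holds for the sample below.
        fields = line.split()
        code = fields[st_col]
        states[fields[id_col]] = STATE_NAMES.get(code, code)
    return states


def has_left_queue(job_id, states):
    """A job missing from squeue has finished (or was otherwise removed)."""
    return job_id not in states


# Made-up output for illustration only.
SAMPLE = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            123456       cpu py-matin  aturing PD       0:00      1 (Priority)
            123457       cpu   R-test  aturing  R       0:05      1 adroit-08
"""

if __name__ == "__main__":
    states = job_states(SAMPLE)
    print(states)                            # {'123456': 'pending', '123457': 'running'}
    print(has_left_queue("123450", states))  # True
```

In practice you would paste or capture the text that squeue -u <YourNetID> prints; the sketch only shows how the columns are read.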
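The #SBATCH lines in job.slurm are what the scheduler must satisfy before a pending job can start. With --mem-per-cpu, the total memory requested scales with the number of CPU-cores (ntasks × cpus-per-task), so requesting more cores also requests more memory. The minimal Python sketch below, with hypothetical helper names, reads the directives from a script and does that arithmetic. It assumes the --option=value form used in job.slurm, memory sizes with an optional K/M/G/T suffix (megabytes when no suffix is given), and a --time written as HH:MM:SS; Slurm accepts other time formats that this sketch does not handle.

```python
import re

# Binary multiples, expressed in megabytes (Slurm's default unit for memory).
UNITS_MB = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def parse_sbatch(script_text):
    """Collect '#SBATCH --key=value' directives into a dict (hypothetical helper)."""
    opts = {}
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line.startswith("#SBATCH"):
            continue
        # Drop the '#SBATCH' prefix and any trailing '# comment'.
        directive = line[len("#SBATCH"):].split("#", 1)[0].strip()
        if directive.startswith("--") and "=" in directive:
            key, value = directive[2:].split("=", 1)
            # A repeated option (e.g. --mail-type) simply overwrites here;
            # this sketch does not model how Slurm itself combines repeats.
            opts[key] = value
    return opts


def mem_to_mb(text):
    """Convert a size such as '4G' or '500' (megabytes) to megabytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized memory size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS_MB[unit or "M"]


def hms_to_seconds(text):
    """Convert 'HH:MM:SS' to seconds (other Slurm time formats not handled)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def resource_summary(opts):
    """Total cores, total memory (MB) and time limit (s); needs --mem-per-cpu and --time."""
    cores = int(opts.get("ntasks", 1)) * int(opts.get("cpus-per-task", 1))
    return {
        "cores": cores,
        "mem_mb": mem_to_mb(opts["mem-per-cpu"]) * cores,
        "time_s": hms_to_seconds(opts["time"]),
    }


EXAMPLE = """#!/bin/bash
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
"""

if __name__ == "__main__":
    print(resource_summary(parse_sbatch(EXAMPLE)))
    # {'cores': 1, 'mem_mb': 4096, 'time_s': 60}
```

By the same arithmetic, --ntasks=2 with --cpus-per-task=4 at 4G per core would be a request for 8 cores and 32G of memory.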
You will receive an email when the job has finished if you entered your email address in the Slurm script.

After the job runs, you can view the output with the following command:

```
$ cat slurm-*.out
```

The output should be similar to the following (the numbers differ from run to run because X is random):

```
X =
 [[-0.70101861  0.20261191  0.10836766]
 [ 0.86684552 -0.75347296 -0.52716024]
 [-0.02477092  0.21738458 -0.11216934]]
Inverse(X) =
 [[-2.01455049 -0.46828701  0.25452735]
 [-1.11588991 -0.82273617  2.78852862]
 [-1.71771528 -1.49105147 -3.56712226]]
```

If you want to transfer the output file to your local machine (e.g., laptop), run the following command in a new terminal on your local machine:

```
$ scp <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/python_test/slurm-\*.out .
```

Be sure to include the period character at the very end of the command above. This character corresponds to the current working directory on your local machine.

Tired of Duo? You can suppress Duo in a variety of ways. You should not try to do this during an in-person workshop.

R Script Example

I. Steps Taken On Your Local Machine

After obtaining the files (see above), examine the scripts in a terminal window:

```
$ cd serial_R
$ cat data_analysis.R
$ cat job.slurm
$ head cdc.csv
```

Here is the R script:

```r
health = read.csv("cdc.csv")
print(summary(health))
```

Below is the Slurm script, which accomplishes the following:

- Prescribes the resource requirements for the job (lines that start with #SBATCH)
- Defines the software environment (in this case, lines that start with module)
- Specifies the work to be carried out (in this case, running an R script)

```bash
#!/bin/bash
#SBATCH --job-name=R-test        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load R/4.3.0  # use 4.3.1 on della
Rscript data_analysis.R
```

Use a text editor (e.g., nano, vim, emacs, Notepad, TextEdit) to replace <YourNetID> in the job.slurm file with your actual NetID.

Here are the first few lines of the data file (cdc.csv):

```
genhlth,exerany,hlthplan,smoke100,height,weight,wtdesire,age,gender
good,0,1,0,70,175,175,77,m
good,0,1,1,64,125,115,33,f
good,1,1,1,60,105,105,49,f
good,1,1,0,66,132,124,42,f
very good,0,1,0,61,150,130,55,f
very good,1,1,0,64,114,114,55,f
very good,1,1,0,71,194,185,31,m
very good,0,1,0,67,170,160,45,m
good,0,1,1,65,150,130,27,f
good,1,1,0,70,180,170,44,m
...
```
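For each numeric column, R's summary() reports six numbers: the minimum, first quartile, median, mean, third quartile and maximum. To make those concrete, the sketch below recomputes them in Python for the height column of the ten sample rows shown above. It uses only this excerpt, not the full file, so its results will not match the job output. It assumes Python 3.8+ for statistics.quantiles; method="inclusive" uses the same linear interpolation as R's default quantile type 7. The function name numeric_summary is hypothetical.

```python
import csv
import io
import statistics

# The ten sample rows of cdc.csv shown above (the real file has many more).
CDC_HEAD = """genhlth,exerany,hlthplan,smoke100,height,weight,wtdesire,age,gender
good,0,1,0,70,175,175,77,m
good,0,1,1,64,125,115,33,f
good,1,1,1,60,105,105,49,f
good,1,1,0,66,132,124,42,f
very good,0,1,0,61,150,130,55,f
very good,1,1,0,64,114,114,55,f
very good,1,1,0,71,194,185,31,m
very good,0,1,0,67,170,160,45,m
good,0,1,1,65,150,130,27,f
good,1,1,0,70,180,170,44,m
"""


def numeric_summary(values):
    """Six-number summary in the order R's summary() prints it (hypothetical helper)."""
    # 'inclusive' interpolates between order statistics like R's quantile type 7.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


rows = list(csv.DictReader(io.StringIO(CDC_HEAD)))
heights = [float(row["height"]) for row in rows]

if __name__ == "__main__":
    for label, value in numeric_summary(heights).items():
        print(f"{label:8}: {value:.2f}")
```

For these ten heights the sketch gives a minimum of 60, first quartile 64, median 65.5, mean 65.8, third quartile 69.25 and maximum 71.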
Next, while still on your laptop and using a VPN if off-campus, run the following SSH command to create a directory on Adroit (you need to replace <YourNetID> twice):

```
$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/R_test"
```

Note: If you are doing this exercise on Della, Stellar or Tiger, then replace /scratch/network/ with /scratch/gpfs/.

Transfer the R script, Slurm script and data file from your laptop to Adroit using the scp (secure copy) command (or use the OnDemand interface):

```
$ scp data_analysis.R job.slurm cdc.csv <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/R_test
```

Now everything is in place on Adroit. Let's connect to the Adroit login node and submit the job to the Slurm scheduler, which will run it on a compute node.

II. Steps Taken On Adroit

SSH to Adroit (VPN required if off-campus):

```
$ ssh <YourNetID>@adroit.princeton.edu
```

Change the working directory:

```
$ cd /scratch/network/<YourNetID>/R_test
```

List the files in the current directory (you should see three files):

```
$ ls -l
```

Submit the job by running the following command:

```
# use a text editor like nano to replace <YourNetID> in job.slurm with your actual NetID
$ sbatch job.slurm
```

This will place your job in the queue. You can monitor the status of your job with squeue -u <YourNetID>. If the ST field is PD (pending), then your job is waiting for resources to become available, for example for other jobs to finish. If you do not see your job in the list, then it has finished. You will receive an email when the job has finished if you entered your email address in the Slurm script.

After the job runs, you can view the output with the following command:

```
$ cat slurm-*.out
```

Here is the expected output:

```
      genhlth        exerany          hlthplan         smoke100     
 excellent:4657   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 fair     :2019   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000  
 good     :5675   Median :1.0000   Median :1.0000   Median :0.0000  
 poor     : 677   Mean   :0.7457   Mean   :0.8738   Mean   :0.4721  
 very good:6972   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
                  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
     height          weight         wtdesire          age        gender  
 Min.   :48.00   Min.   : 68.0   Min.   : 68.0   Min.   :18.00   f:10431  
 1st Qu.:64.00   1st Qu.:140.0   1st Qu.:130.0   1st Qu.:31.00   m: 9569  
 Median :67.00   Median :165.0   Median :150.0   Median :43.00            
 Mean   :67.18   Mean   :169.7   Mean   :155.1   Mean   :45.07            
 3rd Qu.:70.00   3rd Qu.:190.0   3rd Qu.:175.0   3rd Qu.:57.00            
 Max.   :93.00   Max.   :500.0   Max.   :680.0   Max.   :99.00            
```

If you want to transfer the output file to your local machine (e.g., laptop), run the following command in a new terminal on your local machine:

```
$ scp <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/R_test/slurm-\*.out .
```

Be sure to include the period character at the very end of the command above. This character corresponds to the current working directory on your local machine.

Tired of Duo? You can suppress Duo in a variety of ways. You should not try to do this during an in-person workshop.
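By default, Slurm writes a job's output to a file named slurm-<jobid>.out, so running either exercise more than once leaves several such files in the directory, and cat slurm-*.out prints all of them. A minimal Python sketch for picking the most recent one by job ID follows. The function name is hypothetical, and the sketch assumes job IDs increase over time and that the default naming is in use; files from job arrays, which follow a slurm-<jobid>_<index>.out pattern, are ignored.

```python
import re
from pathlib import Path

# Default Slurm output name for a non-array job: slurm-<jobid>.out
OUTPUT_NAME = re.compile(r"slurm-(\d+)\.out")


def newest_slurm_output(directory="."):
    """Return the slurm-<jobid>.out path with the largest job ID, or None."""
    candidates = []
    for path in Path(directory).glob("slurm-*.out"):
        match = OUTPUT_NAME.fullmatch(path.name)
        if match:  # skips names such as slurm-123_4.out from job arrays
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare job IDs numerically, not as strings.
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    latest = newest_slurm_output()
    print(latest.read_text() if latest else "no slurm-<jobid>.out files found")
```

Comparing the IDs as integers matters: as plain strings, slurm-99.out would sort after slurm-1234.out.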