MATLAB on the HPC Clusters


 

Running MATLAB via Your Web Browser

If you are new to high-performance computing, the simplest way to use MATLAB on the HPC clusters is through the Open OnDemand web interface. If you have an account on Adroit or Della then browse to https://myadroit.princeton.edu or https://mydella.princeton.edu. You will need to use a VPN to connect from off-campus; we recommend the GlobalProtect VPN. If you need an account on Adroit then complete this form.

To begin a session, click on "Interactive Apps" and then "MATLAB". You will need to choose the "MATLAB version", "Number of hours" and "Number of cores". Set "Number of cores" to 1 unless you are sure that your script has been explicitly parallelized using, for example, the Parallel Computing Toolbox (see below). Click "Launch" and then, when your session is ready, click "Launch MATLAB". Note that the more resources you request, the longer you will have to wait for your session to become available.

OnDemand MATLAB

 

Gurobi

The Gurobi optimization library is required by certain MATLAB packages. If you need to use Gurobi within your OnDemand MATLAB session then enter the environment module name (e.g., gurobi/9.0.1) in the field labeled "Additional environment modules to load" when creating the session. This makes the software available and allows the license to be found. Once the session has started, set up the Gurobi MATLAB interface at the MATLAB prompt (the path must match the version of the module you loaded):

>> cd /usr/licensed/gurobi/9.0.1/linux64/matlab/
>> gurobi_setup

 

 

Submitting Batch Jobs to the Slurm Scheduler

The web interface described above is good for interactive work. One can also submit MATLAB batch jobs to the Slurm scheduler. This applies to Adroit, Della and TigerGPU. A job consists of two pieces: (1) a MATLAB script and (2) a Slurm script that specifies the needed resources, sets the environment and lists the commands to be run. Learn more about Slurm or see the example below.

 

Running a Serial MATLAB Job

A serial MATLAB job is one that requires only a single CPU-core. Here is an example of a trivial, one-line serial MATLAB script (hello_world.m):

fprintf('Hello world.\n')

The Slurm script (job.slurm) below can be used for serial jobs:

#!/bin/bash
#SBATCH --job-name=matlab        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fault
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load matlab/R2019a

matlab -singleCompThread -nodisplay -nosplash -r hello_world

The -nodisplay and -nosplash flags suppress the GUI and the splash screen, while -singleCompThread prevents MATLAB from creating multiple computational threads, so that it uses only the single CPU-core allocated to the job. To run the MATLAB script, submit the job to the scheduler with the following command:

$ sbatch job.slurm

After the job completes, view the output with cat slurm-*.out:

...
Hello world.

Use squeue -u $USER to monitor the progress of queued jobs. To run the example above on Della, carry out these commands:

$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/<YourNetID>
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop
$ cd hpc_beginning_workshop/RC_example_jobs/matlab/serial
# edit email address in job.slurm
$ sbatch job.slurm

 

Choosing a MATLAB Version

Run the command below to see the available MATLAB versions. For example, on Della:

$ module avail matlab
------------ /usr/licensed/Modules/modulefiles ------------
matlab/R2010a          matlab/R2014b          matlab/R2018b
matlab/R2010b          matlab/R2015a          matlab/R2019a
matlab/R2011a          matlab/R2015b          matlab/R2019b
matlab/R2011b          matlab/R2016a          matlab/R2020a
matlab/R2012a          matlab/R2016b          matlab/R2020b
matlab/R2013a          matlab/R2017a          matlab/R2021a
matlab/R2013b          matlab/R2017b(default)
matlab/R2014a          matlab/R2018a

In your Slurm script you must choose a specific version, for example: module load matlab/R2019b.
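If you want to check from the shell which release is the newest on a given cluster, the function below is a minimal sketch. It assumes module names of the form matlab/R<year><a|b>, as in the listing above, and that module avail writes its listing to stderr (hence the 2>&1 in the usage line). The function name latest_matlab_module is hypothetical. Whatever it reports, still write the specific version into your Slurm script so that your runs are reproducible.

```shell
#!/bin/bash
# Hypothetical helper: print the newest matlab/R<year><a|b> module name found in
# "module avail matlab" output read from stdin.
latest_matlab_module() {
  # -o keeps only the matched names, which also drops suffixes such as "(default)".
  # Names like matlab/R2019b sort correctly as plain strings (year first, then a < b).
  grep -o 'matlab/R[0-9]\{4\}[ab]' | sort | tail -n 1
}

# Usage on a cluster login node (module avail usually writes to stderr):
#   module avail matlab 2>&1 | latest_matlab_module
```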

 

Running a Multi-threaded MATLAB Job with the Parallel Computing Toolbox

Most of the time, running MATLAB in single-threaded mode (as described above) will meet your needs. However, if your code makes use of the Parallel Computing Toolbox (e.g., parfor) or you have intensive computations that can benefit from the built-in multi-threading provided by MATLAB's BLAS implementation, then you can run in multi-threaded mode. In this mode one can use up to all of the CPU-cores on a single node. Multi-node jobs are not possible with the version of MATLAB that we have, so your Slurm script should always use #SBATCH --nodes=1. Here is an example from MathWorks of using multiple cores (for_loop.m):

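poolobj = parpool;                      % start a pool of workers on this node
fprintf('Number of workers: %d\n', poolobj.NumWorkers);

tic
n = 200;
A = 500;
a = zeros(1, n);                        % preallocate one result per iteration
parfor i = 1:n
    a(i) = max(abs(eig(rand(A))));      % iterations are independent, so they run in parallel
end
toc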

The Slurm script (job.slurm) below can be used for this case:

#!/bin/bash
#SBATCH --job-name=parfor        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=4        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:00:30          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fault
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load matlab/R2019a

matlab -nodisplay -nosplash -r for_loop

Note that -singleCompThread does not appear in the Slurm script in contrast to the serial case. One must tune the value of --cpus-per-task for optimum performance. Use the smallest value that gives you a significant performance boost because the more resources you request the longer your queue time will be. The example above runs in 30 s with 1 core and 9 s with 4 cores on Adroit.
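To make "a significant performance boost" concrete, you can compute the speedup (time on 1 core divided by time on N cores) and the parallel efficiency (speedup divided by N) from your own timings. The shell function below is a minimal sketch of that arithmetic; the name parallel_efficiency is hypothetical. With the Adroit timings quoted above (30 s on 1 core, 9 s on 4 cores) it reports a speedup of about 3.3, or roughly 83% efficiency.

```shell
#!/bin/bash
# Hypothetical helper: speedup and parallel efficiency from two wall-clock times.
#   speedup    = t1 / tN   (t1 = time on 1 core, tN = time on N cores)
#   efficiency = speedup / N
parallel_efficiency() {
  local t1=$1 tn=$2 n=$3
  awk -v t1="$t1" -v tn="$tn" -v n="$n" \
    'BEGIN { s = t1 / tn; printf "speedup=%.2f efficiency=%.0f%%\n", s, 100 * s / n }'
}

# Timings from the for_loop.m example on Adroit: 30 s with 1 core, 9 s with 4 cores.
parallel_efficiency 30 9 4   # prints: speedup=3.33 efficiency=83%
```

If the efficiency falls off sharply as you add cores, the extra cores mainly add queue time and fairshare cost rather than speed.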

Overriding the 12 core limit

By default MATLAB will restrict you to 12 workers. You can override this when creating the parallel pool, for example, with 24 workers (request a matching #SBATCH --cpus-per-task=24 in your Slurm script):

poolobj = parpool('local', 24);

If you use more than one CPU-core then make sure that your code can take advantage of all of them. The amount of time that a job waits in the queue is proportional to the requested resources. Furthermore, your fairshare value is decreased in proportion to the requested resources. To find the optimal number of CPU-cores for a MATLAB job see the "Multithreading" section of Choosing the Number of Nodes, CPU-cores and GPUs.
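Rather than hard-coding the pool size, one option is to take it from the Slurm allocation so that the number of workers always matches --cpus-per-task. The sketch below assumes that Slurm exports SLURM_CPUS_PER_TASK inside the job (it does when --cpus-per-task is set) and falls back to 1 elsewhere; the helper name matlab_workers and the script name my_parfor_script are hypothetical. Because the pool is opened before the script runs, the script itself should not call parpool again; parfor uses the existing pool automatically.

```shell
#!/bin/bash
# Hypothetical helper: number of parpool workers to request.
# SLURM_CPUS_PER_TASK is set by Slurm when the job uses --cpus-per-task;
# outside of a Slurm job the fallback is a single worker.
matlab_workers() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# MATLAB statement that opens a local pool sized to the allocation and then runs
# the hypothetical script my_parfor_script.m (which must not call parpool itself).
pool_cmd="parpool('local', $(matlab_workers)); my_parfor_script"
echo "$pool_cmd"

# Inside the Slurm script this would be launched as:
#   matlab -nodisplay -nosplash -r "$pool_cmd"
```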

 

How Do I Know If My MATLAB Code is Parallelized?

A parfor statement is a clear indication of parallelized MATLAB code. However, there are cases where the parallelization is not obvious. One example is code that uses linear algebra operations such as matrix multiplication. In this case MATLAB calls its BLAS library, which provides multithreaded routines.

There are two common ways to determine whether or not a MATLAB code can take advantage of parallelism without knowing anything about the code. The first is to run the code using 1 CPU-core and then do a second run using, say, 4 CPU-cores. Look to see if there is a significant difference in the execution time of the two runs. The second method is to launch the job using, say, 4 CPU-cores, then ssh to the compute node where the job is running and use htop -u $USER to inspect the CPU usage. To get the name of the compute node where your job is running use the following command:

$ squeue -u $USER

The rightmost column labeled "NODELIST(REASON)" gives the name of the node where your job is running. SSH to this node, for example:

$ ssh della-r3c1n14

Once on the compute node, run htop -u $USER. If your job is running in parallel you should see a process using much more than 100% in the %CPU column. For 4 CPU-cores this number would ideally be 400%.
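If you check on jobs often, the node names of your running jobs can be pulled out of the squeue output with a small filter. The sketch below assumes squeue's default column layout (JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)) and job names without spaces; the function name running_nodes is hypothetical. Alternatively, squeue can print only the node list itself, as shown in the last comment.

```shell
#!/bin/bash
# Hypothetical helper: read default-format squeue output on stdin and print the
# NODELIST of every job whose state (5th column, ST) is R (running).
# Pending jobs show a reason such as (Priority) in that column instead of a node.
running_nodes() {
  awk 'NR > 1 && $5 == "R" { print $NF }'
}

# Usage:
#   squeue -u "$USER" | running_nodes
# Equivalent without the filter: -h drops the header, -t selects the state,
# and -o "%N" prints only the allocated node list.
#   squeue -u "$USER" -h -t RUNNING -o "%N"
```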

 

Running MATLAB on Nobel

The Nobel cluster is a shared system without a job scheduler. Because of this, users are not allowed to run MATLAB in multi-threaded mode. The first step in using MATLAB on Nobel is choosing the version. Run module avail matlab to see the choices, then load a module with, e.g., module load matlab/R2019b. After loading a MATLAB module, run MATLAB interactively on the script myscript.m with:

$ matlab -singleCompThread -nodisplay -nosplash -r myscript

If you would like to use the GUI then you must first connect using ssh -X. In that case the command is:

$ matlab -singleCompThread -r myscript

If you are on a Windows machine then consider reading Run MATLAB from Nobel Cluster for Windows Computers.

Using the MATLAB GUI on Tigressdata

In addition to the web interfaces on MyAdroit and MyDella, one can also launch MATLAB with its GUI on Tigressdata. Tigressdata is ideal for data post-processing and visualization. You can access your files on the different filesystems using these paths: /tiger/scratch/gpfs/<YourNetID>, /della/scratch/gpfs/<YourNetID>, /perseus/scratch/gpfs/<YourNetID>, /tigress and /projects. Mac users will need to have XQuartz installed while Windows users should install MobaXterm (Home Edition). Visit the OIT Tech Clinic for assistance with installing, configuring and using these tools. To run MATLAB interactively with its graphical user interface:

$ ssh -X <YourNetID>@tigressdata.princeton.edu
$ module load matlab/R2019a
$ matlab

It can take a minute or more for the GUI to appear and for initialization to complete. To work interactively without the GUI:

$ ssh <YourNetID>@tigressdata.princeton.edu
$ module load matlab/R2019a
$ matlab
>>

Note that one can use the procedures above on the HPC clusters (e.g., Della) but only for non-intensive work since the head node is shared by all users of the cluster.

 

MATLAB is Not Allowed on TigerCPU or Stellar

TigerCPU is designed for parallel, multi-node jobs. MATLAB cannot be used across multiple nodes so it is not allowed. If you try to run a MATLAB job on TigerCPU you will encounter the following error message:

License checkout failed.
License Manager Error -15
MATLAB is unable to connect to the license server. 
Check that the license manager has been started, and that the MATLAB client machine can communicate
with the license server.

Troubleshoot this issue by visiting: 
https://www.mathworks.com/support/lme/R2018b/15

Diagnostic Information:
Feature: MATLAB 
License path: /home/jdh4/.matlab/R2018b_licenses:/usr/licensed/matlab-R2018b/licenses/license.dat:/usr/licensed/matlab-R2018b/licenses/network.lic
Licensing error: -15,570. System Error: 115

You will need to carry out the work on another cluster such as Della or, if your script can be written to use a GPU, on TigerGPU. To get started with MATLAB and GPUs see below.

Like TigerCPU, Stellar was also designed for parallel, multi-node jobs. While MATLAB is available on the head node of Stellar for very light work, it is not available on the Stellar compute nodes.

 

MATLAB is Not Available on Traverse

MathWorks does not produce a version of MATLAB that is compatible with the POWER architecture of Traverse.

 

Running MATLAB on GPUs

Many routines in MATLAB have been written to run on a GPU. Below is a MATLAB script (svd_matlab.m) that performs a matrix decomposition using a GPU:

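gpu = gpuDevice();                          % get the currently selected GPU
fprintf('Using a %s GPU.\n', gpu.Name);
disp(gpu);                                  % print the device properties

X = gpuArray([1 0 2; -1 5 0; 0 3 -9]);      % copy the matrix to GPU memory
whos X;
[U,S,V] = svd(X)                            % singular value decomposition computed on the GPU
fprintf('trace(S): %f\n', trace(S))
quit;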

The Slurm script (job.slurm) below can be used for this case:

#!/bin/bash
#SBATCH --job-name=matlab-svd    # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --gres=gpu:1             # number of gpus per node
#SBATCH --mail-type=begin,end    # send email when job begins and ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load matlab/R2019a

matlab -singleCompThread -nodisplay -nosplash -r svd_matlab

In the above Slurm script, notice the new line: #SBATCH --gres=gpu:1.

The job can be submitted to the scheduler with:

$ sbatch job.slurm

Be sure that your MATLAB code is able to use a GPU before submitting your job. See this getting started guide on MATLAB and GPUs.

MyAdroit

To request a V100 GPU via the MyAdroit web portal, add the following to the "Extra slurm options" field:

--gres=gpu:1

Note that if all the GPUs are in use then you will have to wait. During busy times this may take hours or days. To check what is available, from the OnDemand main menu, click on "Clusters" and then "Adroit Cluster Shell Access". From the black terminal screen run this command: ssh adroit-h11g1. Then run gpustat. There are four V100 GPUs on that node. If you see usernames associated with each of them then the node is occupied. Type exit to leave the compute node. Note that Adroit is a training cluster so it is not intended for lengthy production jobs.

 

Running MATLAB in Python

You can embed MATLAB inside a Python script using the MATLAB Engine for Python. There are two sets of directions below. The first shows how to do the installation in a standalone manner while the second uses a Conda environment.

$ module load anaconda3/2020.11 matlab/R2019a
$ cd /usr/licensed/matlab-R2019a/extern/engines/python
$ python setup.py build --build-base=$HOME/.cache install --prefix=$HOME/software
$ export PYTHONPATH=$HOME/software/lib/python3.7/site-packages:$PYTHONPATH
$ python
>>> import matlab.engine
>>> eng = matlab.engine.start_matlab()

You will need to set PYTHONPATH as above in your Slurm script in addition to loading the two modules.
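The export line used above leaves a dangling ":" (an empty path entry) when PYTHONPATH was not already set, and it adds a duplicate entry each time it runs. The function below is a minimal sketch of a more defensive way to prepend the install directory in a Slurm script; the name prepend_pythonpath is hypothetical, and the python3.7 path is the one used above.

```shell
#!/bin/bash
# Hypothetical helper: put a directory at the front of PYTHONPATH exactly once,
# without leaving an empty entry when PYTHONPATH was unset or empty.
prepend_pythonpath() {
  local dir=$1
  case ":${PYTHONPATH:-}:" in
    *":$dir:"*) ;;                                       # already present, do nothing
    *) PYTHONPATH="$dir${PYTHONPATH:+:$PYTHONPATH}" ;;   # append ":old" only if non-empty
  esac
  export PYTHONPATH
}

# In a Slurm script, after "module load anaconda3/2020.11 matlab/R2019a":
prepend_pythonpath "$HOME/software/lib/python3.7/site-packages"
```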

You can also install the MATLAB Engine for Python into a Conda environment:

$ module load anaconda3/2020.11 matlab/R2019a
$ conda create --name myenv python=3.7 pandas
$ conda activate myenv
$ cd /usr/licensed/matlab-R2019a/extern/engines/python
$ python setup.py build --build-base=$HOME/.cache install --prefix=$HOME/.conda/envs/myenv
$ python
>>> import matlab.engine
>>> eng = matlab.engine.start_matlab()

Although you are using MATLAB R2019a, you will see version 2018 mentioned in the output. This can be ignored.

Your Slurm script in this case will need to include:

module load anaconda3/2020.11 matlab/R2019a
conda activate myenv

 

MATLAB and Java

MATLAB uses some functionality from Java. To see which implementation of Java it is using:

>> version -java
ans = 'Java 1.8.0_181-b13 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode'

Additional information can be obtained by running: javaclasspath()

To import a Java class directly: import java.util.ArrayList

If you are using Java functionality in MATLAB be sure not to include -nojvm in your Slurm script, since that flag starts MATLAB without the Java virtual machine.

 

Where to Store Your Files

You should run your jobs out of /scratch/gpfs/ on the HPC clusters. These filesystems are very fast and provide vast amounts of storage. Do not run jobs out of /tigress or /projects. That is, you should never be writing the output of actively running jobs to these filesystems. /tigress and /projects are slow and should only be used for backing up the files that you produce on /scratch/gpfs. Your /home directory on all clusters is small and it should only be used for storing source code and executables. The commands below give you an idea of how to properly run a MATLAB job:

$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/<YourNetID>
$ mkdir myjob && cd myjob
# put MATLAB script and Slurm script in myjob
$ sbatch job.slurm

If the run produces data that you want to back up then copy or move it to /tigress:

$ cp -r /scratch/gpfs/<YourNetID>/myjob /tigress/<YourNetID>

For large transfers consider using rsync instead of cp. Most users only back up to /tigress every week or so. While /scratch/gpfs is not backed up, files are never removed; however, important results should be transferred to /tigress or /projects. The diagram below gives an overview of the filesystems:

HPC clusters and the filesystems that are available to each. Users should write job output to /scratch/gpfs.
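The rsync suggestion above can be wrapped in a small function. This is a minimal sketch: the name backup_job is hypothetical, and the usage line assumes your NetID is available as $USER, as it normally is when logged in to the clusters. With -a, rsync copies directories recursively and preserves timestamps and permissions, and on later runs it transfers only files that have changed, which suits periodic backups. The fallback to cp -r is only there so the sketch also works on machines without rsync.

```shell
#!/bin/bash
# Hypothetical helper: copy a job directory into a backup directory.
# Without a trailing slash on $src, the directory itself (not just its
# contents) ends up inside $dest, e.g. /tigress/$USER/myjob.
backup_job() {
  local src=$1 dest=$2
  mkdir -p "$dest"
  if command -v rsync >/dev/null 2>&1; then
    rsync -a "$src" "$dest/"
  else
    cp -r "$src" "$dest/"
  fi
}

# Usage on Della, for the example job above:
#   backup_job "/scratch/gpfs/$USER/myjob" "/tigress/$USER"
```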

 

FAQ

1. How do I pass arguments to a MATLAB function within a Slurm script?

Let us use the following MATLAB function as an example:

function print_values(array, a, b)
array
a
b

It simply prints the values of the three arguments to the screen. Save this MATLAB function in a file called print_values.m. Here is a Slurm script that calls it from each element of a Slurm job array:

#!/bin/bash
#SBATCH --job-name=matlab        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=1G         # memory per cpu-core (4G is default)
#SBATCH --time=00:00:30          # total run time limit (HH:MM:SS)
#SBATCH --array=0-4              # job array with index values 0, 1, 2, 3, 4
#SBATCH -o matlab%a.out          # stdout is redirected to that file
#SBATCH -e matlab%a.err          # stderr is redirected to that file
#SBATCH --mail-type=all
#SBATCH --mail-user=<YourNetID>@princeton.edu

STRING="'I am a string'"
GOLDEN=1.618033988749895

module purge
module load matlab/R2019a

matlab -nodesktop -nosplash -r "print_values($SLURM_ARRAY_TASK_ID, $STRING, $GOLDEN)"

Note that this simple script demonstrates how to pass three types of arguments: (1) a string, STRING, where the double quotes (") are part of the Bash syntax and the single quotes (') are part of the MATLAB syntax; (2) a real number, GOLDEN; and (3) a Slurm environment variable, SLURM_ARRAY_TASK_ID, which happens to be an integer. Assuming that this script is saved in a file called batch.sh, the job can be submitted to the scheduler with:

$ sbatch batch.sh

Once the job array has run, 10 files should be created in your directory: matlabN.out and matlabN.err where N=0,...,4, because we asked for a job array of 5 elements (--array=0-4).
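Getting the quoting right is the fiddliest part of this approach. The helpers below are a minimal sketch (the names matlab_quote and matlab_call are hypothetical): matlab_quote wraps a Bash string in MATLAB single quotes, doubling any embedded single quote as MATLAB requires, and matlab_call assembles the statement passed to -r. Printing the statement with echo before submitting is a cheap way to check it.

```shell
#!/bin/bash
# Hypothetical helper: turn a Bash string into a MATLAB char literal.
# MATLAB escapes a single quote inside a literal by doubling it: 'it''s'.
matlab_quote() {
  local q="'"
  local s="${1//$q/$q$q}"
  printf "'%s'" "$s"
}

# Hypothetical helper: build "fn(arg1, arg2, ...)" from a function name and
# arguments that are already valid MATLAB expressions.
matlab_call() {
  local fn=$1 args="" a
  shift
  for a in "$@"; do
    args+="${args:+, }$a"
  done
  printf '%s(%s)\n' "$fn" "$args"
}

# Same call as in batch.sh; outside a job array the task ID falls back to 0.
cmd=$(matlab_call print_values "${SLURM_ARRAY_TASK_ID:-0}" "$(matlab_quote 'I am a string')" 1.618033988749895)
echo "$cmd"
# In batch.sh: matlab -nodesktop -nosplash -r "$cmd"
```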

2. Why do my MATLAB jobs using the Parallel Computing Toolbox fail on launch?

I have submitted multiple jobs where the MATLAB script contains this snippet:

if isempty(gcp('nocreate'))
  pc = parcluster('local');
  parpool(pc,10)
end

Some of the jobs finish successfully but others fail with:

Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 676) Failed to start pool. Error using parallel.Job/preSubmit (line 581) Unable to read MAT-file $HOME/.matlab/local_cluster_jobs/R2018a/Job376.in.mat. File might be corrupt.

It appears that two jobs are launching simultaneously and attempting to create this .mat file at the same time. How do I avoid this? The solution is to set the job storage location of the cluster object to /tmp. Every job has a private /tmp directory while it is running, so this file can be redirected there. The code snippet becomes:

if isempty(gcp('nocreate'))
  pc = parcluster('local');
  pc.JobStorageLocation = '/tmp/';
  parpool(pc,10)
end

3. How can I run MATLAB when I don't have internet access?

The solution is to install a local license. Here is how to do it for macOS: http://www.princeton.edu/software/licenses/software/matlab/R2017a_LocalLicM.xml For other operating systems, see: http://www.princeton.edu/software/licenses/software/matlab/

4. Why do I get an error about licensing when I try to run MATLAB on TigerCPU?

TigerCPU is reserved for large parallel jobs. MATLAB jobs can only use a single node so they are not allowed to run on TigerCPU. You will either need to run the job on another cluster or if your code can make use of a GPU then TigerGPU can be used.

5. I installed MATLAB through the university on my laptop. How do I update the license?

Please see this OIT page.

6. What does this error mean: "/usr/licensed/matlab-R2019a/bin/matlab: fork: retry: Resource temporarily unavailable"?

It may be that you have too many processes running. Try killing some of your processes and then retry. To see your running processes use: ps -ef | grep <YourNetID>

 

Getting Help

If you encounter any difficulties while running MATLAB on the HPC clusters then please send an email to cses@princeton.edu or attend a help session.