Spark application submission via Slurm

Apache Spark is a cluster computing framework for large-scale data processing. It is best known for its ability to cache large datasets in memory between jobs. The supported interfaces are Scala, Python, and Java.

This page provides guidelines for launching Spark in standalone mode on a cluster using Slurm, which allows easy use of Spark on most of the clusters available at Princeton University.

The introductory Spark tutorial covers the Spark framework itself and the submission guidelines for using YARN.

Below is an example Slurm script that can be used to launch standalone Spark on a cluster and to allocate the driver and executor programs.

In a sense, the computing resources (memory and CPU) are allocated twice: first, sufficient resources need to be allocated for the Spark application via Slurm; second, the spark-submit resource allocation flags need to be specified accordingly. The following would allocate 3 full nodes, with 4 tasks per node and 5 CPUs per task (60 cores in total):

#!/bin/bash
#SBATCH -N 3
#SBATCH -t 01:00:00
#SBATCH --mem 20000
#SBATCH --ntasks-per-node 4
#SBATCH --cpus-per-task 5

 
It is recommended to use the -N and --ntasks-per-node Slurm allocation parameters rather than a mixed allocation with -n and --ntasks-per-node, or with -N and --cpus-per-task (-c) alone. To use cluster resources optimally, one should occupy as many cores as possible on each node (the number of cores per node depends on the cluster). In that case, the explicit memory request may not be necessary, since all of the node's CPUs are already occupied.

The next part of the script starts the standalone Spark cluster and sets up working and logging directories:

module load python
module load spark/hadoop2.7/2.2.0
spark-start
echo $MASTER

Echoing $MASTER (which points to the node where the driver program will run) is not required, but it is useful because this is also the node hosting the Spark web UI. The UI can be accessed from the login node of the cluster using Firefox, provided you have X11 forwarding enabled:

firefox --no-remote http://<master>:8080

where <master> is the hostname contained in the MASTER variable above (the node name without the spark:// prefix and port).

The last part of the script starts the Spark application itself. The current example uses the Monte Carlo calculation of Pi written in PySpark, which is available here: https://github.com/apache/spark/blob/master/examples/src/main/python/pi.py

spark-submit --total-executor-cores 60 --executor-memory 5G pi.py 100
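
For reference, the core of that example looks roughly like the sketch below (slightly simplified from the linked pi.py): it estimates Pi by sampling random points in a square and counting those that fall inside the unit circle.

import sys
from operator import add
from random import random
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Create (or reuse) the SparkSession set up by spark-submit
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def point_in_circle(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    # Distribute n samples across the executors and sum the hits
    count = spark.sparkContext.parallelize(range(1, n + 1), partitions) \
                              .map(point_in_circle) \
                              .reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    spark.stop()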

There are a few parameters to tune for a given Spark application: the number of executors, the number of cores per executor and the amount of memory per executor.

With Slurm, the --ntasks-per-node parameter specifies how many executors will be started on each node (i.e. a total of 12 executors across the 3 nodes in this example). By default, Spark will use 1 core per executor, so it is essential to specify --total-executor-cores; this number cannot exceed the total number of cores available on the nodes allocated for the Spark application (60 cores here, resulting in 5 CPU cores per executor).

Finally, the --executor-memory parameter specifies the memory per executor. It is 2 GB by default and cannot exceed the RAM available on a cluster node (125 GB on most Della nodes, though this varies by cluster).
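
If in doubt about what was actually granted, a quick sanity check can be run from within the application itself. The snippet below is a minimal sketch in PySpark, assuming a SparkSession named spark is already available (as in the pi.py example):

# Inspect the resources the running application actually obtained
sc = spark.sparkContext
print("Total cores available to the application:", sc.defaultParallelism)
print("Memory per executor:", sc.getConf().get("spark.executor.memory", "default"))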

The Spark memory tutorial discusses the memory allocation for Spark applications with Slurm and YARN and how to tune Spark applications in general.

Now that the Slurm submission script is ready, one can put it all in the slurm.cmd file:

#!/bin/bash
#SBATCH -N 3
#SBATCH -t 01:00:00
#SBATCH --mem 20000
#SBATCH --ntasks-per-node 4
#SBATCH --cpus-per-task 5
module load python
module load spark/hadoop2.7/2.2.0
spark-start
echo $MASTER
spark-submit --total-executor-cores 60 --executor-memory 5G pi.py 100

and submit from the command line by typing:

sbatch slurm.cmd

Standard Slurm commands, described elsewhere, can be used for job monitoring and cancellation.

Using Jupyter notebooks with Spark

The Jupyter notebook is a popular environment for data science that provides a platform for code development, plotting, and a full textual description of the analysis.

Getting Jupyter notebooks to leverage the compute power of a cluster takes two simple steps:

1) Allocate resources for a standalone Spark cluster using Slurm (as described at the beginning of this article). This can be done by submitting the following script with sbatch:

#!/bin/bash
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --ntasks-per-node 3
#SBATCH --cpus-per-task 2
module load spark/hadoop2.7/2.2.0
spark-start
echo $MASTER
sleep infinity


Unlike a regular Slurm job, this one starts the Spark cluster and keeps it running (via sleep infinity) while the interactive notebook is in use. Therefore, set -t to the maximum allowed limit so that you do not have to restart the cluster often; -N, --ntasks-per-node, and --cpus-per-task are up to you to configure.

This Slurm job will start the master on an available node and print the master URL in the Slurm output file (slurm-<jobid>.out by default); it will look something like:
 
Starting master on spark://ns-001:7077

and it will usually appear near the top of the file.

2) Start a Spark-enabled Jupyter (IPython) notebook, typically from the head node.
Locate your Python environment:

which python

Set the environment variables needed to launch a Spark-enabled Jupyter notebook:

export PYSPARK_DRIVER_PYTHON=/path/to/your/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=`which python`

The next time you launch pyspark, you will be taken to a Jupyter notebook in the browser GUI:

pyspark --master spark://ns-001:7077 --total-executor-cores 6

Make sure the number of executors and total number of cores are consistent with what you request in the Slurm submission script.
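
Alternatively, if a plain Jupyter kernel is already running and the pyspark package is importable in it (e.g. because the spark module is loaded), one can connect to the standalone master directly from a notebook cell. The sketch below assumes the master URL and core count from the examples above; substitute your own values:

from pyspark.sql import SparkSession

# Connect this notebook's Python session to the standalone Spark cluster
spark = (SparkSession.builder
         .master("spark://ns-001:7077")       # master URL echoed by spark-start
         .config("spark.cores.max", "6")      # keep consistent with the Slurm allocation
         .appName("jupyter-notebook")
         .getOrCreate())

print(spark.sparkContext.defaultParallelism)  # should match the requested core count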
 
Oftentimes, you will not want the Jupyter notebook to open in a browser running on the head node or a worker node of the cluster. If that is the case, modify PYSPARK_DRIVER_PYTHON_OPTS to be:
 
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=8889 --ip=127.0.0.1"

Then, on the local machine, establish an SSH tunnel by typing:

ssh -N -f -L localhost:8889:localhost:8889 yourusername@clustername.princeton.edu

The first option, -N, tells SSH that no remote commands will be executed, which is useful for port forwarding. The second option, -f, sends SSH to the background, so the local tunnel-enabling terminal remains usable. The last option, -L, specifies the port forwarding configuration (remote port 8889 forwarded to local port 8889).

Note: the tunnel will keep running in the background. The notebook can now be accessed from your local browser at http://localhost:8889.