Jupyter on the HPC Clusters

Running Jupyter via Your Web Browser

Research Computing provides multiple web portals for running Jupyter including MyAdroit and MyDella. You will need to use a VPN to connect from off-campus (GlobalProtect VPN is recommended). If you have an account on Adroit or Della then browse to https://myadroit.princeton.edu or https://mydella.princeton.edu. If you need an account on Adroit then complete this form.

To begin a session, click on "Interactive Apps" and then "Jupyter". You will need to choose the "Number of hours", "Number of cores" and "Memory allocated". Set "Number of cores" to 1 unless you are sure that your script has been explicitly parallelized. Click "Launch" and then when your session is ready click "Connect to Jupyter". Note that the more resources you request, the longer you will have to wait for your session to become available. When your session starts, click on "New" in the upper right and choose a kernel such as "Python 3.8 [anaconda3/2020.7]" from the drop-down menu.
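Once a session is running, it can be useful to confirm how many CPU-cores the notebook actually has. The cell below is a minimal sketch, not an official tool. It assumes that the "Number of cores" field is passed to Slurm as --cpus-per-task (so that SLURM_CPUS_PER_TASK is set), and falls back to the Linux CPU affinity mask otherwise. The function name allocated_cores is made up for this example.

```python
import os


def allocated_cores() -> int:
    """Return the number of CPU-cores this process may use."""
    # Assumption: the job was submitted with --cpus-per-task, in which case
    # Slurm exports SLURM_CPUS_PER_TASK into the job environment.
    value = os.environ.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    # On Linux, the affinity mask lists the cores this process is bound to.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Last resort: all cores on the machine (may overstate the allocation).
    return os.cpu_count() or 1


print(f"Cores available to this notebook: {allocated_cores()}")
```

If the number printed is 1, then requesting more cores will only help once the code itself is parallelized.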

Internet Access is Not Available During Running Sessions

Jupyter sessions all run on the compute nodes which do not have Internet access. This means that you will not be able to download files, clone a repo from GitHub, install packages, etc. You will need to perform these operations on the login node before starting the session. To do this, in the main OnDemand menu, click on "Clusters" and then "<Name> Cluster Shell Access". This will present you with a black terminal on the login node (e.g., adroit4) where you can run commands which need Internet access. Any files or packages that you download while on the login node will be available on the compute nodes in your OnDemand session.
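To see this restriction for yourself, you can try a short connectivity check from a notebook cell. This is only a sketch: the helper can_reach is a hypothetical name, and github.com on port 443 is just an example target. On a compute node the call is expected to return False, while on the login node it should return True.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


# Example target (an assumption for illustration only):
# print(can_reach("github.com", 443))
```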

Using Conda Environments on MyAdroit and MyDella

First, make sure that you do not have any OnDemand sessions running when you make the Conda environment. New environments will not be found by running sessions. Next, create a Conda environment on the login node (see our Python page for details). For example, for Adroit/MyAdroit:

$ ssh <YourNetID>@adroit.princeton.edu  # or Cluster Shell Access (see above)
$ module load anaconda3/2020.11
$ conda create --name tf-cpu tensorflow pandas matplotlib
$ exit

Go to MyAdroit and launch a Jupyter notebook by entering the "Number of hours" and so on and then click on "Launch". When your session is ready click "Connect to Jupyter". On the next screen, choose "New" in the upper right and then tf-cpu in the drop-down menu. Your tf-cpu environment will be active when the notebook appears. If you are using a Python 3 notebook, to see the packages in your Conda environment, run this command in a cell (include the percent sign):

%conda list
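If you prefer to check from Python instead, the cell below is a rough sketch that prints which interpreter the kernel is using and the installed versions of a few packages. The helper name environment_report is made up for this example, and the package list simply mirrors the tf-cpu example above.

```python
import sys
from importlib import metadata


def environment_report(packages):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# The interpreter path should point inside your Conda environment.
print("Interpreter:", sys.executable)
print("Prefix:     ", sys.prefix)
print(environment_report(["tensorflow", "pandas", "matplotlib"]))
```

A value of None means the package is not installed in the environment that the kernel is running.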

Note that Jupyter notebooks via OnDemand run on the compute nodes where Internet access is disabled. This means that you will not be able to install packages or download files. These actions must be done on the head node (i.e., adroit4 or della5). To install additional packages on Adroit, for example:

$ ssh <YourNetID>@adroit.princeton.edu  # or Cluster Shell Access
$ module load anaconda3/2020.11
$ conda activate <your-environment>
$ conda install <another-package-1> <another-package-2>
$ conda deactivate
$ exit

After you install the additional packages go to MyAdroit and they will be available. The same procedure can be used for Della/MyDella. For some packages you will need to add the conda-forge channel or even perform the installation using pip as the last step. See Python on the HPC Clusters for additional information.

Make sure you have enough disk space by running the checkquota command. An error message like "[Errno 122] Disk quota exceeded" is a sure sign that you are over quota.
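Inside a notebook, the same condition shows up as an OSError. The sketch below shows one way to detect it when writing a file. The helper names is_quota_error and save_text are hypothetical. The check relies on errno.EDQUOT, which is the error number behind "Disk quota exceeded" (122 on Linux).

```python
import errno


def is_quota_error(exc: BaseException) -> bool:
    """Return True if exc is the OS-level 'Disk quota exceeded' error."""
    return isinstance(exc, OSError) and exc.errno == errno.EDQUOT


def save_text(path: str, text: str) -> bool:
    """Write text to path; return False with a hint if the quota is exceeded."""
    try:
        with open(path, "w") as handle:
            handle.write(text)
        return True
    except OSError as exc:
        if is_quota_error(exc):
            print("Over quota: run checkquota on the login node and free some space.")
            return False
        raise  # any other I/O error is re-raised unchanged
```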

Requesting a GPU on MyAdroit

From the OnDemand main menu, choose "Interactive Apps" then "Jupyter". You will then need to choose the "Number of hours", "Number of cores" and so on. The last field on this page is "Extra slurm options". To request a V100 GPU enter this:

--gres=gpu:1

Note that if all the GPUs are in use then you will have to wait. To check what is available, from the OnDemand main menu, click on "Clusters" and then "Adroit Cluster Shell Access". From the black terminal screen run this command: ssh adroit-h11g1. Then run gpustat. There are four V100 GPUs on that node. If you see usernames associated with each of them then the node is occupied. Type exit to leave the compute node and return to the login node. See more information about V100 GPUs.

Requesting a GPU on MyDella

From the OnDemand main menu, choose "Interactive Apps" then "Jupyter". You will then need to choose the "Number of hours", "Number of cores" and so on. Leave "Node type" as "any". The last field on this page is "Extra slurm options" where you should enter the following:

--gres=gpu:1

Note that if all the GPUs are in use then you will have to wait.
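Once the session starts on either MyAdroit or MyDella, you may want to confirm from a notebook cell that a GPU was actually allocated. The sketch below calls nvidia-smi -L, which lists the GPUs visible to the process. It assumes that nvidia-smi is on the PATH of GPU nodes and that Slurm sets CUDA_VISIBLE_DEVICES for the job, which depends on the cluster configuration. The function name visible_gpus is made up for this example.

```python
import os
import shutil
import subprocess


def visible_gpus():
    """Return one line per GPU reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this node
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []  # the tool is present but no usable device was found
    return [line for line in result.stdout.splitlines() if line.strip()]


print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
for gpu in visible_gpus():
    print(gpu)
```

An empty list suggests that the --gres option was not entered or that the session landed on a node without a GPU.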

Most users should create and use a Conda environment following the directions above. If you need to work with specific environment modules then choose "custom" under "Anaconda3 version used for starting up jupyter interface". This will present two new fields. Specify the needed modules in the "Modules to load instead of default anaconda3 modules" field. For instance, one could specify:

anaconda3/2020.11 cudatoolkit/11.3

 

jupyter.rc

For those with an account on Tiger or Della another possibility is https://jupyter.rc.princeton.edu which is a standalone node designed for running interactive Jupyter notebooks. Note that you will need to use a VPN to connect from off-campus. Unfortunately, custom Conda environments are not supported on this machine. Additionally, users need to choose one of four job profiles and each contains a fairly old version of anaconda3. Unlike MyAdroit and MyDella, jupyter.rc mounts each of the /scratch/gpfs filesystems of Tiger and Della as well as /tigress and /projects. This makes it easy to analyze data that has been generated on these clusters. For the Tiger filesystem, for instance, use the path /tiger/scratch/gpfs/<YourNetID>. Note that jupyter.rc has 40 physical CPU-cores and one NVIDIA P100 GPU.

There is also jupyter.adroit which can be used if you already have an account on Adroit.

 

Do Not Run Jupyter on the Login Nodes

The login or head node of each cluster is a resource that is shared by many users. Running Jupyter on one of these nodes may adversely affect other users. Please use one of the approaches described on this page to carry out your work.

 

Running on Tigressdata

Tigressdata is a standalone node specifically for visualization and data analysis including the use of Jupyter notebooks. It offers 40 physical CPU-cores and a P100 GPU. Like jupyter.rc, Tigressdata mounts each of the /scratch/gpfs filesystems of Tiger and Della as well as /tigress. For the Tiger filesystem, for instance, use the path /tiger/scratch/gpfs/<YourNetID>. There is no queueing system on tigressdata. Use the htop command to monitor activity.

Base Conda Environment

If for some reason jupyter.rc does not fit your needs then you may consider using one of the procedures below to run Jupyter directly on tigressdata:

# from behind VPN if off-campus
$ ssh <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ jupyter-notebook --no-browser --port=8889 --ip=127.0.0.1
# note the last line of the output which will be something like
http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0
# leave the session running

Then in a new terminal on your laptop:

$ ssh -N -f -L localhost:8889:localhost:8889 <YourNetID>@tigressdata.princeton.edu

Lastly, open a web browser and copy and paste the URL from the previous output:

http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0

Choose "New" then "Python 3" to launch a new notebook. Note that Jupyter may use a port that is different from the one you specified. This is why it is important to copy and paste the URL. See below for a discussion on ports. When you are done, terminate the ssh tunnel by running lsof -i tcp:8889 to get the PID and then kill -9 <PID> (e.g., kill -9 6010).
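Because the port in the URL is the one that the ssh tunnel must forward, it can help to extract it programmatically. The snippet below is a small sketch using only the Python standard library. The function name parse_jupyter_url is hypothetical, and the URL is the example from this page.

```python
from urllib.parse import parse_qs, urlsplit


def parse_jupyter_url(url: str):
    """Return (port, token) from a Jupyter URL such as the one printed at startup."""
    parts = urlsplit(url.strip())
    # parse_qs returns a list per key; take the first token if there is one.
    token = parse_qs(parts.query).get("token", [None])[0]
    return parts.port, token


port, token = parse_jupyter_url(
    "http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0"
)
print(port)  # 8889
```

If the port printed differs from the one used in the ssh -L command, recreate the tunnel with the port that Jupyter reports.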

Custom Conda Environment

The procedure above is useful only if the base Conda environment, which includes just under three hundred packages, is all that you need. If you need custom packages then you should create a new Conda environment and include jupyter in addition to the other packages that you need. The necessary modifications are shown below:

$ ssh <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ conda create --name myenv jupyter <package-2> <package-3>
$ conda activate myenv
$ jupyter-notebook --no-browser --port=8889 --ip=127.0.0.1

The packages in the base environment will not be available in your custom environment unless you explicitly list them (e.g., numpy, matplotlib, scipy).

Another Approach

Here is a second method where the web browser on tigressdata is used along with X11 forwarding (see requirements):

# from behind VPN
$ ssh -X <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ cd /tiger/scratch/gpfs/<YourNetID>  # or another directory
$ jupyter notebook --ip=0.0.0.0

However, the first time this is done one should set the browser. After sshing and loading the anaconda3 module:

$ jupyter notebook --generate-config
$ vim /home/$USER/.jupyter/jupyter_notebook_config.py
# make line 99 equal to c.NotebookApp.browser = '/usr/bin/firefox'

For better performance consider connecting to tigressdata using TurboVNC.

 

Running on a Compute Node via salloc

Larger tasks can be run on one of the compute nodes by requesting an interactive session using salloc. Once a compute node has been allocated, one starts Jupyter and then connects to it.

The directions below are shown in this YouTube video for the specific case of running PyTorch on a TigerGPU node.

First, from the head node, request an interactive session on a compute node. The command below requests one CPU-core with 4 GB of memory for 1 hour:

$ ssh <YourNetID>@tiger.princeton.edu
$ salloc --nodes=1 --ntasks=1 --mem=4G --time=01:00:00

On TigerGPU, to request a GPU you would add --gres=gpu:1 to the command above.

Once the node has been allocated, run the hostname command to get the name of the node. For Tiger, the hostname of the compute node will be something like tiger-h26c2n22.

On that node, first clear the XDG_RUNTIME_DIR environment variable (set it to an empty string) to avoid a permission issue, then launch either Jupyter lab or notebook:

$ export XDG_RUNTIME_DIR=""
$ module load anaconda3/2020.11
$ jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0
# or
$ jupyter-lab --no-browser --port=8889 --ip=0.0.0.0
# note the last line of the output which will be something like
http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0
# leave the session running

If you are looking to use a custom Conda environment then see "Custom Conda Environment" above.

Next, start a second terminal session on your local machine (e.g., laptop) and set up the tunnel as follows:

$ ssh -N -f -L 8889:tiger-h26c2n22:8889 <YourNetID>@tiger.princeton.edu

In the command above, be sure to replace tiger-h26c2n22 with the hostname of the node that salloc assigned to you. Note that we selected port 8889 to connect to the notebook. If you don't specify the port, it will default to port 8888, but this port is sometimes already in use on either the remote machine or the local one (i.e., your laptop). If the port you selected is unavailable, you will get an error message, in which case you should pick another one. It is best to keep it greater than 1024. Consider starting with 8888 and incrementing by 1 if it fails, e.g., try 8888, 8889, 8890 and so on. If you are running on a different port then substitute your port number for 8889.
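The "start at 8888 and increment by 1" strategy can be automated on your laptop. The sketch below tries to bind each candidate port on 127.0.0.1 and returns the first one that succeeds. It only checks the machine it runs on, so the same port could still be taken on the remote node. The function name find_free_port is made up for this example.

```python
import socket


def find_free_port(start: int = 8888, attempts: int = 100) -> int:
    """Return the first port >= start that can be bound on 127.0.0.1."""
    last = min(start + attempts, 65536)  # valid TCP ports end at 65535
    for port in range(start, last):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is in use (or not permitted); try the next one
            return port
    raise RuntimeError(f"no free port found in {start}..{last - 1}")


print(find_free_port())
```

The port is released again as soon as the function returns, so use it promptly in the ssh -L command.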

Lastly, open a web browser and copy and paste the URL from the previous output:

http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0

Choose "New" then "Python 3" to launch a new notebook. Note that Jupyter may use a port that is different from the one you specified. This is why it is important to copy and paste the URL. When you are done, terminate the ssh tunnel on your local machine by running lsof -i tcp:8889 to get the PID and then kill -9 <PID> (e.g., kill -9 6010).

Aside on ssh

Looking at the man page for ssh, the relevant flags are:

-N  Do not execute a remote command. This is useful for just forwarding ports.

-f  Requests ssh to go to background just before command execution. This is useful if ssh is going to ask for passwords or passphrases, but the user wants it in the background.

-L  Specifies that the given port on the local (client) host is to be forwarded to the given host and port on the remote side
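To make the role of each flag concrete, the sketch below assembles the tunnel command used on this page from its parts. The function name tunnel_command and the NetID aturing are placeholders invented for this example. The node name is the example hostname from above.

```python
import shlex


def tunnel_command(node, user, login_host, local_port=8889, remote_port=8889):
    """Build the ssh port-forwarding command as an argument list."""
    return [
        "ssh",
        "-N",  # do not execute a remote command
        "-f",  # go to the background just before command execution
        "-L", f"{local_port}:{node}:{remote_port}",  # local port -> node:port
        f"{user}@{login_host}",
    ]


cmd = tunnel_command("tiger-h26c2n22", "aturing", "tiger.princeton.edu")
print(shlex.join(cmd))
```

The first number after -L is the port on your laptop and the second is the port on the compute node. They are usually kept equal so that the URL printed by Jupyter works unchanged.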

Aside on Open Ports

Jupyter will automatically find an open port if you happen to specify one that is occupied. If you wish to do the scanning yourself then run the command below:

$ netstat -antp | grep :88 | sort
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:8863          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8863          127.0.0.1:39636         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8873          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8874          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8888          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8888          127.0.0.1:59728         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8889          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8890          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8891          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8891          127.0.0.1:36984         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8891          127.0.0.1:38218         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8891          127.0.0.1:43658         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8892          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8893          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8893          127.0.0.1:57036         ESTABLISHED -

The output above shows the ports that are in use (see the Local Address column). This means that the following ports are free: 8894, 8895, 8896, etc. One could also scan for ports in the range of 99xx instead of 88xx.
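Reading such a listing by eye is error-prone, so here is a sketch that parses netstat-style output and suggests the lowest unused port. It assumes the column layout shown above (the local address is the fourth column and the state is the sixth). The function names are hypothetical, and the sample is shortened to three lines.

```python
def listening_ports(netstat_output: str):
    """Return the sorted list of local ports in the LISTEN state."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return sorted(ports)


def first_free_port(used, start=8888):
    """Return the lowest port >= start that is not in used."""
    port = start
    while port in used:
        port += 1
    return port


sample = """\
tcp        0      0 127.0.0.1:8888          0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:8888          127.0.0.1:59728         ESTABLISHED -
tcp        0      0 127.0.0.1:8889          0.0.0.0:*               LISTEN      -
"""
print(first_free_port(listening_ports(sample)))  # 8890
```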

 

Avoiding Using a VPN from Off-Campus

One way to access the clusters from your laptop while off-campus is from behind a VPN such as GlobalProtect. However, there is a network performance penalty to such an approach. An alternative which avoids this penalty is to run the Jupyter notebook on tigressdata and use tigressgateway as a hop-through. This requires that you have an account on Tiger or Della.

On your laptop, begin by launching Jupyter on tigressdata after going through tigressgateway:

$ ssh <YourNetID>@tigressgateway.princeton.edu
$ ssh <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0
# or
$ jupyter-lab --no-browser --port=8889 --ip=0.0.0.0
...
To access the notebook, open this file in a browser:
        file:///home/ceisgrub/.local/share/jupyter/runtime/nbserver-72516-open.html
    Or copy and paste one of these URLs:
        http://tigressdata2.princeton.edu:8889/?token=93d4eff65897ed763aea0550ae66fad30bec8513485cf830
     or http://127.0.0.1:8889/?token=93d4eff65897ed763aea0550ae66fad30bec8513485cf830
# leave the session running in your terminal

The last line of the output above will be needed below. Next, on your laptop, start a second terminal session and run the following command to connect to tigressgateway with port forwarding enabled:

$ ssh -N -f -L 8889:tigressdata:8889 <YourNetID>@tigressgateway.princeton.edu

Finally, open a web browser and point it at the URL given above:

http://127.0.0.1:8889/?token=93d4eff65897ed763aea0550ae66fad30bec8513485cf830

If the procedure fails then try again using another port number as discussed above.

Note that the /scratch/gpfs filesystems are mounted on tigressdata. Run the checkquota command to see how to reference them. For instance, for Tiger the path is /tiger/scratch/gpfs/<YourNetID>. This means you can use Jupyter on tigressdata to analyze data on the different gpfs filesystems via the web browser on your laptop.

 

Running on a Compute Node via sbatch

The second way of running Jupyter on the cluster is by submitting a job via sbatch that launches Jupyter on the compute node.

In order to do this we need a submission script like the following called jupyter.sh:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:05:00
#SBATCH --job-name=jupyter-notebook
#SBATCH --output=jupyter-notebook-%j.log

# clear XDG_RUNTIME_DIR to avoid a permission issue, then get tunneling info
export XDG_RUNTIME_DIR=""
node=$(hostname -s)
user=$(whoami)
cluster="tigercpu"
port=8889

# print tunneling instructions to the job log
echo -e "
Command to create ssh tunnel:
ssh -N -f -L ${port}:${node}:${port} ${user}@${cluster}.princeton.edu

Use a Browser on your local machine to go to:
localhost:${port}  (prefix w/ https:// if using password)
"

# load modules or conda environments here
module load anaconda3/2020.11

# Run Jupyter
jupyter-notebook --no-browser --port=${port} --ip=${node}

This job launches Jupyter on the allocated compute node and we can access it through an ssh tunnel as we did in the previous section.

First, from the head node, we submit the job to the queue:

$ sbatch jupyter.sh

Once the job is running, a log file will be created that is called jupyter-notebook-<jobid>.log. The log file contains information on how to connect to Jupyter, and the necessary token.
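Rather than reading the log by eye, you can pull the connection URL out of it with a few lines of Python run on the login node. This is a sketch under two assumptions: the log is in the current directory under the name given above, and the token consists of lowercase hexadecimal characters as in the examples on this page. The function names are made up for this example.

```python
import re
from pathlib import Path

# Matches URLs such as http://127.0.0.1:8889/?token=<hex digits>
URL_PATTERN = re.compile(r"https?://[^\s]+\?token=[0-9a-f]+")


def jupyter_urls(log_text: str):
    """Return every tokenized Jupyter URL found in the log text, in order."""
    return URL_PATTERN.findall(log_text)


def latest_log(directory: str = "."):
    """Return the newest jupyter-notebook-*.log in directory, or None."""
    logs = sorted(Path(directory).glob("jupyter-notebook-*.log"),
                  key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


log = latest_log()
if log is not None:
    for url in jupyter_urls(log.read_text()):
        print(url)
```

Jupyter may take a few seconds to write the URL after the job starts, so run the snippet again if nothing is printed.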

In order to connect to Jupyter that is running on the compute node, we set up a tunnel on the local machine as follows:

$ ssh -N -f -L 8889:tiger-h26c2n22:8889 <YourNetID>@tigercpu.princeton.edu

where tiger-h26c2n22 is the name of the node that was allocated in this case.

In order to access Jupyter, navigate to http://localhost:8889/

In the directions on this page, the only packages that are available to the user are those made available by loading the anaconda3 module. If you have created your own Conda environment then you will need to activate it before running the “jupyter-lab” or “jupyter-notebook” command. Be sure that the “jupyter” package is installed into your environment (i.e., conda activate myenv; conda install jupyter).

 

FAQ

1. When trying to open a notebook on MyAdroit/MyDella, how do I resolve the error "File Load Error for mynotebook.ipynb"?

Try closing all your Jupyter notebooks and then remove this file: /home/<YourNetID>/.local/share/jupyter/nbsignatures.db

2. When using Job Composer on MyAdroit/MyDella, how do I deal with the error message "We're sorry, but something went wrong."?

Close all of your OnDemand sessions. Connect to the login node and run the following command:

$ rm -rf ~/ondemand/data/sys/myjobs/

Then in the OnDemand main menu, choose "Help" and then "Restart web server".

3. Why do I not see my Conda environments when I start a Jupyter notebook?

To make Conda environments work, OnDemand will attempt to install the ipykernel package into the environment. If it fails to do this because of conflicts then the environment will not be available. To see the error look at the most recent log file with "find $HOME/ondemand/data/sys/dashboard/batch_connect/sys -name launch_wrapper_env.log". You can try to install ipykernel into the environment on the command line to resolve the issue yourself.
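A quick way to check whether ipykernel is present is to run a short script with the Python interpreter of the environment in question (for example, after conda activate on the login node). The sketch below only reports whether the module can be found. The helper name has_module is made up for this example.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if the named top-level module can be imported here."""
    return importlib.util.find_spec(name) is not None


# The interpreter path shows which environment is being inspected.
print("Interpreter:", sys.executable)
print("ipykernel importable:", has_module("ipykernel"))
```

If the result is False, installing ipykernel into the environment from the login node is the step to try next.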

 

Getting Help

If you encounter any difficulties while working with Jupyter on the HPC clusters, please send an email to cses@princeton.edu or attend a help session.