Jupyter on the HPC Clusters


Running Jupyter via Your Web Browser

Research Computing provides multiple web portals for running Jupyter. You will need to use a VPN to connect from off-campus (the GlobalProtect VPN is recommended). If you have an account on Adroit, Della or Stellar then browse to https://myadroit.princeton.edu, https://mydella.princeton.edu or https://mystellar.princeton.edu. To request an account on Adroit, complete this form.

To begin a session, click on "Interactive Apps" and then "Jupyter" (or choose "Jupyter on Adroit/Della Vis" for visualization or light interactive work with Internet access). You will need to choose the "Number of hours", "Number of cores" and "Memory allocated". Set "Number of cores" to 1 unless you are sure that your script has been explicitly parallelized. Click "Launch" and then, when your session is ready, click "Connect to Jupyter". Note that the more resources you request, the longer you will have to wait for your session to become available. When your session starts, click on "New" in the upper right and choose a kernel such as "Python 3.8 [anaconda3/2020.7]" from the drop-down menu. Read about using custom Conda environments below.

There are limits on the number of cores, memory and time for OnDemand jobs. If for some reason you need to bypass these limits then try using the salloc approach described below.


Internet is Not Available on Compute Nodes, Only on Visualization Nodes

Jupyter sessions run on the compute nodes which do not have Internet access. This means that you will not be able to download files, clone a repo from GitHub, install packages, etc. You will need to perform these operations on the login node before starting the session. To do this, in the main OnDemand menu, click on "Clusters" and then "<Name> Cluster Shell Access". This will present you with a black terminal on the login node where you can run commands which need Internet access. Any files that you download while on the login node will be available on the compute nodes in your OnDemand session.
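For example, you might clone a repository and download a data file from the login node before launching your session (the repository and file URLs below are placeholders):

```shell
$ ssh <YourNetID>@adroit.princeton.edu             # or use "Cluster Shell Access"
$ git clone https://github.com/<user>/<repo>.git   # placeholder repository
$ wget https://<some-site>/<data-file>             # placeholder download
$ exit
```

The cloned repository and downloaded file will then be visible from the compute node when your OnDemand session starts.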

Internet access is available when running Jupyter on a visualization node. For example, when creating a session on MyDella, choose "Interactive Apps" then "Jupyter on Della-Vis2". There is no job scheduler on the visualization nodes. Be sure to use these nodes in a way that is fair to all users.

Custom Conda Environments on MyDella, MyStellar and MyAdroit

First, make sure that you do not have any OnDemand sessions running when you make the Conda environment. New environments will not be found by running sessions. Next, create a Conda environment on the login node (see our Python page for details) and be sure to install ipykernel and notebook. For example, for Adroit/MyAdroit:

$ ssh <YourNetID>@adroit.princeton.edu  # or Cluster Shell Access (see above)
$ module load anaconda3/2022.5
$ conda create --name tf-cpu ipykernel notebook tensorflow pandas matplotlib --channel conda-forge
$ exit

The ipykernel package must be included when creating the environment; if it is missing, OnDemand will try to install it, which can lead to conflicts and sessions that fail to start. If your session fails to start then see (3) in the FAQ at the bottom of this page for a solution.


After making the Conda environment on the command line, go to MyAdroit and launch a Jupyter notebook by entering the "Number of hours" and so on, and then clicking "Launch". When your session is ready, click on "Connect to Jupyter". On the next screen, choose "New" in the upper right and then tf-cpu in the drop-down menu. Your tf-cpu environment will be active when the notebook appears. If you are using a Python 3 notebook, to see the packages in your Conda environment, run this command in a cell (include the percent sign):

%conda list

Note that Jupyter notebooks via OnDemand run on the compute nodes where Internet access is disabled (see above, exception is sessions running on the visualization nodes). This means that you will not be able to install packages or download files. To install additional packages on Adroit, for example:

$ ssh <YourNetID>@adroit.princeton.edu  # or Cluster Shell Access
$ module load anaconda3/2022.5
$ conda activate <your-environment>
$ conda install <another-package-1> <another-package-2> --channel <original-channel>
$ conda deactivate
$ exit

After you install the additional packages, return to MyAdroit and they will be available. The same procedure can be used for Della/MyDella. For some packages you will need to add the conda-forge channel or even perform the installation using pip as the last step. See Python on the HPC Clusters for additional information.
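For example, if a package is only available from PyPI, install the Conda packages first and use pip as the last step (the package name here is a placeholder):

```shell
$ ssh <YourNetID>@adroit.princeton.edu
$ module load anaconda3/2022.5
$ conda activate <your-environment>
$ pip install <pypi-only-package>   # pip as the final step
$ conda deactivate
$ exit
```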

If your OnDemand session fails to start then see (3) in the FAQ at the bottom of this page.

Make sure you have enough disk space by running the checkquota command. An error message like "[Errno 122] Disk quota exceeded" is a sure sign that you are over quota.

Conda Environments that are Not Stored in /home/<YourNetID>/.conda

By default OnDemand will look in ~/.conda for your Conda environments. If you are storing them in another location then they will not be found. The solution is to create a symbolic link. For instance, if your Conda environments are stored in /scratch/network/$USER/CONDA then create a symbolic link like this:

$ cd ~
# make sure you do not have a ~/.conda directory before running the next line
$ ln -s /scratch/network/$USER/CONDA .conda

On Della or Stellar, replace "network" with "gpfs".

 

Using Widgets

Begin by creating an environment on the login node as described above:

$ conda create --name widg-env matplotlib jupyterlab ipywidgets ipympl --channel conda-forge

When filling out the form to create the OnDemand session in the field "Anaconda3 version used for starting up jupyter interface" choose the name of your environment, e.g., "Use your conda env widg-env". Learn more about ipywidgets and ipympl.

 

Requesting a GPU on MyDella

MyDella provides three GPU options: (1) a MIG GPU with 10 GB of memory, (2) an A100 GPU with 40 GB of memory and (3) an A100 with 80 GB of memory. A MIG GPU is essentially a small A100 GPU with about 1/7th the performance and memory of an A100. MIG GPUs are ideal for interactive work such as Jupyter where the GPU is not always being used. The queue time for a MIG GPU is on average much less than that for an A100. MIG GPUs can be used when (1) only a single CPU-core is needed, (2) the required CPU memory is less than 32 GB and (3) the required GPU memory is less than 10 GB. Please use a MIG GPU whenever possible.

MIG GPU

To request a MIG GPU, choose "mig" as the partition when creating the Jupyter session.

A100 GPU

In general, when using Jupyter you should use a MIG GPU as explained above. If you need an A100 GPU then follow these directions: From the OnDemand main menu, choose "Interactive Apps" then "Jupyter". You will then need to choose the "Number of hours", "Number of cores" and so on. Leave "Node type" as "any". The last field on this page is "Extra slurm options" where you should enter the following:

--gres=gpu:1

If you need one of the 80GB GPUs then use "--gres=gpu:1 --constraint=gpu80". Note that if all the GPUs are in use then you will have to wait. To check what is available, from the OnDemand main menu, click on "Clusters" and then "Della Cluster Shell Access". From the black terminal screen run the command "shownodes -p gpu". See the "FREE/TOTAL GPUs" column. Run the command below to see when queued jobs are expected to start:

$ squeue -u $USER --start

Environment Modules

Most users should create and use a Conda environment following the directions above. If you need to work with specific environment modules then choose "custom" under "Anaconda3 version used for starting up jupyter interface". This will present two new fields. Specify the needed modules in the "Modules to load instead of default anaconda3 modules" field. For instance, one could specify:

anaconda3/2022.5 cudatoolkit/11.7

 

Requesting a GPU on MyStellar

From the OnDemand main menu, choose "Interactive Apps" then "Jupyter". You will then need to choose the "Number of hours", "Number of cores" and so on. The last field on this page is "Extra slurm options". To request an A100 GPU enter this:

--gres=gpu:1

 

Requesting a GPU on MyAdroit

From the OnDemand main menu, choose "Interactive Apps" then "Jupyter". You will then need to choose the "Number of hours", "Number of cores" and so on. The last field on this page is "Extra slurm options" where you should enter the following:

--gres=gpu:1

Note that if all the GPUs are in use then you will have to wait. To check what is available, from the OnDemand main menu, click on "Clusters" and then "Adroit Cluster Shell Access". From the black terminal screen run the command "shownodes -p gpu". See the "FREE/TOTAL GPUs" column. For details on choosing specific GPUs see the Adroit page.

 

jupyter.rc

For those with an account on Tiger or Della, another possibility is https://jupyter.rc.princeton.edu, a standalone node designed for running interactive Jupyter notebooks. Note that you will need to use a VPN to connect from off-campus. Unfortunately, custom Conda environments are not supported on this machine. Additionally, users need to choose one of four job profiles, each of which contains a fairly old version of anaconda3. Unlike MyAdroit and MyDella, jupyter.rc mounts each of the /scratch/gpfs filesystems of Tiger and Della as well as /tigress and /projects. This makes it easy to analyze data that has been generated on these clusters. For the Tiger filesystem, for instance, use the path /tiger/scratch/gpfs/<YourNetID>. Note that jupyter.rc has 40 physical CPU-cores and one NVIDIA P100 GPU.


There is also jupyter.adroit which can be used if you already have an account on Adroit.

 

Do Not Run Jupyter on the Login Nodes

The login or head node of each cluster is a resource that is shared by many users. Running Jupyter on one of these nodes may adversely affect other users. Please use one of the approaches described on this page to carry out your work.

 

Running on Tigressdata

Tigressdata is a standalone node specifically for visualization and data analysis, including the use of Jupyter notebooks. It offers 40 physical CPU cores and a P100 GPU. Like jupyter.rc, Tigressdata mounts each of the /scratch/gpfs filesystems of Tiger and Della as well as /tigress. For the Tiger filesystem, for instance, use the path /tiger/scratch/gpfs/<YourNetID>. There is no queueing system on tigressdata. Use the htop command to monitor activity.

Base Conda Environment

If for some reason jupyter.rc does not fit your needs then you may consider using one of the procedures below to run Jupyter directly on tigressdata:

# from behind VPN if off-campus
$ ssh <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ jupyter-notebook --no-browser --port=8889 --ip=127.0.0.1
# note the last line of the output which will be something like
http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0
# leave the session running

Then in a new terminal on your laptop:

$ ssh -N -f -L localhost:8889:localhost:8889 <YourNetID>@tigressdata.princeton.edu

Lastly, open a web browser and copy and paste the URL from the previous output:

http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0

Choose "New" then "Python 3" to launch a new notebook. Note that Jupyter may use a port that is different than the one you specified, which is why it is important to copy and paste the URL. See below for a discussion on ports. When you are done, terminate the ssh tunnel by running lsof -i tcp:8889 to get the PID and then kill -9 <PID> (e.g., kill -9 6010).

Using Custom Conda Environments in Tigressdata

The procedure above is only useful if the base Conda environment, which includes just under three hundred packages, meets your needs. If you need custom packages then you should create a new Conda environment and include jupyter in addition to the other packages that you need. The necessary modifications are shown below:

$ ssh <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ conda create --name myenv jupyter <package-2> <package-3>
$ conda activate myenv
$ jupyter-notebook --no-browser --port=8889 --ip=127.0.0.1

The packages in the base environment will not be available in your custom environment unless you explicitly list them (e.g., numpy, matplotlib, scipy).

Another Approach

Here is a second method where the web browser on tigressdata is used along with X11 forwarding (see requirements):

# from behind VPN
$ ssh -X <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ cd /tiger/scratch/gpfs/<YourNetID>  # or another directory
$ jupyter notebook --ip=0.0.0.0

However, the first time this is done you should set the browser. After sshing in and loading the anaconda3 module:

$ jupyter notebook --generate-config
$ vim /home/$USER/.jupyter/jupyter_notebook_config.py
# make line 99 equal to c.NotebookApp.browser = '/usr/bin/firefox'

For better performance consider connecting to tigressdata using TurboVNC.

 

Running on a Compute Node via salloc

Larger tasks can be run on one of the compute nodes by requesting an interactive session using salloc. Once a compute node has been allocated, one starts Jupyter and then connects to it.

The directions below are shown in this YouTube video for the specific case of running PyTorch on a TigerGPU node. The procedure can be used on all of the clusters.

First, from the head node, request an interactive session on a compute node. The command below requests 1 CPU-core with 4 GB of memory for 1 hour:

$ ssh <YourNetID>@tiger.princeton.edu
$ salloc --nodes=1 --ntasks=1 --mem=4G --time=01:00:00

To request a GPU you would add --gres=gpu:1 to the command above. See the Slurm webpage to learn more about nodes and ntasks.

Once the node has been allocated, run the hostname command to get the name of the node. For Tiger, the hostname of the compute node will be something like tiger-h26c2n22.

On that node, first unset the XDG_RUNTIME_DIR environment variable to avoid a permission issue, then launch either Jupyter lab or Jupyter notebook:

$ export XDG_RUNTIME_DIR=""
$ module load anaconda3/2022.5
$ jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0
# or
$ jupyter-lab --no-browser --port=8889 --ip=0.0.0.0
# note the last line of the output which will be something like
http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0
# leave the session running

If you are looking to use a custom Conda environment then see "Custom Conda Environment" above.

Next, start a second terminal session on your local machine (e.g., your laptop) and set up the tunnel as follows:

$ ssh -N -f -L 8889:tiger-h26c2n22:8889 <YourNetID>@tiger.princeton.edu

In the command above, be sure to replace tiger-h26c2n22 with the hostname of the node that salloc assigned to you. Note that we selected port 8889 to connect to the notebook. If you don't specify a port, it defaults to 8888, but that port may already be in use on either the remote machine or the local one (i.e., your laptop). If the port you selected is unavailable, you will get an error message, in which case you should pick another one; it is best to keep it greater than 1024. Consider starting with 8888 and incrementing by 1 on failure, e.g., try 8888, 8889, 8890 and so on. If you are using a different port then substitute your port number for 8889.

Lastly, open a web browser and copy and paste the URL from the previous output:

http://127.0.0.1:8889/?token=61f8a2aa8ad5e469d14d6a1f59baac05a8d9577916bd7eb0

Choose "New" then "Python 3" to launch a new notebook. Note that Jupyter may use a port that is different than the one you specified, which is why it is important to copy and paste the URL. When you are done, terminate the ssh tunnel on your local machine by running lsof -i tcp:8889 to get the PID and then kill -9 <PID> (e.g., kill -9 6010).
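For example, the clean-up on your local machine might look like this (the PID shown is illustrative):

```shell
$ lsof -i tcp:8889
COMMAND   PID        USER  ...  NAME
ssh      6010  <YourNetID>  ...  localhost:8889 (LISTEN)
$ kill -9 6010
```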

Aside on ssh

Looking at the man page for ssh, the relevant flags are:

-N  Do not execute a remote command. This is useful for just forwarding ports.

-f  Requests ssh to go to background just before command execution. This is useful if ssh is
going to ask for passwords or passphrases, but the user wants it in the background.

-L  Specifies that the given port on the local (client) host is to be forwarded to the given
host and port on the remote side.

Aside on Open Ports

Jupyter will automatically find an open port if you happen to specify one that is occupied. If you wish to do the scanning yourself then run the command below:

$ netstat -antp | grep :88 | sort
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:8863          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8863          127.0.0.1:39636         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8873          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8874          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8888          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8888          127.0.0.1:59728         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8889          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8890          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8891          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8891          127.0.0.1:36984         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8891          127.0.0.1:38218         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8891          127.0.0.1:43658         ESTABLISHED -                   
tcp        0      0 127.0.0.1:8892          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8893          0.0.0.0:*               LISTEN      -                   
tcp        0      0 127.0.0.1:8893          127.0.0.1:57036         ESTABLISHED -

The output above shows the ports that are in use (see the Local Address column). This means that ports such as 8894, 8895 and 8896 are open. One could also scan for ports in the range of 99xx instead of 88xx; if you are interested in a range of port numbers not beginning with "88" then modify the grep command accordingly.
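If you would rather not scan the netstat output by hand, a short Python snippet can find a free port for you. This is a sketch: it simply tries to bind to each port in a range and reports the first one that succeeds.

```python
import socket

def first_free_port(start=8888, end=8899):
    """Return the first port in [start, end] that accepts a local bind."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port   # bind succeeded, so the port is free
            except OSError:
                pass          # port is in use; try the next one
    raise RuntimeError(f"no free port in {start}-{end}")

print(first_free_port())
```

Pass the printed port number to jupyter-notebook --port and to the ssh -L tunnel command.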

 

Avoiding Using a VPN from Off-Campus

One way to access the clusters from your laptop while off-campus is from behind a VPN such as GlobalProtect. However, there is a network performance penalty to such an approach. An alternative which avoids this penalty is to run the Jupyter notebook on tigressdata and use tigressgateway as a hop-through. This requires that you have an account on Tiger or Della.

On your laptop, begin by launching jupyter in the background on tigressdata after going through tigressgateway:

$ ssh <YourNetID>@tigressgateway.princeton.edu
$ ssh <YourNetID>@tigressdata.princeton.edu
$ module load anaconda3/2020.11
$ jupyter-notebook --no-browser --port=8889 --ip=0.0.0.0
# or
$ jupyter-lab --no-browser --port=8889 --ip=0.0.0.0
...
To access the notebook, open this file in a browser:
        file:///home/ceisgrub/.local/share/jupyter/runtime/nbserver-72516-open.html
    Or copy and paste one of these URLs:
        http://tigressdata2.princeton.edu:8889/?token=93d4eff65897ed763aea0550ae66fad30bec8513485cf830
     or http://127.0.0.1:8889/?token=93d4eff65897ed763aea0550ae66fad30bec8513485cf830
# leave the session running in your terminal

The last line of the output above will be needed below. Next, on your laptop, start a second terminal session and run the following command to connect to tigressgateway with port forwarding enabled:

$ ssh -N -f -L 8889:tigressdata:8889 <YourNetID>@tigressgateway.princeton.edu

Finally, open a web browser and point it at the URL given above:

http://127.0.0.1:8889/?token=93d4eff65897ed763aea0550ae66fad30bec8513485cf830

If the procedure fails then try again using another port number as discussed above.

Note that the /scratch/gpfs filesystems are mounted on tigressdata. Run the checkquota command to see how to reference them. For instance, for Tiger the path is /tiger/scratch/gpfs/<YourNetID>. This means you can use Jupyter on tigressdata to analyze data on the different GPFS filesystems via the web browser on your laptop.

 

Running on a Compute Node via sbatch

The second way of running Jupyter on the cluster is by submitting a job via sbatch that launches Jupyter on the compute node.

In order to do this we need a submission script like the following called jupyter.sh:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:05:00
#SBATCH --job-name=jupyter-notebook

# get tunneling info
XDG_RUNTIME_DIR=""
node=$(hostname -s)
user=$(whoami)
cluster="tigercpu"
port=8889

# print tunneling instructions jupyter-log
echo -e "
Command to create ssh tunnel:
ssh -N -f -L ${port}:${node}:${port} ${user}@${cluster}.princeton.edu

Use a Browser on your local machine to go to:
localhost:${port}  (prefix w/ https:// if using password)
"

# load modules or conda environments here
module load anaconda3/2020.11

# Run Jupyter
jupyter-notebook --no-browser --port=${port} --ip=${node}

This job launches Jupyter on the allocated compute node and we can access it through an ssh tunnel as we did in the previous section.

First, from the head node, we submit the job to the queue:

$ sbatch jupyter.sh

Once the job is running, a log file will be created that is called jupyter-notebook-<jobid>.log. The log file contains information on how to connect to Jupyter, and the necessary token.

In order to connect to Jupyter that is running on the compute node, we set up a tunnel on the local machine as follows:

$ ssh -N -f -L 8889:tiger-h26c2n22:8889 <YourNetID>@tigercpu.princeton.edu

where tiger-h26c2n22 is the name of the node that was allocated in this case.

In order to access Jupyter, navigate to http://localhost:8889/

In the directions on this page, the only packages that are available to the user are those made available by loading the anaconda3 module. If you have created your own Conda environment then you will need to activate it before running the "jupyter-lab" or "jupyter-notebook" command. Be sure that the "jupyter" package is installed into your environment (i.e., conda activate myenv; conda install jupyter).
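For example, the bottom of the jupyter.sh script above could be modified as follows to use a custom environment (the name myenv is a placeholder):

```shell
# load modules or conda environments here
module load anaconda3/2020.11
conda activate myenv   # the environment must contain the jupyter package

# Run Jupyter
jupyter-notebook --no-browser --port=${port} --ip=${node}
```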

 

FAQ and Troubleshooting

1. When trying to open a notebook on MyAdroit/MyDella, how do I resolve the error "File Load Error for mynotebook.ipynb"?

Try closing all your Jupyter notebooks and then remove this file: /home/<YourNetID>/.local/share/jupyter/nbsignatures.db

2. When using Job Composer on MyAdroit/MyDella, how to deal with the error message of "We're sorry, but something went wrong."?

Close all of your OnDemand sessions. Connect to the login node and run the following command:

$ rm -rf ~/ondemand/data/sys/myjobs/

Then in the OnDemand main menu, choose "Help" and then "Restart web server".

3. Why does my session hang with the message "Your session is currently starting... Please be patient as this process can take a few minutes."?

OnDemand will attempt to install the ipykernel package into each of your Conda environments. If it fails (because of conflicts) then that environment will not be available and the session, in fact, may never launch. One solution is to install ipykernel into each of the problematic environments. To do this, quit all of your OnDemand sessions and then go to the Linux command line. If the "myenv" environment is the problem then try:

$ module load anaconda3/2022.5
$ conda activate myenv
$ conda install ipykernel
$ exit

Repeat the procedure above for each environment that is failing. Then try again to create and launch a Jupyter session. In some cases the environment causing the problem is one that you do not want to use. In this case you may consider removing that environment.
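To remove an environment that you no longer need (replace myenv with the name of the environment):

```shell
$ module load anaconda3/2022.5
$ conda remove --name myenv --all
```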

If the above solution does not solve the problem then try looking for errors in the most recent log file. To do this, first, navigate to the output directory:

$ cd ~/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output

Next, list the directories with the newest at the bottom of the output:

$ ls -ltrh
drwxr-xr-x. 4 aturing math 4.0K Jan 16 19:00 c3f42927-2caa-4082-93ee-1eca54076e67
drwxr-xr-x. 4 aturing math 4.0K Jan 19 19:10 c1cae215-07c7-4f01-9d86-1922fdea75d5
drwxr-xr-x. 4 aturing math 4.0K Jan 20 19:52 ff69c323-5399-4a56-a890-8900c43ce5a4

Lastly, run the cat command on the output.log file that is in the directory listed at the bottom of the output above:

$ cat ff69c323-5399-4a56-a890-8900c43ce5a4/output.log

Look for error messages in output.log to understand why the Jupyter session is not starting.

4. How do I solve this error: "Error: HTTP 500: Internal Server Error (Spawner failed to start [status=3]. The logs for aturing may contain details.)"?

You might be over quota. Please see the checkquota page. If that is not the issue then try selecting "Help" and then "Restart Web Server". Then try to create a session.

5. I am experiencing file quota issues. I deleted some files in Jupyter but the files have not been deleted and were instead moved to ~/.local/share/Trash. How do I remove them?

You can remove your Trash directory by running the following command:

$ rm -rf ~/.local/share/Trash

6. Why do I not see my Conda environments when I start a Jupyter notebook in OnDemand?

See (3) above as it could be that OnDemand tried to install ipykernel into one or more Conda environments and it failed.

By default, OnDemand looks in /home/<YourNetID>/.conda for your environments. If you are storing them elsewhere such as /scratch/gpfs/<YourNetID>/CONDA then you will need to use a symbolic link. Here is an example of making such a symbolic link on the command line:

$ cd ~
# make sure you do not have a ~/.conda directory before running the next line
$ ln -s /scratch/gpfs/<YourNetID>/CONDA .conda

On Adroit, replace "gpfs" with "network" in the command above. The symbolic link acts as a redirect.

 

Getting Help

If you encounter any difficulties while working with Jupyter on the HPC clusters, please send an email to [email protected] or attend a help session.