Connecting to the Clusters via SSH

A Few Caveats

These instructions assume you:

  1. Are a Princeton University faculty member, student, or staff member, or have an RCU account
     
  2. Are connecting...
    - (if on-campus) from the campus wireless eduroam network or a wired campus connection
    - (if off-campus) with GlobalProtect VPN. You can install the GlobalProtect VPN on your laptop. Make sure you connect via the GlobalProtect app on your laptop before ssh-ing to Adroit or another cluster.
    - To check your current connection, you can visit https://myip.rc.princeton.edu/ for detailed information and suggestions.
     
  3. Are connecting with Duo Authentication
    - All of the clusters now require two-factor authentication via Duo, whether you connect from on campus or off campus. If you need help setting this up, your best course of action is to contact the OIT Support and Operations Center. You can also see OIT's resources for using Duo.
    - Upon connecting, you can request a push to a cell phone application, request a text with a passcode, or enter a passcode generated by a soft key in the Duo application on your cell phone.
    - If you use a system that respects a standard '~/.ssh/config' file, you can use a multiplexing solution.
     
  4. Have an account on the system you're looking to connect to. (Note: you will need to log in using your university credentials.)
    To see how to get an account, visit the page for your specific system within the Systems submenu.
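
As an illustration of the multiplexing option mentioned in item 3, here is a minimal sketch of a '~/.ssh/config' entry using OpenSSH connection sharing. The Host pattern, the socket path, and the 10-minute timeout are assumptions chosen for the example, not settings prescribed by the clusters; adjust them to your own setup.

```text
# ~/.ssh/config (sketch; Host pattern, socket path, and timeout are assumptions)
Host *.princeton.edu
    # Reuse an existing connection to the same host if one is open
    ControlMaster auto
    # Where the shared-connection socket is stored (%r = user, %h = host, %p = port)
    ControlPath ~/.ssh/cm-%r@%h:%p
    # Keep the master connection open for 10 minutes after the last session exits
    ControlPersist 10m
```

With an entry like this, the first connection authenticates with your password and Duo as usual, and later ssh sessions to the same host reuse that connection while it remains open.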

SSH

In order to connect to the university computing clusters, you will need an SSH (secure shell) client, a piece of software for establishing secure connections to remote machines. 

On macOS and Linux, the default Terminal application has such a client built in. No download is necessary!

Windows 10 machines also come with an SSH client. (If for some reason you don't have SSH enabled under Windows 10, follow this guide to enable it.)

On Windows 8 machines, you'll need to install a client. One option is PuTTY and another is MobaXterm.

To connect to a cluster via SSH on Linux, macOS, or Windows 10

(Connecting on Windows 8 is discussed separately below.)

  1. Access a command line on your laptop
    • Linux: open a Terminal window (usually by pressing Ctrl+Alt+t --- i.e. press and hold Ctrl, and without releasing it, press and hold Alt, and then without releasing either of those two keys, type 't')
    • macOS: open a Terminal window (by launching the Terminal app located in /Applications/Utilities)
    • Windows 10: Windows 10 has a few different ways to access a command-line interface
      • PowerShell or Command Prompt --- these are two Windows-native (DOS-like) command-line environments. To access them, press Win+x (i.e., while holding down the Windows key, type 'x'). This opens the so-called "Power Users" menu in Windows. Select either "Command Prompt" or "PowerShell" (usually you'll see one or the other, depending on which Windows updates you've installed). Note that Command Prompt and PowerShell are not equivalent command-line environments in general, but for the purposes of using SSH they work the same and either will do.
      • Windows Subsystem for Linux (WSL) --- WSL is an optional feature you can enable that furnishes a genuine Linux command-line within Windows. If you have WSL enabled, then you should have an SSH client on the Linux side by default, just as you would in a regular Linux operating system. 
  2. SSH into the cluster
    The syntax for using ssh is the same in all of the above scenarios. Make sure you're on the campus network or connected to the VPN, and then on the command line you accessed in the previous step, type:

    ssh <YourNetID>@<ClusterName>.princeton.edu

    So for instance, if your NetID is abc1, and you'd like to connect to the Adroit cluster, you would type:

    ssh abc1@adroit.princeton.edu

    If this is your first time connecting to this cluster from the computer you're on, you will see a message about a fingerprint along with the question "Are you sure you want to continue connecting (yes/no)?" Answer 'yes' and hit Enter.

    You will now be prompted for your usual Princeton password.  Enter it.

    *NOTE*: No asterisks or dots appear to indicate how many characters you've typed, so if you think you've made a typo, hit Backspace many times and enter the password from scratch.

    *NOTE*: If you've previously connected to a cluster and set up SSH keys, you will be connected without being prompted for a password first.

    Depending on whether you're on a VPN and how it's configured, you may now be prompted to enter a Duo code (if you are, do so).

    That's it -- you should now be connected to a cluster and see its Linux command-line prompt (e.g. [adroit4:~ <yourNetID>]$) instead of the one for your local computer.

  3. Ending the SSH connection to the cluster
    To close the SSH connection, simply type exit in the command line and press Enter. This should close the connection, and your local computer's command-line prompt should reappear.
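
The connection steps above can be wrapped in a small shell function. The sketch below is only a convenience under stated assumptions, not part of the clusters' tooling: the function names cluster_address and connect_cluster are hypothetical, and it presumes a Bash-like shell (Linux, macOS, or WSL) and clusters reachable at <ClusterName>.princeton.edu as described above.

```shell
# Build the login address from a NetID and a cluster name (hypothetical helper).
cluster_address() {
    printf '%s@%s.princeton.edu\n' "$1" "$2"
}

# Check that an SSH client is installed, then connect (hypothetical helper).
connect_cluster() {
    if [ "$#" -ne 2 ]; then
        echo "usage: connect_cluster <NetID> <ClusterName>" >&2
        return 2
    fi
    if ! command -v ssh >/dev/null 2>&1; then
        echo "No ssh client found on this machine" >&2
        return 1
    fi
    # Same command as in step 2, with the address filled in.
    ssh "$(cluster_address "$1" "$2")"
}
```

After pasting the functions into your shell, typing connect_cluster abc1 adroit runs the same command as in step 2. Password and Duo prompts appear as usual, and exit closes the connection as in step 3.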

A Note for Windows Users (Corrupted MAC on input)

If you encounter the following error:

$ ssh <YourNetID>@della.princeton.edu
Corrupted MAC on input.
ssh_dispatch_run_fatal: Connection to 128.112.172.234 port 22: message
authentication code incorrect

Then the solution is:

$ ssh -m hmac-sha2-512 <YourNetID>@della.princeton.edu

For VS Code SSH extension users, you will need to create an SSH client config file on your local computer (%programdata%\ssh\ssh_config), with a host entry for Della that specifies a new message authentication code (the “MACs hmac-sha2-512” line is the important bit here):

Host della
   HostName della.princeton.edu
   MACs hmac-sha2-512

To connect to a cluster via SSH on Windows 8

Windows 8 does not have a built-in SSH client, nor does it have a WSL that offers native access to a Linux command line. So if you run Windows 8 and want to make an SSH connection from within Windows (as opposed to, say, by running Linux inside VirtualBox and connecting to Adroit from within that virtual Linux session), then you need to install a separate SSH client.

We recommend either PuTTY or MobaXterm. These lightweight clients (MobaXterm has more features) have a graphical interface to initiate SSH connections.

This video shows briefly how to connect to a remote server using PuTTY (starting at timestamp 0:54). In the field for "Host Name", enter adroit.princeton.edu (leave the port number as 22). When you connect and it prompts you with "login as: ", enter your NetID and then your password (again, you may be asked to Duo authenticate after entering your password). You should then be logged into Adroit and see its Linux command-line prompt. For more detailed information about PuTTY, consult this guide.

MobaXterm should work fairly similarly.

If you have trouble connecting then see this page.

Example of Connecting via SSH Using Terminal

Once you launch an instance of Terminal, you'll be at a command prompt on your local machine (i.e., on your computer) that looks something like this:

benjaminhicks ~/hpc_beginning_workshop $

The '$' is an indication that you're ready to enter a command.

To connect to a cluster, the general command looks like this, where you replace the <>'s with the needed content:

ssh <NetID>@<hostname>.princeton.edu

To connect to Adroit as a user with the NetID bhicks, I'd type something like this:

benjaminhicks ~/hpc_beginning_workshop $ ssh bhicks@adroit.princeton.edu

and after hitting Enter, I'd see something like this:

nat-oitwireless-inside-vapornet100-c-14666:hpc_beginning_workshop bhicks$ ssh bhicks@adroit.princeton.edu
Warning: the ECDSA host key for 'adroit.princeton.edu' differs from the key for the IP address '128.112.128.32'
Offending key for IP in /Users/bhicks/.ssh/known_hosts:88
Matching host key in /Users/bhicks/.ssh/known_hosts:115
Are you sure you want to continue connecting (yes/no)? yes
Password:
Duo two-factor login for bhicks
 
Enter a passcode or select one of the following options:
 
 1. Duo Push to XXX-XXX-3224
 2. Phone call to XXX-XXX-3224
 3. Phone call to XXX-XXX-8335
 4. SMS passcodes to XXX-XXX-3224 (next code starts with: 1)
 
Passcode or option (1-4): 493203
Success. Logging you in...
Last login: Wed Oct 10 09:12:28 2018 from nat-oitwireless-outside-vapornet3-l-14.princeton.edu
[adroit4:~ bhicks]$

I'm now remotely connected to adroit4, which is the head node of the cluster!

The shell I used in both cases, on my local machine and on the cluster, is called Bash. It's a command-line interface that is common across Unix-like machines.

SSH Keys: ssh without typing passwords

Typing a password every time you want to connect to a machine or, worse, every time you want to copy a file to or from a remote machine gets tedious quickly. One solution is to enable passwordless login and remote operations by generating a public/private pair of SSH keys and using them to negotiate the connection. The procedure is explained in this guide.
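
For orientation, the usual OpenSSH key setup can be sketched as below. This is not a substitute for the guide: the function name setup_cluster_key, the Ed25519 key type, and the default key path are assumptions made for the example, and it presumes Linux, macOS, or WSL, where ssh-keygen and ssh-copy-id are normally available.

```shell
# Hypothetical helper; nothing runs until the function is called.
setup_cluster_key() {
    if [ "$#" -ne 2 ]; then
        echo "usage: setup_cluster_key <NetID> <ClusterName>" >&2
        return 2
    fi
    netid=$1                        # e.g. abc1
    cluster=$2                      # e.g. adroit
    key="$HOME/.ssh/id_ed25519"     # assumed key location

    # Create the key pair only if it does not already exist
    # (ssh-keygen asks you to choose a passphrase).
    [ -f "$key" ] || ssh-keygen -t ed25519 -f "$key"

    # Install the public key on the cluster; this asks for your
    # password and Duo once more.
    ssh-copy-id -i "$key.pub" "$netid@$cluster.princeton.edu"
}
```

Calling setup_cluster_key abc1 adroit would then generate a key if needed and copy its public half to Adroit. Only the .pub file is ever copied; the private key stays on your computer.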

Staying connected (tmux)

If your SSH connection breaks unexpectedly, the command you're running in that session is terminated.

One solution to this problem is tmux. It comes installed on all university clusters, and it lets you start a shell session that lives on the server rather than depending on your SSH connection.

A simple use case would be:

$ ssh <YourNetID>@adroit-vis.princeton.edu
$ tmux
$ wget https://www.bigdata.org/dataset.tar.gz
# ssh connection suddenly breaks!
# no problem just reconnect and attach
$ ssh <YourNetID>@adroit-vis.princeton.edu
$ tmux attach

To detach from your tmux session, press Ctrl+b, then d. To close a tmux session, run the "exit" command or press Ctrl+d. A tmux session keeps running on the remote server until the server is rebooted or you close it. Anything you run while attached to the tmux session runs in that session and is therefore safe from a disconnect.

Nobel has two hosts, "compton" and "davisson", and the session will live on one or the other. You can find the host by looking in the lower right corner of your tmux session window. To find your session again after a disconnect, you'll need to log in directly to that host rather than just nobel, e.g., "ssh compton.princeton.edu". Otherwise "tmux attach" will not find your session.
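
Giving the session a name makes it easier to find again. The sketch below is one way to do that on a login node. The function name resume_session and the default session name "main" are assumptions made for the example; the tmux options used (new-session with -A and -s) are standard.

```shell
# Attach to a named tmux session on the current host, creating it if needed.
# resume_session is a hypothetical helper; "main" is an arbitrary default name.
resume_session() {
    name=${1:-main}
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux not found on $(uname -n)" >&2
        return 1
    fi
    # -A makes new-session attach when a session with this name already exists.
    tmux new-session -A -s "$name"
}
```

Running resume_session download before and after a dropped connection should land in the same session, provided you log in to the same host both times. On the cluster, uname -n (or hostname) prints which host you are on.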

tmux is a powerful and complex tool, and the simple use case above covers only a small part of what it can do.

Learn More About a Cluster by Running Commands

Once in a cluster, type each command below and examine the output:

 hostname                  # get the name of the machine you are on
 whoami                    # get username of the account
 date                      # get the current date and time
 pwd                       # print working directory
 cat /etc/os-release       # info about operating system
 lscpu                     # info about the CPUs on head node
 shownodes                 # info about the compute nodes (7 nodes for myadroit)
 squeue                    # which jobs are running or waiting to run
 qos                       # quality of service (job partitions and limits)
 slurmtop                  # shows a map of cluster usage
 who                       # list users on the head node
 checkquota                # view your quota and request more space

Here is example output from the commands above on Adroit:

$ hostname
adroit5
$ whoami
ceisgrub
$ date
Tue Feb 21 14:37:44 EST 2023
$ pwd
/home/ceisgrub
$ cat /etc/os-release
NAME="Springdale Open Enterprise Linux"
VERSION="8.7 (Modena)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.7"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Springdale Open Enterprise Linux 8.7 (Modena)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:springdale:enterprise_linux:8.7:GA"
HOME_URL="https://springdale.princeton.edu/"
BUG_REPORT_URL="https://springdale.princeton.edu/bugzilla"
REDHAT_BUGZILLA_PRODUCT="Springdale Open Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.7
REDHAT_SUPPORT_PRODUCT="Springdale Open Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.7"
$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  1
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
Stepping:            7
CPU MHz:             3900.000
CPU max MHz:         3900.0000
CPU min MHz:         1200.0000
BogoMIPS:            5800.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            22528K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs
bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est
tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin
ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx
rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku
ospke avx512_vnni md_clear flush_l1d arch_capabilities
$ shownodes
NODELIST      PART   STATE        FREE/TOTAL CPUs  CPU_LOAD  FREE/TOTAL MEMORY  FREE/TOTAL GPUs       FEATURES
adroit-08     class  idle                   32/32      0.00    379908/384000Mb                   skylake,intel
adroit-09     class  idle                   32/32      0.91    382191/384000Mb                   skylake,intel
adroit-10     class  idle                   32/32      0.00    378779/384000Mb                   skylake,intel
adroit-11     class  mixed                  25/32      0.01    333867/384000Mb                   skylake,intel
adroit-12     class  idle                   32/32      0.00    382422/384000Mb                   skylake,intel
adroit-13     class  mixed                  20/32     12.02    254458/384000Mb                   skylake,intel
adroit-14     class  allocated               0/32     32.23    344374/384000Mb                   skylake,intel
adroit-15     class  idle                   32/32      0.00    362568/384000Mb                   skylake,intel
adroit-16     class  idle                   32/32      0.00    355045/384000Mb                   skylake,intel
adroit-h11g1  gpu    mixed                  38/40      0.26    671544/770000Mb   2/4 tesla_v100     v100,intel
adroit-h11g2  gpu    mixed                  40/48      1.07   694282/1000000Mb  3/4 nvidia_a100     a100,intel
adroit-h11g3  gpu    mixed                  52/56      0.15    640082/760000Mb   3/4 tesla_v100     v100,intel
adroit-h11n1  class  idle                 128/128      0.00    250613/256000Mb                        amd,rome
adroit-h11n2  all    allocated               0/64     56.01    167196/512000Mb                       intel,ice
adroit-h11n3  all    mixed                   7/64     62.31    197670/512000Mb                       intel,ice
adroit-h11n4  all    allocated               0/64     28.02    139797/512000Mb                       intel,ice
adroit-h11n5  all    mixed                  16/64      3.10    303855/512000Mb                       intel,ice
adroit-h11n6  all    mixed                   4/64     35.97    270328/512000Mb                       intel,ice
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1712982       all gauss.cm   dt0998 PD       0:00      1 (QOSGrpCpuLimit)
           1711873       all syn_ctrl    slala PD       0:00      1 (QOSGrpCpuLimit)
           1711872       all syn_ctrl    slala PD       0:00      1 (QOSGrpCpuLimit)
           1711866       all syn_ctrl    slala PD       0:00      1 (QOSGrpCpuLimit)
           1711865       all syn_ctrl    slala PD       0:00      1 (QOSGrpCpuLimit)
           1711853       all       cv    slala PD       0:00      1 (QOSMaxCpuPerUserLimit)
           1711874       all       cv    slala PD       0:00      1 (Dependency)
           1711867       all       cv    slala PD       0:00      1 (Dependency)
           1711864       all       cv    slala PD       0:00      1 (DependencyNeverSatisfied)
           1712972       all   Mocha9   dnpham PD       0:00      1 (QOSGrpCpuLimit)
           1712979       all DTBP-sca jdeobald  R      43:55      1 adroit-h11n4
           1712949       all       PO xuanhong  R    2:43:12      1 adroit-h11n6
           1712948       all       PO xuanhong  R    2:46:24      1 adroit-h11n6
           1712771       all test_job   sk5339  R   17:55:45      1 adroit-h11n3
           1712692       all     Fe40 barsukov  R 1-00:41:16      1 adroit-14
           1712922       all sys/dash   nhazra  R    3:45:15      1 adroit-h11n6
           1712705       all 16Ti_non   bw1755  R   23:51:18      1 adroit-h11n4
           1712899       all      TS3 barsukov  R    2:23:44      1 adroit-h11n2
           1712984       all gauss.cm   dt0998  R      38:18      1 adroit-h11n5
           1712983       all gauss.cm   dt0998  R      38:53      1 adroit-h11n5
           1712999       all poisson_   sf5201  R       8:05      1 adroit-h11n5
           1712993       all sys/dash     jlca  R      13:24      1 adroit-h11n4
           1712894       all sys/dash  dpmoore  R    4:36:59      1 adroit-h11n6
           1711863       all syn_ctrl    slala  R    3:47:14      1 adroit-h11n3
           1711862       all syn_ctrl    slala  R    5:10:43      1 adroit-h11n6
           1712913       all sys/dash   ec7636  R    3:58:59      1 adroit-h11n6
           1712962       all sys/dash   ec7636  R    1:29:19      1 adroit-h11n6
           1712879       all sys/dash   gc0394  R    5:20:59      1 adroit-h11n2
           1712903       all sys/dash kaneelil  R    4:16:55      1 adroit-h11n6
           1712971       all visc_coa kaneelil  R      59:27      1 adroit-h11n6
           1711419       all sys/dash     hajc  R 4-13:07:28      1 adroit-h11n5
           1712213       all   Mocha6   dnpham  R 1-06:15:38      1 adroit-h11n2
           1711707       all sys/dash    hyork  R 3-21:44:32      1 adroit-h11n6
           1712706       all   Mocha7   dnpham  R   21:39:50      1 adroit-h11n2
           1712624       all   Mocha8   dnpham  R 1-03:46:59      1 adroit-13
           1712967       all   Mocha4   dnpham  R    1:05:46      1 adroit-h11n4
           1712976     class sys/dash   gc6782  R      46:40      1 adroit-11
           1712946     class sys/dash   mo9718  R    2:55:32      1 adroit-11
           1712989     class sys/dash     dm46  R      20:20      1 adroit-11
           1712896     class sys/dash     law2  R    4:35:10      1 adroit-11
           1711985     class sys/dash  vikashm  R 2-22:00:10      1 adroit-11
           1712901     class sys/dash   jm4437  R    4:24:53      1 adroit-11
           1712987       gpu sys/dash   awtang  R      34:47      1 adroit-h11g3
           1712953       gpu modular_ tinghanf  R    1:46:25      1 adroit-h11g2
           1712407       gpu sys/dash  xuchenz  R 1-17:58:11      1 adroit-h11g1
$ who
fcastro  pts/0        2023-02-21 13:47 (172.20.217.236)
mm5986   pts/8        2023-02-13 14:54 (128.112.36.72)
rsouth   pts/9        2023-02-21 13:24 (10.9.87.6)
ys5910   pts/30       2023-02-21 14:28 (172.21.2.7)
tinghanf pts/31       2023-02-21 10:07 (128.112.48.121)
xuchenz  pts/34       2023-02-21 13:48 (172.20.217.107)
zs0806   pts/37       2023-02-21 10:03 (172.21.2.7)
dnpham   pts/39       2023-02-21 10:14 (10.9.115.115)
dnpham   pts/40       2023-02-21 10:15 (10.9.115.115)
dpmoore  pts/44       2023-02-21 13:55 (172.20.216.19)
sf5201   pts/45       2023-02-21 13:58 (172.20.216.192)
gc6782   pts/47       2023-02-21 10:39 (10.8.20.197)
ak6174   pts/48       2023-02-21 11:04 (10.9.92.40)
cw1074   pts/52       2023-02-21 14:07 (172.20.210.60)
root     pts/59       2023-02-21 14:21 (172.21.2.12)
jdeobald pts/49       2023-02-21 10:41 (10.8.39.167)
cw1074   pts/53       2023-02-21 14:10 (172.20.210.60)
rt6814   pts/55       2023-02-21 14:14 (10.9.90.60)
zs0806   pts/76       2023-02-21 11:44 (172.21.2.7)
dpmoore  pts/50       2023-02-21 14:03 (172.20.205.177)
jdh4     pts/61       2023-02-21 14:37 (128.112.173.6)
ys5910   pts/73       2023-02-21 14:37 (172.21.2.7)
cw1074   pts/81       2023-02-21 13:39 (172.20.210.60)
$ checkquota
           Storage/size quota filesystem report for user: ceisgrub
 Filesystem               Mount                 Used   Limit  MaxLim Comment
 Adroit home              /home                9.1GB   9.3GB    10GB
 Adroit scratch           /scratch                 0       0       0
 Adroit scratch network   /scratch/network     1.7GB    93GB   100GB
           Storage number of files used report for user: ceisgrub
 Filesystem               Mount                 Used   Limit  MaxLim Comment
 Adroit home              /home                80.5K    975K    1.0M
 Adroit scratch           /scratch                 1       0       0
 Adroit scratch network   /scratch/network     18.9K    9.8M   10.5M
 For quota increase requests please use this website:
          https://forms.rc.princeton.edu/quota
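
The commands in this section can also be run in one pass. The sketch below assumes a Bash-like shell; the function names run_if_present and survey_cluster are hypothetical. Because shownodes, qos, and checkquota are cluster-provided tools, each command is checked first and skipped if it is missing, for example on your laptop. slurmtop is left out on the assumption that it is an interactive display better run by itself.

```shell
# Run a command if it exists on this machine; otherwise report that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@"
    else
        echo "== $1: not available here, skipping =="
    fi
}

# Walk through the commands listed in this section, in the same order.
survey_cluster() {
    run_if_present hostname
    run_if_present whoami
    run_if_present date
    run_if_present pwd
    run_if_present cat /etc/os-release
    run_if_present lscpu
    run_if_present shownodes
    run_if_present squeue
    run_if_present qos
    run_if_present who
    run_if_present checkquota
}
```

After defining the functions on a cluster, survey_cluster prints each command's output under a short header; piping it through less (survey_cluster | less) makes the long lscpu and squeue sections easier to read.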