A Few Caveats
These instructions assume you:
- Are a Princeton University faculty member, student, or staff member, or have an RCU account
- Are connecting...
- (if on-campus) from the campus wireless eduroam network, or a wired campus connection, or through Nobel
- (if off-campus) with GlobalProtect VPN, or through Nobel. You can install the GlobalProtect VPN on your laptop. Make sure you connect via the GlobalProtect app on your laptop before ssh-ing to Adroit or another cluster.
- Are connecting with Duo authentication
- All of the clusters, whether you connect from on or off campus, require two-factor authentication via Duo. If you need help getting this set up, your best course of action is to contact the OIT Support and Operations Center. You can also see OIT's resources for using Duo here.
- When you connect, you can request a push to the Duo app on your cell phone or a text message with a passcode, or you can enter a passcode generated by the Duo app on your phone.
- If your SSH client reads a standard '~/.ssh/config' file (as OpenSSH on macOS and Linux does), you can use connection multiplexing: later SSH sessions to the same host reuse an already-authenticated connection, so you don't have to enter your password and complete Duo every time.
- Have an account on the system you want to connect to. (Note: You will need to log in using your university credentials.)
To see how to get an account, visit the page for your specific system within the Systems submenu.
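A minimal sketch of the multiplexing mentioned in the caveats above, assuming an OpenSSH client on macOS or Linux. The function name ssh_mux_stanza, the host list, the socket directory ~/.ssh/sockets, and the 10-minute ControlPersist value are illustrative choices, not Princeton-mandated settings. The function only prints a ~/.ssh/config stanza; it does not modify any file.

```shell
# Hypothetical helper (not a Princeton tool): print an ~/.ssh/config stanza that
# turns on OpenSSH connection multiplexing for the hosts listed.
#   ControlMaster auto  -> the first connection becomes the shared "master"
#   ControlPath         -> socket file for it (%r = user, %h = host, %p = port)
#   ControlPersist 10m  -> keep the master open 10 minutes after the last session exits
ssh_mux_stanza() {
    local netid="$1"    # your NetID, e.g. abc1
    cat <<EOF
Host adroit.princeton.edu adroit-vis.princeton.edu nobel.princeton.edu
    User ${netid}
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 10m
EOF
}

# Preview the stanza for the example NetID abc1.
ssh_mux_stanza abc1
```

To install it on your laptop, create the socket directory with `mkdir -p ~/.ssh/sockets && chmod 700 ~/.ssh/sockets` and append the stanza with `ssh_mux_stanza <YourNetID> >> ~/.ssh/config`. The first `ssh adroit.princeton.edu` then asks for your password and Duo as usual, and further sessions within the ControlPersist window reuse that connection. `ssh -O check adroit.princeton.edu` reports whether a master connection is running, and `ssh -O exit adroit.princeton.edu` closes it.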
In order to connect to the university computing clusters, you will need an SSH (secure shell) client, a piece of software for establishing secure connections to remote machines.
On macOS and Linux, the default Terminal application has such a client built in. No download is necessary!
Windows 10 also includes an SSH client. (If for some reason SSH is not enabled on your Windows 10 machine, follow this guide to enable it.)
To connect to a cluster via SSH on Linux, macOS, or Windows 10:
(Connecting on Windows 8 is discussed separately below.)
Access a command-line on your laptop
- Linux: open a Terminal window (usually by pressing Ctrl+Alt+T, i.e., hold down Ctrl and Alt together and then press T)
- macOS: open a Terminal window (by launching the Terminal app located in /Applications/Utilities)
- Windows 10: Windows 10 offers a few different ways to access a command-line interface:
- PowerShell or Command Prompt: these are Windows-native (DOS-like) command-line environments. To access them, press Win+X (i.e., hold down the Windows key and press X). This opens the so-called "Power User" menu in Windows. Select either "Command Prompt" or "PowerShell" (you'll usually see one or the other, depending on which Windows updates you've installed). Command Prompt and PowerShell are not equivalent command-line environments in general, but for the purposes of using SSH they work the same, so either will do.
- Windows Subsystem for Linux (WSL): WSL is an optional feature that provides a genuine Linux command line within Windows. If you have WSL enabled, you should have an SSH client on the Linux side by default, just as you would in a regular Linux operating system.
- SSH into the cluster
The syntax for using ssh is the same in all of the above scenarios. If you are off campus, first make sure you're connected to the GlobalProtect VPN. Then, on the command line you opened in the previous step, type

ssh <YourNetID>@<ClusterName>.princeton.edu

For instance, if your NetID is abc1 and you'd like to connect to the Adroit cluster, you would type:

ssh abc1@adroit.princeton.edu
If this is your first time connecting to this cluster from the computer you're using, you will see a message about the host's key fingerprint, followed by the question

Are you sure you want to continue connecting (yes/no)?

Answer 'yes' and press Enter.
You will now be prompted for your usual Princeton password. Enter it.
*NOTE*: there are no asterisks or dots to indicate how many characters you've typed, so if you think you've made a typo, hit Backspace many times and enter the password from scratch.
*NOTE*: If you've previously connected to a cluster and set up SSH keys, you will be connected without being prompted for a password first.
Because the clusters require Duo two-factor authentication, you will now be prompted to choose a Duo option (such as a push to your phone) or to enter a Duo passcode. Do so.
That's it: you should now be connected to the cluster and see its Linux command-line prompt (e.g., [adroit4:~ <yourNetID>]$) instead of your local computer's prompt.
- Ending the SSH connection to Adroit
To close the SSH connection, simply type

exit

at the Adroit command line and press Enter. This should close the connection, and your local computer's command-line prompt should reappear.
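If you connect often, you could wrap the ssh <NetID>@<cluster>.princeton.edu pattern in a pair of shell functions in your ~/.bashrc. This is a sketch under assumptions: the function names cluster_host and cluster and the NETID variable are hypothetical, and it assumes every cluster's address follows the same <name>.princeton.edu pattern as adroit.princeton.edu.

```shell
# Hypothetical helpers for ~/.bashrc; set NETID to your own NetID.
NETID="${NETID:-abc1}"

# Print the user@host target for a cluster name,
# e.g. "cluster_host adroit" prints abc1@adroit.princeton.edu.
# Assumes the cluster's address is <name>.princeton.edu.
cluster_host() {
    printf '%s@%s.princeton.edu\n' "$NETID" "$1"
}

# Open an interactive SSH session, e.g. "cluster adroit".
# Type exit at the remote prompt to close it.
cluster() {
    ssh "$(cluster_host "$1")"
}

cluster_host adroit
```

With NETID set to abc1, the last line prints abc1@adroit.princeton.edu; running `cluster adroit` would then start the same login sequence described above (fingerprint question on first use, password, Duo).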
To connect to a cluster via SSH on Windows 8:
Windows 8 has neither a built-in SSH client nor WSL for native access to a Linux command line. So if you run Windows 8 and want to make an SSH connection from within Windows (as opposed to, say, running Linux inside VirtualBox and connecting to Adroit from that virtual Linux session), you need to install a separate SSH client.
This video briefly shows how to connect to a remote server using PuTTY (starting at timestamp 0:54). In the "Host Name" field, enter adroit.princeton.edu (leave the port number as 22). When you connect and PuTTY prompts you with "login as: ", enter your NetID and then your password (again, you may be asked to complete Duo authentication after entering your password). You should then be logged in to Adroit and see its Linux command-line prompt. For more detailed information about PuTTY, consult this guide.
MobaXterm should work in much the same way.
If you have trouble connecting, see this page.
Example of Connecting via SSH Using Terminal
Once you launch an instance of Terminal, you'll be at a command prompt on your local machine (i.e., on your computer) that looks something like this:
benjaminhicks ~/hpc_beginning_workshop $
The '$' is an indication that you're ready to enter a command.
To connect to a cluster, use the following general form, replacing each <...> placeholder with the appropriate value:

ssh <YourNetID>@<ClusterName>.princeton.edu
To connect to Adroit as a user with the NetID bhicks, I'd type something like this:

benjaminhicks ~/hpc_beginning_workshop $ ssh bhicks@adroit.princeton.edu

and after hitting Enter, I'd see something like this:

nat-oitwireless-inside-vapornet100-c-14666:hpc_beginning_workshop bhicks$ ssh bhicks@adroit.princeton.edu
Warning: the ECDSA host key for 'adroit.princeton.edu' differs from the key for the IP address '126.96.36.199'
Offending key for IP in /Users/bhicks/.ssh/known_hosts:88
Matching host key in /Users/bhicks/.ssh/known_hosts:115
Are you sure you want to continue connecting (yes/no)? yes
Password:
Duo two-factor login for bhicks

Enter a passcode or select one of the following options:

 1. Duo Push to XXX-XXX-3224
 2. Phone call to XXX-XXX-3224
 3. Phone call to XXX-XXX-8335
 4. SMS passcodes to XXX-XXX-3224 (next code starts with: 1)

Passcode or option (1-4): 493203
Success. Logging you in...
Last login: Wed Oct 10 09:12:28 2018 from nat-oitwireless-outside-vapornet3-l-14.princeton.edu
[adroit4:~ bhicks]$
I'm now remotely connected to adroit4, which is the head node of the cluster!
The shell I used in both cases (on my laptop and on Adroit) is Bash, a command-line interface that is common across Unix-like machines.
SSH Keys: ssh without typing passwords
Typing your password every time you connect to a machine or, worse, every time you copy a file to or from a remote machine quickly becomes tedious. One solution is to enable passwordless login and remote operations by generating a public/private SSH key pair and using it to authenticate the connection. The procedure is explained in this guide.
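As a minimal sketch of the key-generation step, assuming OpenSSH's ssh-keygen and a bash-compatible shell on your laptop (macOS, Linux, or WSL), the commands below create an Ed25519 key pair. The guide mentioned above remains the reference for Princeton's procedure. The KEY_FILE variable is an assumption of this sketch, and the empty passphrase (-N "") is chosen only to keep the example non-interactive; leaving out -N makes ssh-keygen prompt you for a passphrase instead.

```shell
# Sketch: create an SSH key pair on your laptop (not on the cluster).
# KEY_FILE is an assumption; ~/.ssh/id_ed25519 is OpenSSH's usual default name.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_ed25519}"
KEY_DIR="$(dirname "$KEY_FILE")"

mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"               # keep the key directory private to you

# Only generate a key if none exists yet, so an existing key is never overwritten.
if [ ! -f "$KEY_FILE" ]; then
    # -t ed25519: key type; -f: output file; -N "": empty passphrase
    # (drop -N "" to be prompted for one); -q: quiet.
    ssh-keygen -q -t ed25519 -f "$KEY_FILE" -N ""
fi

# Show the public half; this is the part that goes on the cluster.
cat "$KEY_FILE.pub"

# Next, install the public key in ~/.ssh/authorized_keys on the cluster, for example
# with ssh-copy-id where your system provides it (this asks for your password and
# Duo one more time). Replace abc1 with your NetID:
#   ssh-copy-id -i "$KEY_FILE.pub" abc1@adroit.princeton.edu
```

The private key file stays on your laptop; only the .pub file is copied to the cluster.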
Staying connected (tmux)
If your SSH connection suddenly breaks, the command you're running in that session is terminated. This comes up very frequently when you're connected to Nobel, where tasks are run directly from the command line rather than through a job scheduler.
One solution to this problem is tmux. It comes installed on all university clusters, and it lets you start a shell session that lives on the server rather than depending on your SSH connection.
A simple use case would be:
$ ssh <YourNetID>@adroit-vis.princeton.edu
$ tmux
$ wget https://www.bigdata.org/dataset.tar.gz
# ssh connection suddenly breaks!
# no problem, just reconnect and attach
$ ssh <YourNetID>@adroit-vis.princeton.edu
$ tmux attach
To detach from your tmux session, press Ctrl+b, then d. To close a tmux session, run the "exit" command or press Ctrl+d. A tmux session keeps running on the remote server until you close it or the server is rebooted. Anything you run while attached to the tmux session runs in that session, and is therefore safe from a disconnect.
Nobel has two hosts, "compton" and "davisson", and your session will live on one or the other. You can find the host by looking in the lower-right corner of your tmux session window. To find your session again after a disconnect, you'll need to log in directly to that host rather than just to nobel, i.e., "ssh compton.princeton.edu". Otherwise you may land on the other host, where "tmux attach" will not find your session.
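Combining the two previous paragraphs, a small helper can reconnect to the right Nobel host and reattach in one step. This is a sketch: the function name tmux_remote, the default session name main, and the NETID variable are hypothetical. It relies on two standard options: ssh -t, which forces a terminal so tmux can draw its screen, and tmux new-session -A -s NAME, which attaches to session NAME if it already exists and creates it otherwise.

```shell
# Hypothetical helper: log in to a specific host (e.g. compton or davisson)
# and attach to a named tmux session there, creating it if it doesn't exist.
tmux_remote() {
    local host="$1"                 # short host name, e.g. compton
    local session="${2:-main}"      # session name; "main" is an arbitrary default
    # -t forces a pseudo-terminal, which tmux needs when it is started via ssh.
    # new-session -A attaches to $session if it already exists.
    ssh -t "${NETID:?set NETID to your NetID}@${host}.princeton.edu" \
        tmux new-session -A -s "$session"
}

# Usage after a disconnect, if your session lived on compton:
#   NETID=abc1; tmux_remote compton
```

Starting every session this way (rather than with a bare tmux) also means the session always has a predictable name to reattach to.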
tmux is a powerful and complex tool. In addition to the simple guide linked above, you might explore the following:
- tmux and other ways to improve your command line skills by Troy Comi of Princeton
- tmux - a very simple beginner's guide
- Beginner’s Guide to Tmux (feel free to skip the installation steps, since tmux is already on the clusters, unless you want to run it on your Mac!)
- A tmux primer
Learn More About a Cluster by Running Commands
Once in a cluster, type each command below and examine the output:
hostname                # get the name of the machine you are on
whoami                  # get username of the account
date                    # get the current date and time
pwd                     # print working directory
cat /etc/os-release     # info about operating system
lscpu                   # info about the CPUs on head node
shownodes               # info about the compute nodes (7 nodes for myadroit)
squeue                  # which jobs are running or waiting to run
qos                     # quality of service (job partitions and limits)
slurmtop                # shows a map of cluster usage
who                     # list users on the head node
checkquota              # view your quota and request more space
Here is example output from the commands above on Adroit:
$ hostname
adroit4
$ whoami
ceisgrub
$ date
Fri Feb 21 16:20:03 EST 2020
$ pwd
/home/ceisgrub
$ cat /etc/os-release
NAME="Springdale Linux"
VERSION="7.7 (Verona)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.7"
PRETTY_NAME="Springdale Linux 7.7 (Verona)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:springdale:linux:7.7:GA"
HOME_URL="http://springdale.princeton.edu/"
BUG_REPORT_URL="https://springdale.math.ias.edu/"
REDHAT_BUGZILLA_PRODUCT="Springdale Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.7
REDHAT_SUPPORT_PRODUCT="Springdale Linux"
REDHAT_SUPPORT_PRODUCT_VERSION=7.7
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
Stepping:              4
CPU MHz:               1021.179
CPU max MHz:           3700.0000
CPU min MHz:           1000.0000
BogoMIPS:              5200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              22528K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear spec_ctrl intel_stibp flush_l1d
$ shownodes
NODELIST      PART   STATE      FREE/TOTAL CPUs  CPU_LOAD  FREE/TOTAL MEMORY  FREE/TOTAL GPUs  FEATURES
adroit-01     class  idle       28/28            0.02      120739/128000Mb                     broadwell
adroit-02     class  idle       28/28            0.00      120754/128000Mb                     broadwell
adroit-03     class  idle       28/28            0.01      120811/128000Mb                     broadwell
adroit-04     class  idle       28/28            0.00      120808/128000Mb                     broadwell
adroit-05     class  idle       28/28            0.00      120823/128000Mb                     broadwell
adroit-06     class  idle       28/28            0.00      120815/128000Mb                     broadwell
adroit-07     class  idle       28/28            0.00      120811/128000Mb                     broadwell
adroit-08     all    mixed      3/32             24.85     366279/384000Mb                     skylake
adroit-09     all    allocated  0/32             17.02     371481/384000Mb                     skylake
adroit-10     all    allocated  0/32             32.04     357999/384000Mb                     skylake
adroit-11     all    mixed      24/32            7.94      373102/384000Mb                     skylake
adroit-12     all    mixed      7/32             19.48     366613/384000Mb                     skylake
adroit-13     all    mixed      1/32             23.45     353454/384000Mb                     skylake
adroit-14     all    idle       32/32            0.00      377183/384000Mb                     skylake
adroit-15     all    mixed      16/32            0.00      370577/384000Mb                     skylake
adroit-16     all    mixed      17/32            7.50      369325/384000Mb                     skylake
adroit-h11g1  gpu    idle       40/40            0.07      763765/770000Mb    4/4              tesla_v100 v100
adroit-h11g2  gpu    idle       48/48            0.00      1019578/1000000Mb  4/4              nvidia_a100 a100
adroit-h11n1  class  mixed      126/128          0.00      244004/256000Mb                     amd,rome
$ squeue
JOBID         PARTITION  NAME      USER      ST  TIME         NODES  NODELIST(REASON)
660090        all        run_05_0  fanid     PD  0:00         1      (QOSMaxJobsPerUserLimit)
661432        all        KL        mlenel    PD  0:00         1      (Dependency)
661431_[6-7]  all        KL        mlenel    PD  0:00         1      (Dependency)
661439        all        sys/dash  lclingan  R   37:28        1      adroit-10
661429_6      all        KL        mlenel    R   1:13:28      1      adroit-08
661429_7      all        KL        mlenel    R   1:13:28      1      adroit-08
660093        all        run_15_0  fanid     R   3-20:26:13   1      adroit-16
660092        all        run_15_0  fanid     R   3-20:27:14   1      adroit-08
660091        all        run_15_0  fanid     R   3-20:28:42   1      adroit-08
660087        all        run_15_1  fanid     R   9-12:05:41   1      adroit-08
660999        all        BBR_m06-  kyras     R   2-21:31:33   1      adroit-12
660996        all        BMR_m06-  kyras     R   2-21:40:50   1      adroit-11
660085        all        run_1_1_  fanid     R   10-00:38:49  1      adroit-08
660086        all        run_05_1  fanid     R   10-00:38:49  1      adroit-08
660088        all        run_05_0  fanid     R   10-00:38:49  1      adroit-16
660089        all        run_05_0  fanid     R   10-00:38:49  1      adroit-16
660116        all        tBu3PPd_  leit      R   8-05:22:17   1      adroit-11
660114        all        Phdiimin  leit      R   8-14:31:04   1      adroit-14
660115        all        tBu3PPd_  leit      R   8-14:31:04   1      adroit-14
660113        all        Phdiimin  leit      R   9-12:05:41   1      adroit-12
661298        all        job.scri  vincenzi  R   6:37:12      1      adroit-13
661391        all        run_cali  hherman   R   3:59:23      1      adroit-10
$ who
bill      pts/0   2020-02-18 08:17 (delta.princeton.edu)
yongickc  pts/1   2020-02-20 08:11 (chm-c07y213xjyvx.princeton.edu)
zhengshi  pts/2   2020-02-10 01:59 (:pts/25:S.0)
root      pts/3   2020-02-19 13:54 (adroit-nfs2)
keweiz    pts/4   2020-02-21 09:01 (vpn10-client-128-112-69-24.princeton.edu)
haonan    pts/5   2020-02-21 15:23 (myadroit)
jdh4      pts/9   2020-02-21 16:12 (tigressgateway1.princeton.edu)
yongickc  pts/10  2020-02-17 08:14 (chm-c07y213xjyvx.princeton.edu)
zhengy    pts/11  2020-02-21 09:26 (nat-oitwireless-inside-vapornet100-10-8-2-157.princeton.edu)
msislam   pts/12  2020-02-21 15:36 (nat-oitwireless-inside-vapornet100-10-9-125-87.princeton.edu)
aidanm    pts/13  2020-02-21 10:03 (pni-10h01410k2euh.princeton.edu)
meggl     pts/15  2020-02-21 10:04 (vpn10-client-128-112-71-193.princeton.edu)
mrasna    pts/16  2020-02-21 14:54 (nat-oitwireless-inside-vapornet100-10-8-5-193.princeton.edu)
nmishra   pts/17  2020-02-21 15:55 (nat-oitwireless-inside-vapornet100-10-9-111-71.princeton.edu)
cfei      pts/19  2020-02-21 13:59 (nat-oitwireless-inside-vapornet100-10-9-154-116.princeton.edu)
jdeobald  pts/21  2020-02-21 11:42 (dynamic-oit-swiftnet-128-112-127-249.princeton.edu)
dianw     pts/22  2020-02-21 11:54 (vpn10-client-128-112-69-89.princeton.edu)
xiaoyuel  pts/23  2020-02-21 14:26 (vpn10-client-128-112-71-50.princeton.edu)
perezgiz  pts/24  2020-02-21 14:07 (josko-eth-dongle.princeton.edu)
zhengshi  pts/25  2020-02-12 14:41 (:pts/20:S.0)
keweiz    pts/26  2020-02-21 14:30 (vpn10-client-128-112-69-24.princeton.edu)
keweiz    pts/27  2020-02-21 16:09 (vpn10-client-128-112-69-24.princeton.edu)
$ checkquota
Storage/size quota filesystem report for user: ceisgrub
Filesystem              Mount             Used    Limit   MaxLim  Comment
Adroit home             /home             9.1GB   9.3GB   10GB
Adroit scratch          /scratch          0       0       0
Adroit scratch network  /scratch/network  1.7GB   93GB    100GB

Storage number of files used report for user: ceisgrub
Filesystem              Mount             Used    Limit   MaxLim  Comment
Adroit home             /home             80.5K   975K    1.0M
Adroit scratch          /scratch          1       0       0
Adroit scratch network  /scratch/network  18.9K   9.8M    10.5M

For quota increase requests please use this website:

https://forms.rc.princeton.edu/quota
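To run the whole list in one go, a loop like the one below prints a header before each command's output. This is a sketch: the function name cluster_tour is hypothetical; shownodes, squeue, qos, and checkquota exist only on the university clusters, so the loop skips any command it cannot find; and slurmtop is left out on the assumption that, like top, it may keep running until you quit it.

```shell
# Hypothetical helper: run the survey commands listed above, one after another.
cluster_tour() {
    local cmd name
    for cmd in hostname whoami date pwd "cat /etc/os-release" lscpu \
               shownodes squeue qos who checkquota; do
        name="${cmd%% *}"                 # first word of the command, e.g. "cat"
        echo "== $cmd =="
        if command -v "$name" >/dev/null 2>&1; then
            # Left unquoted on purpose so "cat /etc/os-release" splits into two words.
            $cmd || echo "($name exited with status $?)"
        else
            echo "($name is not available on this machine)"
        fi
        echo
    done
}

cluster_tour
```

Running it on your laptop instead of a cluster is a quick way to see which of these commands are cluster-specific.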