The Tiger cluster is one of Princeton's most powerful clusters. It is intended for running parallel jobs; all other types of jobs are given low priority.
Some Technical Specifications:
The Tiger cluster has two parts: tigercpu and tigergpu.
The tigercpu part is an HPE Apollo cluster made up of 408 Intel Skylake CPU nodes, each with 40 cores and at least 4.8 GB of memory per core. The nodes are interconnected by an Omnipath fabric: the 24 nodes within each chassis are connected at full bandwidth, with oversubscription between chassis.
The tigergpu part is a Dell cluster with 320 NVIDIA P100 GPUs spread across 80 Intel Broadwell nodes; each GPU has 16 GB of memory. The nodes are interconnected by an Intel Omnipath fabric, and each GPU sits on a dedicated x16 PCIe bus. Every node has 2.9 TB of NVMe-connected scratch storage and 256 GB of RAM. The CPUs are Intel Broadwell E5-2680v4 processors with 28 cores per node. GPU utilization can be monitored with tools such as gpudash and nvidia-smi.
For more hardware details, see the Hardware Configuration information below.
How to Access the Tiger Cluster
To use the Tiger cluster, you must first request an account and then log in through SSH.
- Requesting Access to Tiger
Access to the large clusters like Tiger is granted on the basis of brief faculty-sponsored proposals (see For large clusters: Submit a proposal or contribute).
If, however, you are part of a research group with a faculty member who has contributed to or has an approved project on Tiger, that faculty member can sponsor additional users by sending a request to email@example.com. Any non-Princeton user must be sponsored by a Princeton faculty or staff member for a Research Computer User (RCU) account.
- Logging into Tiger
Once you have been granted access to Tiger, you can log in with the SSH commands below. Use tigercpu for CPU-only work, and use tigergpu for GPU support.
To log into tigercpu (VPN required from off-campus):
$ ssh <YourNetID>@tiger.princeton.edu
To log into tigergpu (VPN required from off-campus):
$ ssh <YourNetID>@tigergpu.princeton.edu
For more on how to SSH, see the Knowledge Base article Secure Shell (SSH): Frequently Asked Questions (FAQ). If you have trouble connecting, see our SSH page.
How to Use the Tiger Cluster
Since Tiger is a Linux system, knowing some basic Linux commands is highly recommended. For an introduction to navigating a Linux system, view the material associated with our Intro to Linux Command Line workshop.
Using Tiger also requires some knowledge of how to properly use the file system and module system, and of how to use the scheduler that handles each user's jobs. For an introduction to navigating Princeton's High Performance Computing systems, view the material associated with our Getting Started with the Research Computing Clusters workshop. Additional information specific to Tiger's file system, priority for job scheduling, etc. can be found below.
Please remember that these are shared resources for all users.
The login nodes, tigercpu and tigergpu, should be used only for interactive work, such as compiling programs and submitting jobs as described below. No jobs should be run on the login nodes other than brief tests lasting no more than a few minutes. Where practical, we ask that you fill compute nodes entirely so that CPU core fragmentation is minimized.
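As a sketch of what "filling a node" looks like in practice, a Slurm batch script for tigercpu might request one full 40-core node as below. The job name, time limit, module names, and executable are placeholders, not Tiger-specific values; see our Slurm page for the exact directives in use.

```shell
#!/bin/bash
#SBATCH --job-name=cpu-job        # short name for the job (placeholder)
#SBATCH --nodes=1                 # request one full node
#SBATCH --ntasks-per-node=40      # use all 40 cores to avoid fragmentation
#SBATCH --time=01:00:00           # requested wallclock time (determines the QOS)

module purge
module load intel intel-mpi       # example modules; adjust to your toolchain

srun ./my_parallel_program        # placeholder executable
```

Submit the script from a login node with `sbatch job.slurm` and check its status with `squeue -u $USER`.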
Jobs can be submitted for either portion of the Tiger system from either login node, but it is best to compile programs on the login node associated with the portion of the system where the program will run. That is, compile GPU jobs on tigergpu and non-GPU jobs on tigercpu. Running a job on the GPU nodes requires additional specifications in the job script. Refer to our Slurm page for instructions.
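For illustration, a minimal GPU job script might add a `--gres` line to request GPUs on a tigergpu node. Again the job name, module name, and executable below are placeholders, so consult our Slurm page for the directives your job actually needs.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-job        # short name for the job (placeholder)
#SBATCH --nodes=1                 # one node
#SBATCH --ntasks=1                # one task
#SBATCH --gres=gpu:1              # request one of the node's four P100 GPUs
#SBATCH --time=01:00:00           # requested wallclock time

module purge
module load cudatoolkit           # example module; adjust as needed

srun ./my_gpu_program             # placeholder executable
```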
Dell Linux Cluster
|Processor||Nodes||Cores per Node||Memory per Node||Total Cores||Interconnect||Performance|
|2.4 GHz Xeon Broadwell||80||28||256 GB||2240||Omnipath||86 TFLOPS|
|GPU||GPUs per Node||Memory per GPU||Total GPUs||Interconnect||Performance|
|1328 MHz P100||4||16 GB||320||Omnipath||1504 TFLOPS|
HPE Linux Cluster
|Processor||Nodes||Cores per Node||Memory per Node||Total Cores||Interconnect||Performance|
|2.4 GHz Skylake||408||40||192 GB or 768 GB||16320||Omnipath||>1103 TFLOPS|
Distribution of CPU and memory
There are 16,320 processor cores available, 40 per node, and each node contains at least 192 GB of memory (4.8 GB per core). The nodes are assembled into 24-node chassis; each chassis has a 1:1 Omnipath connection internally, with 2:1 oversubscription between chassis.
There are also 40 nodes with 768 GB of memory (19 GB per core). These larger-memory nodes also have SSD drives for faster local I/O.
The nodes are all connected through Omnipath switches for MPI traffic and for GPFS and NFS I/O, and over Gigabit Ethernet for other communication.
For more technical details, see the full version of the systems table.
Jobs are prioritized by the Slurm scheduler based on a number of factors: job size, run time, node availability, wait time, and percentage of usage over a 30-day period, as well as a fairshare mechanism that provides access for large contributors. The policy below may change as the job mix on the machine changes.
Jobs are assigned to the test, vshort, short, medium, or long quality of service (QOS) by the scheduler. The QOS levels are differentiated by the requested wallclock time as follows:
|QOS||Time Limit||Jobs per User||Cores per Job||Cores Available|
|tiger-test||1 hour||2||no limit||15560|
|tiger-vshort||6 hours||32||no limit||no limit|
|QOS||Time Limit||Jobs per User||GPUs per User|
|gpu-test||1 hour||2||no limit|
Note that the above numbers and limits may change if demand requires. Use the "qos" command to view the values currently in effect.
Wording of Acknowledgement of Support and/or Use of Research Computing Resources
"The author(s) are pleased to acknowledge that the work reported on in this paper was substantially performed using the Princeton Research Computing resources at Princeton University, which is a consortium of groups led by the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology's Research Computing."
"The simulations presented in this article were performed on computational resources managed and supported by Princeton Research Computing, a consortium of groups including the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology's High Performance Computing Center and Visualization Laboratory at Princeton University."