Tiger

The Tiger cluster is one of Princeton's most powerful clusters. It is intended for large parallel jobs; all other types of jobs are given low priority on it.

Technical Specifications
The Tiger cluster has two parts: tigercpu and tigergpu.

The tigercpu part is an HPE Apollo cluster composed of 408 Intel Skylake CPU nodes. Each processor core has at least 4.8 GB of memory. The 40-core nodes are interconnected by an Omnipath fabric: the 24 nodes within each chassis communicate at full bandwidth, with oversubscription between chassis.

The tigergpu part is a Dell cluster composed of 320 NVIDIA P100 GPUs spread across 80 Broadwell nodes; each GPU has 16 GB of memory. The nodes are interconnected by an Intel Omnipath fabric, and each GPU sits on a dedicated x16 PCIe bus. Every node has 2.9 TB of NVMe-connected scratch storage and 256 GB of RAM. The CPUs are Intel Broadwell E5-2680 v4, with 28 cores per node. GPU utilization can be monitored with tools such as gpudash and nvidia-smi.
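For example, nvidia-smi can report per-GPU utilization directly on a GPU node (a minimal sketch; the command is a no-op fallback on machines without NVIDIA drivers):

```shell
# Query per-GPU utilization and memory use on a tigergpu node.
# Falls back gracefully where nvidia-smi is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv
else
    echo "nvidia-smi not found; run this on a tigergpu compute node"
fi
```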

For more hardware details, see the Hardware Configuration information below.

How to Access the Tiger Cluster

To use the Tiger cluster you have to request an account and then log in through SSH.

  1. Requesting Access to Tiger

    Access to the large clusters like Tiger is granted on the basis of brief faculty-sponsored proposals (see For large clusters: Submit a proposal or contribute).

    If, however, you are part of a research group with a faculty member who has contributed to or has an approved project on Tiger, that faculty member can sponsor additional users by sending a request to cses@princeton.edu. Any non-Princeton user must be sponsored by a Princeton faculty or staff member for a Research Computer User (RCU) account.
     
  2. Logging into Tiger

    Once you have been granted access to Tiger, you can log in via SSH as shown below. Use tigercpu for CPU-only work, and tigergpu for jobs that need GPUs.

    To log into tigercpu (VPN required from off-campus):

    $ ssh <YourNetID>@tiger.princeton.edu
    

    To log into tigergpu (VPN required from off-campus):

    $ ssh <YourNetID>@tigergpu.princeton.edu

    For more on how to SSH, see the Knowledge Base article Secure Shell (SSH): Frequently Asked Questions (FAQ). If you have trouble connecting then see our SSH page.

 

How to Use the Tiger Cluster

Since Tiger is a Linux system, knowing some basic Linux commands is highly recommended. For an introduction to navigating a Linux system, view the material associated with our Intro to Linux Command Line workshop. 
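A handful of everyday commands cover most navigation tasks (file and directory names below are illustrative):

```shell
pwd                            # print the current working directory
mkdir -p demo                  # create a directory
echo "hello" > demo/notes.txt  # write a small file
cat demo/notes.txt             # show its contents
ls demo                        # list the directory
rm -r demo                     # clean up
```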

Using Tiger also requires some knowledge of the file system, the module system, and the Slurm scheduler that handles each user's jobs. For an introduction to navigating Princeton's High Performance Computing systems, view the material associated with our Getting Started with the Research Computing Clusters workshop. Additional information specific to Tiger's file system, priority for job scheduling, etc. can be found below.

To attend a live session of either workshop, see our Trainings page for the next available workshop.
For more resources, see our Support - How to Get Help page.

 

Important Guidelines

Please remember that these are shared resources for all users.

The login nodes, tigercpu and tigergpu, should be used for interactive work only, such as compiling programs, and submitting jobs as described below. No jobs should be run on the login node other than brief tests that last no more than a few minutes. Where practical, we ask that you entirely fill the compute nodes so that CPU core fragmentation is minimized.

Jobs can be submitted for either portion of the Tiger system from either login node, but it is best to compile programs on the login node associated with the portion of the system where the program will run. That is, compile GPU jobs on tigergpu and non-GPU jobs on tigercpu. Running a job on the GPU nodes requires additional specifications in the job script. Refer to our Slurm page for instructions.
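A minimal GPU job script might look like the following sketch. The module name, time limit, and executable are illustrative only; check `module avail` on tigergpu and our Slurm page for current values:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-demo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1          # request one P100 GPU (the extra GPU specification)
#SBATCH --time=00:10:00       # short wallclock limit for a test run

module purge
module load cudatoolkit       # illustrative module name

srun ./my_gpu_program         # illustrative executable
```

Submit the script from either login node with `sbatch`, e.g. `sbatch job.slurm`.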

 

Hardware Configuration

| Cluster | Processor Speed | Nodes | Cores per Node | Memory per Node | Total Cores | Interconnect | Performance (Theoretical) |
|---|---|---|---|---|---|---|---|
| TigerGPU (Dell Linux Cluster) | 2.4 GHz Xeon Broadwell E5-2680 v4 | 80 | 28 | 256 GB | 2240 | Omnipath | 86 TFLOPS |
| TigerGPU (GPU info) | 1328 MHz P100 | -- | 4 GPUs | 16 GB per GPU | 320 GPUs | Omnipath | 1504 TFLOPS |
| TigerCPU (HPE Linux Cluster) | 2.4 GHz Skylake | 408 | 40 | 192 GB or 768 GB | 16320 | Omnipath | >1103 TFLOPS |

Distribution of CPU and memory

There are 16,320 processor cores available, 40 per node. Each node contains at least 192 GB of memory (4.8 GB per core). The nodes are assembled into 24-node chassis, where each chassis has a 1:1 Omnipath connection. There is 2:1 oversubscription between chassis.
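The totals quoted above follow directly from the per-node figures:

```shell
# Sanity-check the tigercpu totals: 408 nodes x 40 cores, and
# 192 GB spread over 40 cores per node.
echo $((408 * 40))                         # total cores: 16320
awk 'BEGIN { printf "%.1f\n", 192/40 }'    # GB per core: 4.8
```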

There are also 40 nodes with memory of 768 GB (19 GB per core). These larger memory nodes also have SSD drives for faster I/O locally.

The nodes are all connected through Omnipath switches for MPI traffic, GPFS, and NFS I/O and over a Gigabit Ethernet for other communication.

For more technical details, see the full version of the systems table.

 

Job Scheduling (QOS Parameters)

Jobs are prioritized by the Slurm scheduler based on a number of factors: job size, run time, node availability, wait time, and percentage of usage over a 30-day period, as well as a fairshare mechanism that provides access for large contributors. The policy below may change as the job mix on the machine changes.

Jobs are assigned to the test, vshort, short, medium, or long quality of service (QOS) by the scheduler. The QOS levels are differentiated by the wallclock time requested, as follows:

CPU Jobs

| QOS | Time Limit | Jobs per User | Cores per Job | Cores Available |
|---|---|---|---|---|
| tiger-test | 1 hour | 2 | no limit | 15560 |
| tiger-vshort | 6 hours | 32 | no limit | no limit |
| tiger-short | 24 hours | 24 | 4360 | 13360 |
| tiger-medium | 72 hours | 16 | 3840 | 7840 |
| tiger-long | 144 hours (6 days) | 12 | 1200 | 5520 |

 

GPU Jobs

| QOS | Time Limit | Jobs per User | GPUs per User |
|---|---|---|---|
| gpu-test | 1 hour | 2 | no limit |
| gpu-short | 24 hours | 24 | 24 |
| gpu-medium | 72 hours | 10 | 16 |
| gpu-long | 144 hours (6 days) | 3 | 16 |

Note that the above numbers and limits may be changed if demand requires. Use the "qos" command to view the actual values in effect.
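Because the QOS is determined by the requested wallclock time, a job asking for 12 hours, for example, falls under tiger-short (24-hour limit). A sketch of the relevant job-script fragment and a follow-up check:

```shell
#SBATCH --time=12:00:00   # 12 h <= 24 h, so this job lands in tiger-short

# After submission, list your jobs with their assigned QOS (%q)
# and time limit (%l):
squeue -u $USER -o "%.10i %.12q %.10l"
```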

 

Tiger Schematic

A schematic diagram of the Tiger cluster

 

 

Wording of Acknowledgement of Support and/or Use of Research Computing Resources

"The author(s) are pleased to acknowledge that the work reported on in this paper was substantially performed using the Princeton Research Computing resources at Princeton University which is consortium of groups led by the Princeton Institute for Computational Science and Engineering (PICSciE) and Office of Information Technology's Research Computing."

"The simulations presented in this article were performed on computational resources managed and supported by Princeton Research Computing, a consortium of groups including the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology's High Performance Computing Center and Visualization Laboratory at Princeton University."