Tiger is designed for running large parallel jobs. The cluster is currently composed of 320 CPU nodes (35,840 Intel CPU-cores) and 12 GPU nodes (48 H100 GPUs). Access to those GPUs is currently restricted to only those PIs on the MRI grant from NSF.Trouble Connecting via SSHYou may encounter the following error:$ ssh <YourNetID>@tiger.princeton.edu kex_exchange_identification: read: Operation timed out banner exchange: Connection to 128.112.0.205 port 22: Operation timed outThe solution is to connect while using the university VPN (i.e., Global Protect). How to Access the Tiger ClusterTo use the Tiger cluster you have to request an account and then log in through SSH.Requesting Access to TigerAccess to the large clusters like Tiger is granted on the basis of brief faculty-sponsored proposals (see For large clusters: Submit a proposal or contribute).If, however, you are part of a research group with a faculty member who has contributed to or has an approved project on Tiger, that faculty member can sponsor additional users by sending a request to [email protected]. Any non-Princeton user must be sponsored by a Princeton faculty or staff member for a Research Computer User (RCU) account. Logging into TigerOption 1Once you have been granted access to Tiger, you can connect by opening an SSH client and using the SSH command as displayed below.To log into tiger (VPN required from off-campus):$ ssh <YourNetID>@tiger.princeton.eduFor more on how to SSH, see the Knowledge Base article Secure Shell (SSH): Frequently Asked Questions (FAQ). If you have trouble connecting then see our SSH page.Option 2If you prefer to navigate Tiger through a graphical user interface rather than the Linux command line, there is also a web portal called MyTiger (VPN required from off-campus):https://mytiger.princeton.edu MyTiger provides access to the cluster through a web browser. This enables easy file transfers and interactive jobs such as Jupyter. How to Use the Tiger ClusterSince Tiger is a Linux system, knowing some basic Linux commands is highly recommended. For an introduction to navigating a Linux system, view the material associated with our Intro to Linux Command Line workshop. Using Tiger also requires some knowledge on how to properly use the file system, module system, and how to use the scheduler that handles each user's jobs. For an introduction to navigating Princeton's High Performance Computing systems, view our Guide to Princeton's Research Computing Clusters. Additional information specific to Tiger's file system, priority for job scheduling, etc. can be found below.For more resources, see our Support - How to Get Help page. Important GuidelinesPlease remember that these are shared resources for all users.The login node tigercpu should be used for interactive work only, such as compiling programs, and submitting jobs as described below. No jobs should be run on the login node other than brief tests that last no more than a few minutes. Where practical, we ask that you entirely fill the compute nodes so that CPU core fragmentation is minimized.Refer to our Slurm page for job submission instructions. Hardware ConfigurationCPU NodesNodesCoresper NodeMemoryper NodeTotal CoresInterconnectProcessorTheoreticalPerformance3201121 TB35840Infiniband NDR2.0 GHz Intel Sapphire Rapids 2.3 PFLOPSGPU NodesNodesGPUs per NodeGPUMemory per GPUCPU-Cores per NodeCPU Memory per NodeProcessor124H10080 GB1121 TB2.0 GHz Intel Sapphire RapidsDistribution of CPU and memoryThere are 35,840 processors available (112 per node). The nodes are all connected through Infiniband NDR switches for MPI traffic, GPFS, and NFS I/O and over a Gigabit Ethernet for other communication.For more technical details, please see the full version of the systems table. Job Scheduling (QOS Parameters)Jobs are prioritized through the Slurm scheduler based on a number of factors: job size, run times, node availability, wait times, and percentage of usage over a 30 day period as well as a fairshare mechanism to provide access for large contributors. The policy below may change as the job mix changes on the machine.Jobs will move to the test, vshort, short, medium, or long quality of service as determined by the scheduler. They are differentiated by the wallclock time requested as follows:QOSTime LimitJobs per UserMax Cores per UserCores Availabletiger-test1 hour1800015560tiger-vshort5 hours30090030000tiger-short24 hours400800030000tiger-medium72 hours120500013000tiger-long144 hours(6 days)1216005500Note that the above numbers and limits may be changed if demand requires. Use the "qos" command to view the actual values in effect. Software and Environment ModulesTo see the available environment modules, run the following command:$ module availCompilersThe Intel oneAPI compilers should be favored over GCC. To compile a simple MPI code with Intel oneAPI:$ module load intel-oneapi/2024.2 $ module load intel-mpi/oneapi/2021.13 $ mpiicpx -Ofast -xsapphirerapidss -o hello_world_mpi hello_world_mpi.cpp The compilers of Intel oneAPI are:icx is the C compiler (replaces icc)icpx is the C++ compiler (replaces icx)ifx is the Fortran compiler (replaces ifort)The parallel (wrapper) compilers of Intel oneAPI are:mpiicx is for MPI codes in Cmpiicpx is for MPI codes in C++mpiifx is for MPI codes in Fortran For more on using the Intel oneAPI compilers see the Intel MPI guide. You can learn more about what the wrapper compiler is doing by running the following command:$ mpiicpx -show icpx -I"/opt/intel/oneapi/mpi/2021.13/include" -L"/opt/intel/oneapi/mpi/2021.13/lib" -L"/opt/intel/oneapi/mpi/2021.13/lib" -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker "/opt/intel/oneapi/mpi/2021.13/lib" -Xlinker -rpath -Xlinker "/opt/intel/oneapi/mpi/2021.13/lib" -lmpicxx -lmpifort -lmpi -ldl -lrt -lpthread The Intel Classic Compilers are not available on Tiger. You will not find icc, icpc or ifort. Instead users should use the Intel oneAPI compilers as described above.PythonThe Anaconda Python distribution should be used when working with Python:$ module avail anaconda3 $ module load anaconda3/2024.6 (base) $ python --versionLearn more about installing and running Python software on the Research Computing systems. Slurm Job SubmissionAccountingUsers with multiple Slurm accounts must add the following Slurm directive to their Slurm scripts:#SBATCH --account=<account-name>Run the "sshare" command to see the different account names. If you fail to specify an account then you will see the following error message:salloc: error: ERROR: You have to specify an account for your slurm jobs with --account option from these options: <option1> <option2> salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specifiedFor interactive jobs, one can do:$ salloc --nodes=1 --ntasks=4 --mem=16G --time=00:20:00 --account=<account-name> GPU JobsGPU access is restricted at this time to only those PIs on the MRI grant from NSF. To test for access, run the salloc command below:$ salloc --nodes=1 --ntasks=1 --mem=4G --time=00:05:00 --gres=gpu:1 --account=<account> salloc: Granted job allocation 2033 salloc: Waiting for resource configuration salloc: Nodes tiger-g06c4g2 are ready for job $ Run the “exit” command to leave the compute node and return to the login node. Learn more about Slurm and GPU jobs.If you do not have access then you will encounter an error message:sbatch: error: Batch job submission failed: Invalid qos specification Open OnDemand (Jupyter, Graphical Desktop)There is a web portal for running Jupyter notebooks at https://mytiger.princeton.edu (VPN is required to connect from off-campus). Follow the directions on our Jupyter page for working with custom Conda environments. Choose "Interactive Apps" then either "Jupyter" or "Jupyter on Tiger Vis Node". The first choice is for intensive sessions while the latter runs on the visualization node which is for light work that requires internet access. To request a GPU, enter the following in the field for "Extra slurm options":--gres=gpu:1You may need to also specify an account:--gres=gpu:1 --account=<account-name> /projects is Not Available on the Compute NodesThe /projects storage system is not available on the compute nodes. Users should store job data on /scratch/gpfs which is fast and available on the compute nodes. After a job completes, from the login node, one can copy the job output from /scratch/gpfs to /projects for backup and long-term storage. How to Use /scratch/gpfs?Each user will have access to at least one fileset of the form:/scratch/gpfs/<YourResearchGroup>You should create your own directory within that path and store your files there:$ cd /scratch/gpfs/<YourResearchGroup> $ mkdir <YourDirectory> Note that /scratch/gpfs is not backed up. Visualization NodeThe tiger-vis node can be used for visualization and post-processing tasks. It offers 64 CPU-cores, 1 TB of CPU memory, and 2 NVIDIA L4 GPUs (each with 24 GB of GPU memory). Users can connect via SSH with the following command (VPN required if connecting from off-campus):$ ssh <YourNetID>@tiger-vis.princeton.edu Note that there is no job scheduler on tiger-vis, so please be considerate of other users when using this node. To ensure that the system remains a shared resource, there are limits in place preventing one individual from using all of the resources. You can check your activity with the command "htop -u <YourNetID>".In addition to visualization, the node can be used for tasks that are incompatible with the Slurm job scheduler, or for work that is not appropriate for the Tiger login node (such as downloading large amounts of data from the internet). Maintenance WindowTiger will be down for routine maintenance on the second Tuesday of every month from approximately 6 AM to 2 PM. This includes the associated filesystems of /scratch/gpfs and /projects. Please mark your calendar. Jobs submitted close to downtime will remain in the queue unless they can be scheduled to finish before downtime (see more). Users will receive an email when the cluster is returned to service. Filesystems/home (shared via NFS to all the compute nodes) is intended for scripts, source code, executables and small static data sets that may be needed as standard input/configuration for codes./scratch/gpfs/<YourResearchGroup>/<YourDirectory> is intended for dynamic data that requires higher bandwidth I/O. Files are NOT backed up so this data should be moved to persistent storage as soon as it is no longer needed for computations. Please remove files on /scratch/gpfs that you no longer need./projects (shared using GPFS) is intended for more persistent storage and should provide high bandwidth I/O (8 GB/s aggregate bandwidth for jobs across 16 or more nodes). Users are provided with a default quota of 512 GB when they request a directory in this storage, and that default can be increased by requesting more. We do ask people to consider what they really need, and to make sure they regularly clean out data that is no longer needed since this filesystem is shared by the users of all our systems./tmp (local to each compute node) is intended for data local to each task of a job, and it should be cleaned out at the end of each job. This is the fastest storage for access. Wording of Acknowledgement of Support and/or Use of Research Computing Resources"The author(s) are pleased to acknowledge that the work reported on in this paper was substantially performed using the Princeton Research Computing resources at Princeton University which is consortium of groups led by the Princeton Institute for Computational Science and Engineering (PICSciE) and Research Computing.""The simulations presented in this article were performed on computational resources managed and supported by Princeton Research Computing, a consortium of groups including the Princeton Institute for Computational Science and Engineering (PICSciE) and Research Computing at Princeton University."