Adroit

The Adroit cluster is intended for running smaller jobs, as well as for developing, debugging, and testing codes. Although it is one of our smaller clusters, Adroit is built like our larger clusters (such as Della and Tiger), which makes it ideal training for eventual work on those systems.

Some Technical Specifications:
Adroit is a 9-node Beowulf cluster acquired through a partnership between Dell Computer Corporation and OIT. Each compute node has thirty-two 2.60 GHz Intel Skylake CPU-cores and 384 GB of RAM. There are also two nodes with GPUs: one with four NVIDIA V100s and one with two K40c GPUs. For more details, see the Hardware Configuration section below.

 

How to Access the Adroit Cluster

To use the Adroit cluster, you must request an account and then log in through SSH.

  1. Requesting Access to Adroit

    To request an account on Adroit, please fill out the Adroit Registration form.

  2. Logging into Adroit

    Once you have been granted access to Adroit, you can SSH into it using the command below:

    $ ssh <YourNetID>@adroit.princeton.edu

    For more on how to SSH, see the Knowledge Base article Secure Shell (SSH): Frequently Asked Questions (FAQ).

    If you prefer to navigate Adroit through a graphical user interface rather than the Linux command line, Adroit offers a web portal called MyAdroit (https://myadroit.princeton.edu). The portal makes it easy to transfer files and to run interactive jobs: RStudio, Jupyter, Stata, and MATLAB. A VPN is required to access the web portal from off campus; we recommend the GlobalProtect VPN service.
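
    If you prefer to transfer files from the command line rather than through the web portal, the standard scp command also works. A minimal sketch (the file name is a placeholder for your own data):

    $ scp myresults.csv <YourNetID>@adroit.princeton.edu:

    This copies the file to your home directory on Adroit.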

 

How to Use the Adroit Cluster

Since Adroit is a Linux system, knowing some basic Linux commands is highly recommended. For an introduction to navigating a Linux system, view the material associated with our Intro to Linux Command Line workshop. 

Using Adroit also requires some knowledge of how to use the file system, the module system, and the scheduler that handles each user's jobs. For an introduction to navigating Princeton's High Performance Computing systems, view the material associated with our Getting Started with the Research Computing Clusters workshop. Additional information specific to Adroit's file system, job scheduling priorities, and related topics can be found below.

To attend a live session of either workshop, see our Trainings page for the next available dates.
For more resources, see our Support - How to Get Help page.
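
As a quick preview of the module system mentioned above, software environments on Adroit are managed with environment modules. A minimal sketch of the basic commands (the module name below is only an example; run module avail to see what is actually installed):

$ module avail                 # list the available software modules
$ module load anaconda3        # load a module (example name; confirm with module avail)
$ module list                  # show the modules currently loaded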

 

Important Guidelines

The head node on Adroit should be used only for interactive work, such as compiling programs and submitting jobs as described below. No jobs should be run on the head node other than brief tests lasting no more than a few minutes.
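
If you need more than a brief test, you can instead request an interactive session on a compute node through the scheduler. A minimal sketch (the resource values are examples only):

$ salloc --nodes=1 --ntasks=1 --time=00:30:00

Once the allocation is granted, commands launched with srun run on the allocated compute node rather than the head node; the session ends when you exit or the time limit is reached.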

If you'd like to run a Jupyter notebook, we offer a few options for running Jupyter notebooks that avoid Adroit's head node.

 

Job Scheduling (QOS Parameters)

All jobs must be run through the Slurm scheduler on Adroit. If a job would exceed any of the limits below, it will be held until it is eligible to run. Jobs should not specify a QOS (quality of service); the scheduler routes each job to the appropriate QOS based on the resources it requests. The tables below apply to jobs submitted via Slurm and do not necessarily apply to MyAdroit.
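
As an example, a minimal batch script for a CPU job might look like the following sketch. The job name, resource values, file names, and executable are placeholders; because no QOS is specified, Slurm routes the job according to the time and cores it requests:

#!/bin/bash
#SBATCH --job-name=cpu-job       # placeholder job name
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --cpus-per-task=1        # CPU-cores per task
#SBATCH --mem-per-cpu=4G         # memory per CPU-core
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)

srun ./myprogram                 # replace with your own executable

Submit the script with sbatch:

$ sbatch job.slurm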

CPU Jobs

QOS      Time Limit   Jobs per User   Cores per User                 Cores Available
test     2 hours      2 jobs          80 cores, 5 nodes/user         no limit
short    4 hours      32 jobs         80 cores                       no limit
medium   24 hours     4 jobs          64 cores, 5 nodes/all users    100 cores
long     15 days      2 jobs          64 cores, 4 nodes/all users    80 cores

 

GPU Jobs

QOS          Time Limit   GPUs per User
gpu-test     1 hour       no limit
gpu-short    4 hours      4
gpu-medium   24 hours     2
gpu-long     7 days       1

 

Running GPU Jobs

There are two GPU nodes on Adroit. The newer adroit-h11g1 node features four NVIDIA V100 GPUs, each with 32 GB of memory, while the older adroit-h11g4 node features two NVIDIA K40c GPUs, each with 12 GB of memory.

By default, all GPU jobs run on the V100 node. Use the following Slurm directive to request a GPU for your job:

#SBATCH --gres=gpu:1

If you want your job to run on the older K40c node, then use these Slurm directives:

#SBATCH --gres=gpu:1
#SBATCH --constraint=k40

Note that some codes like PyTorch and NVIDIA Rapids are not supported on the K40c node.
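
Putting these pieces together, a minimal GPU batch script might look like the following sketch. The job name, time limit, module name, and executable are placeholders or assumptions; check module avail for the software actually installed:

#!/bin/bash
#SBATCH --job-name=gpu-job       # placeholder job name
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --time=02:00:00          # total run time limit (HH:MM:SS)
#SBATCH --gres=gpu:1             # request one GPU (add --constraint=k40 for the K40c node)

module load cudatoolkit          # assumed module name; confirm with module avail
srun ./my_gpu_program            # replace with your own GPU-enabled executable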

Use the qos command to see the restrictions on the number of GPUs per QOS. For instance, when using all four V100 GPUs, a job must have a runtime of less than four hours. See the "Job Scheduling" section for the exact limits, the priority page for information on estimating when your queued job will run, and the TigerGPU Utilization page for how to measure GPU utilization.
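
If the qos command is not available in your environment, the same limits can also be queried directly from Slurm, and squeue can report an estimated start time for a queued job. A sketch (output formats vary with the Slurm version):

$ sacctmgr show qos format=Name,MaxWall,MaxTRESPU,MaxJobsPU
$ squeue -u $USER --start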

A common mistake is to run a CPU-only code on a GPU node. Only codes that have been explicitly written to run on a GPU can take advantage of one. Read the documentation for the code you are using to see whether it can use a GPU.
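
One way to confirm that a job is actually using the GPU is to check utilization on the node while the job is running. A sketch, assuming you are allowed to SSH to a node where you have a running job (the node name comes from the squeue output):

$ squeue -u $USER        # note the node on which your job is running
$ ssh adroit-h11g1       # or whichever node squeue reports
$ nvidia-smi             # utilization near 0% suggests the code is not using the GPU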