Adroit is a 9-node Beowulf cluster acquired through a partnership between Dell Computer Corporation and OIT.

Each compute node has thirty-two 2.60 GHz Intel Skylake CPU-cores and 384 GB of RAM. There are also two GPU nodes: one with four NVIDIA V100 GPUs and one with two NVIDIA K40c GPUs. The cluster is intended for developing, debugging, and testing codes, as well as for small production jobs. There is also a web portal for using the cluster, MyAdroit, built on Open OnDemand, an open-source HPC web portal. A VPN is required to connect to MyAdroit.

If you would like an account on Adroit, please click on the Adroit Registration link to fill out the registration form.

System Configuration and Usage

General Guidelines

The head node, adroit, should be used only for interactive work, such as compiling programs and submitting jobs as described below. No jobs should be run on the head node other than brief tests lasting no more than a few minutes.

Please remember that system resources are shared by all users.

Maintenance Window

There is no scheduled maintenance for Adroit during academic semesters. Adroit will be down for maintenance between academic semesters.

Hardware Configuration

            Processor      Nodes  Cores per Node  Memory per Node  Total Cores     Interconnect    Performance
CPU Nodes   2.6 GHz        9      32              384 GB           288             FDR Infiniband  3.2 TFLOPS
GPU Node    745 MHz K40c   1      2 GPUs/node     12.0 GB/GPU      2880 CUDA/GPU                   1.682 TFLOPS/GPU
GPU Node    1246 MHz V100  1      4 GPUs/node     16.0 GB/GPU      5120 CUDA/GPU                   7.066 TFLOPS/GPU

Distribution of CPU and memory

On the CPU nodes there are 288 processors available, thirty-two per node, and each node contains 384 GB of memory. These nodes are identified as adroit-08 through adroit-16. There are also two nodes containing GPUs: adroit-h11g1 is a Skylake node with 40 CPU-cores at 2.4 GHz, 770 GB of memory, and four NVIDIA V100 GPUs; adroit-h11g4 is a Haswell node with 16 CPU-cores at 3.2 GHz, 64 GB of memory, and two NVIDIA K40c GPUs.
Job Scheduling (QOS parameters)

All jobs must be run through the scheduler on Adroit. If a job would exceed any of the limits below, it will be held until it is eligible to run. A job should not specify the qos (quality of service) in which it should run; this allows the scheduler to route the job according to the resources it requires. Currently, jobs are routed to the test, short, medium, or long queue, as follows:

QOS     Time Limit  Jobs per User  Cores per User           Cores Available
test    2 hours     2 jobs         80 cores, 5 nodes/user   no limit
short   4 hours     32 jobs        80 cores                 no limit
medium  24 hours    4 jobs         64 cores                 100 cores, 5 nodes across all users
long    15 days     2 jobs         64 cores                 80 cores, 4 nodes across all users
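Assuming the scheduler is Slurm (common on clusters of this kind), a minimal batch script might look like the sketch below. Note that it does not specify a qos: the requested time and cores let the scheduler route the job itself. The job name and executable are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=myjob       # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks=16            # 16 cores, well under the per-user caps above
#SBATCH --time=01:00:00        # under 2 hours, so this would route to the test QOS
# Note: no --qos line; the scheduler chooses test/short/medium/long itself.

srun ./my_program              # placeholder executable
```

Submitting with a longer --time (say, 12:00:00) would instead route the job to the medium queue, subject to its limits.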
Scratch Space

Please use /scratch/network/username for the output of running jobs and to store large datasets. This space is an NFS-mounted shared space of close to 1 TB. Run the checkquota command to see your usage. Files are NOT backed up so move any important files to long-term storage (e.g., your /home directory or your local machine).
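A typical scratch workflow is sketched below; checkquota is the quota command mentioned above, while the directory and file names are illustrative.

```shell
mkdir -p /scratch/network/$USER/myrun   # per-run output directory (name is illustrative)
checkquota                              # review usage against the ~1 TB shared quota
# When the run finishes, keep only what matters long term:
cp /scratch/network/$USER/myrun/results.dat $HOME/
```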

Scratch space is available in /tmp on every compute node. Since this storage is not shared across all nodes, it is ideally suited for temporary output.
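The pattern for using node-local /tmp might look like the following sketch; the file names and the final destination are illustrative.

```shell
#!/bin/bash
# Sketch: write temporary output to node-local /tmp during the job,
# then copy what you want to keep back to shared storage at the end.
WORKDIR=$(mktemp -d /tmp/myjob.XXXXXX)              # private dir on the node's local disk
echo "intermediate result" > "$WORKDIR/output.dat"  # stand-in for real job output
cp "$WORKDIR/output.dat" "$HOME/"                   # keep the result (e.g., in /home)
rm -rf "$WORKDIR"                                   # free the node-local space
```

Cleaning up /tmp at the end of the job is good practice, since the space is shared with other jobs that land on the same node.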

Running Third-party Software

If you are running 3rd-party software whose characteristics (e.g., memory usage) you are unfamiliar with, please check your job after 5-15 minutes using 'top' or 'ps -ef' on the compute nodes being used. If the memory usage is growing rapidly, or close to exceeding the per-processor memory limit, you should terminate your job before it causes the system to hang or crash.
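For example, assuming Slurm, you could find your job's node with squeue, ssh to it, and watch resident memory; the ps invocation below is standard and runs anywhere.

```shell
# Assuming Slurm: find the node your job is on, then ssh to it.
#   squeue -u $USER     # lists your jobs and the nodes they run on
#   ssh adroit-08       # node name is illustrative
# On the node, show the largest of your processes by resident memory (RSS, in KB):
ps -o pid,rss,etime,comm -u "${USER:-$(whoami)}" | sort -k2 -rn | head -n 5
```

If the RSS column keeps climbing toward the node's memory limit across repeated checks, cancel the job before it destabilizes the node.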