Adroit

Adroit is a 9-node Beowulf cluster acquired through a partnership between Dell Computer Corporation and OIT.

Each compute node has thirty-two 2.60 GHz Intel Skylake processor cores and 384 GB of RAM. There are also four other nodes, each with 4 GPUs (NVIDIA K20m). The cluster is intended for developing, debugging, and testing codes, as well as for small production jobs.

If you would like an account on Adroit, please use the Adroit Registration link to fill out the registration form.

System Configuration and Usage

General Guidelines

The head node, adroit, should be used for interactive work only, such as compiling programs and submitting jobs as described below. No jobs should be run on the head node other than brief tests that last no more than a few minutes.

Please remember that system resources are shared by all users.

Maintenance Window

There is no scheduled maintenance for Adroit during academic semesters. Adroit will be down for maintenance between academic semesters.

Hardware Configuration

System                       Processor Speed    Nodes  Cores per Node  Memory per Node  Total Cores  Interconnect    Theoretical Performance
Adroit (Dell Linux Cluster)  2.5 GHz Ivybridge  9      32              384 GB           288          FDR Infiniband  3.2 TFLOPS
                             705 MHz K20        4      4 GPU/node      5.0 GB/GPU                                    9.36 TFLOPS

Distribution of CPU and memory

There are 288 processor cores available, thirty-two per node. Each node contains 384 GB of memory. The nodes are identified as adroit-08 through adroit-16.
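The figures above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the constant names are made up for this example, and the memory-per-core value is a simple division of the stated numbers, not a limit stated anywhere on this page.

```python
# Values taken from the text above; the variable names are illustrative only.
NODE_COUNT = 9
CORES_PER_NODE = 32
MEMORY_PER_NODE_GB = 384
FIRST_NODE_INDEX = 8  # nodes are named adroit-08 through adroit-16

# Build the node names with a zero-padded, two-digit suffix.
node_names = [
    f"adroit-{i:02d}"
    for i in range(FIRST_NODE_INDEX, FIRST_NODE_INDEX + NODE_COUNT)
]

total_cores = NODE_COUNT * CORES_PER_NODE
# Derived figure only: memory divided evenly across the cores of one node.
memory_per_core_gb = MEMORY_PER_NODE_GB / CORES_PER_NODE

print(node_names[0], "...", node_names[-1])
print("total cores:", total_cores)
print("memory per core (GB):", memory_per_core_gb)
```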

 

Job Scheduling (QOS parameters)

All jobs must be run through the scheduler on Adroit. If a job would exceed any of the limits below, it will be held until it is eligible to run. Jobs should not specify the QOS (quality of service) in which they run; this allows the scheduler to route each job according to the resources it requires. Currently, jobs are routed to the short, medium, or long queue, as follows:

QOS     Time Limit  Jobs per User  Cores per User           Cores Available
short   4 hours     60 jobs        60 cores                 no limit
medium  24 hours    8 jobs         40 cores, 4 nodes/user   100 cores
long    15 days     8 jobs         40 cores                 80 cores
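To get a feel for where a job is likely to land, the time limits in the table can be encoded in a small lookup. This is a hedged sketch, not the scheduler's actual logic: it assumes routing depends only on the requested run time and that a job goes to the first QOS whose time limit covers it. The function name `pick_qos` is hypothetical.

```python
from datetime import timedelta

# Time limits copied from the QOS table, shortest first.
# The routing rule itself is an assumption made for illustration.
QOS_TIME_LIMITS = [
    ("short", timedelta(hours=4)),
    ("medium", timedelta(hours=24)),
    ("long", timedelta(days=15)),
]


def pick_qos(requested_time: timedelta) -> str:
    """Return the first QOS whose time limit covers the requested run time."""
    for name, time_limit in QOS_TIME_LIMITS:
        if requested_time <= time_limit:
            return name
    raise ValueError("requested time exceeds the 15-day limit of the long QOS")


print(pick_qos(timedelta(hours=2)))   # short
print(pick_qos(timedelta(hours=12)))  # medium
print(pick_qos(timedelta(days=3)))    # long
```

The per-user job and core limits are not modeled here; a job that fits a QOS by time may still be held until those limits allow it to run.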
 

Scratch Space

Scratch space is available in /scratch on every node. Create a directory /scratch/network/username and use it for temporary files and data. This is an NFS-mounted shared space of close to 1 TB. Files are NOT backed up, so move any important files to long-term storage (your home directory or another machine). Note also that these scratch directories are cleaned nightly to purge files older than 15 days.
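The directory setup and a check for files approaching the purge age might look like the following sketch. The helper names `make_scratch_dir` and `files_at_risk` are hypothetical, and the code assumes that "older than 15 days" refers to a file's modification time, which this page does not specify.

```python
import getpass
import time
from pathlib import Path

PURGE_AGE_DAYS = 15  # purge age stated above
SECONDS_PER_DAY = 86400


def make_scratch_dir(base="/scratch/network", user=None) -> Path:
    """Create base/username if it does not already exist and return its path."""
    user = user or getpass.getuser()
    path = Path(base) / user
    path.mkdir(parents=True, exist_ok=True)
    return path


def files_at_risk(directory, max_age_days=PURGE_AGE_DAYS, now=None):
    """List files whose modification time is older than max_age_days.

    Assumption: the nightly purge uses modification time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Calling `make_scratch_dir()` on a cluster node would create the directory for the current user; passing a smaller `max_age_days` to `files_at_risk` lists files that should be copied to long-term storage before they reach the purge age.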

Approximately 190 GB of node-local storage is also available as /scratch. Because this storage is not shared across nodes, it is best suited for temporary output.

Running Third-party Software

If you are running third-party software whose characteristics (e.g., memory usage) are unfamiliar to you, please check your job after 5-15 minutes using 'top' or 'ps -ef' on the compute nodes it is using. If the memory usage is growing rapidly, or is close to exceeding the per-processor memory limit, you should terminate your job before it causes the system to hang or crash.
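The same check can be scripted. The sketch below reads a process's resident memory with `ps -o rss= -p PID` and applies a simple warning rule. Everything beyond the `ps` call is an assumption made for illustration: the function names are hypothetical, the 90% headroom and the "would exceed the limit by the next sample" growth test are arbitrary choices, and the limit you pass in must come from your own knowledge of the job, since this page does not state a per-processor figure.

```python
import subprocess

KIB_PER_GIB = 1024 * 1024


def rss_gib(pid: int) -> float:
    """Resident memory of one process in GiB, read via `ps -o rss= -p PID`."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip()) / KIB_PER_GIB  # ps reports RSS in KiB


def memory_warning(samples_gib, limit_gib, headroom=0.9):
    """Return a warning string for a series of memory samples, or None.

    Hypothetical rule: warn when the latest sample is within `headroom`
    of the limit, or when repeating the last increase would reach it.
    """
    latest = samples_gib[-1]
    if latest >= headroom * limit_gib:
        return "near limit"
    if len(samples_gib) >= 2:
        growth = latest - samples_gib[-2]
        if growth > 0 and latest + growth >= limit_gib:
            return "growing rapidly"
    return None
```

For example, sampling `rss_gib(pid)` every few minutes into a list and passing it to `memory_warning` with your job's limit gives an early signal to terminate the job yourself.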