Perseus

Perseus is a 320-node Dell Beowulf cluster. All compute nodes are connected via an Infiniband network designed for high speed and low latency, enabling excellent performance for tightly coupled MPI codes. Each node provides 28 Broadwell (Xeon) cores and 128 GB of RAM.

System Configuration and Usage

General Guidelines

The head node, perseus, should be used for interactive work only, such as compiling programs and submitting jobs as described below. No jobs should be run on the head node other than brief tests lasting no more than a few minutes. Where practical, we ask that you fill nodes entirely so that CPU-core fragmentation is minimized; for this cluster, perseus, that means requesting cores in multiples of 28.
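
As a quick sketch (the executable name below is a placeholder), a request for two full nodes looks like this:

  #!/bin/bash
  #SBATCH --nodes=2              # two complete nodes
  #SBATCH --ntasks-per-node=28   # fill each node: 2 x 28 = 56 cores in total

  srun ./my_mpi_program          # placeholder MPI executable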

Please remember that these are shared resources for all users.

Maintenance Window

Perseus will be down for maintenance on the second Tuesday of every month.

Hardware Configuration

  Processor Speed:          2.4 GHz Xeon
  Nodes:                    320
  Cores per Node:           28
  Memory per Node:          128 GB
  Total Cores:              8960
  Interconnect:             FDR Infiniband
  Theoretical Performance:  344 TFLOPS

Job Scheduling (QOS parameters)

All jobs must be run through the Slurm scheduler on Perseus. If a job would exceed any of the limits below, it will be held until it is eligible to run. Jobs should not specify the QOS in which they should run; this allows the Slurm scheduler to distribute jobs accordingly.

Jobs will be assigned a quality of service (QOS) according to the wall-clock time they request. Jobs requesting a single node will be scheduled into the serial partition, which has a limited number of nodes for this purpose (currently 64); these nodes are shared with jobs using multiple nodes. When requesting multiple nodes for a job, remember that there are 28 cores per node. Again, please fill nodes when running parallel jobs.

  QOS      Time Limit           Jobs per User   Cores per User   Cores Available
  test     1 hour               2 jobs          [30 nodes]       360 cores
  short    24 hours             40 jobs         128 cores        no limit
  medium   72 hours             16 jobs         128 cores        432 cores
  vlong    168 hours (7 days)   10 jobs         160 cores        400 cores

Jobs are further prioritized by the Slurm scheduler based on a number of factors: job size, run time, node availability, wait time, and percentage of usage over a 30-day period (fairshare). These values reflect the minimum limits in effect; the actual values may be higher. Please use the "qos" command to see the limits in effect at the current time.
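
For illustration, a complete batch script following these rules might look like the sketch below; it omits any QOS request so that the scheduler can assign one from the requested time (the job name and executable are placeholders):

  #!/bin/bash
  #SBATCH --job-name=mpi-example   # placeholder job name
  #SBATCH --nodes=4                # four full nodes: 4 x 28 = 112 cores (within the 128-core short limit)
  #SBATCH --ntasks-per-node=28     # fill every node
  #SBATCH --time=12:00:00          # 12 hours, which falls under the 24-hour short QOS
  # Note: no --qos directive; the scheduler assigns the QOS from the time limit above.

  srun ./my_mpi_program            # placeholder MPI executable

Submit the script with "sbatch <scriptname>" and monitor it with "squeue -u $USER".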

Recommended File System Usage (/home, /scratch, /tigress)

/home (shared via NFS to all the compute nodes) is intended for scripts, source code, executables and small static data sets that may be needed as standard input/configuration for codes.

/scratch/gpfs (shared via GPFS to all the compute nodes, 260 TB) is intended for dynamic data that requires higher-bandwidth I/O. Files are NOT backed up, so this data should be moved to persistent storage as soon as it is no longer needed for computations. Any files left here will be removed after 180 days.

/tigress (shared via GPFS to all TIGRESS resources, 2.5 PB) is intended for more persistent storage and should provide high-bandwidth I/O (10 GB/s aggregate bandwidth for jobs across 16 or more nodes). Users are given a default quota of 512 GB when they request a directory in this storage, and that default can be increased on request. Please consider what you really need, and regularly clean out data that is no longer needed, since this filesystem is shared by the users of all our systems.

/scratch (local to each compute node) is intended for data local to each task of a job, and it should be cleaned out at the end of each job.
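
As a rough sketch of this workflow (all directory, file, and program names below are placeholders rather than actual paths on the system), a job might use the file systems as follows:

  # Node-local /scratch for temporary per-task data (hypothetical layout).
  TMPDIR=/scratch/$USER/$SLURM_JOB_ID
  mkdir -p $TMPDIR

  # Large working input/output lives on the parallel /scratch/gpfs file system.
  ./my_program < /scratch/gpfs/$USER/run01/input.dat > /scratch/gpfs/$USER/run01/output.dat

  # Copy final results to persistent storage (/tigress), since /scratch/gpfs is not
  # backed up and files there are removed after 180 days, then clean up.
  cp /scratch/gpfs/$USER/run01/output.dat /tigress/$USER/results/
  rm -rf $TMPDIR   # empty the node-local /scratch at the end of the job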

Running Third-party Software

If you are running third-party software whose characteristics (e.g., memory usage) you are unfamiliar with, please check your job after 5-15 minutes using 'top' or 'ps -ef' on the compute nodes being used. If the memory usage is growing rapidly or is close to exceeding the per-processor memory limit, terminate your job before it causes the system to hang or crash. You can determine which node(s) your job is running on with the "scontrol show job <jobnumber>" command.
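
For example (the job number and node name below are placeholders), the check might look like this:

  scontrol show job 1234567 | grep -i nodelist   # list the node(s) allocated to the job
  ssh perseus-r01n01                             # log in to one of those nodes (hypothetical name)
  top -u $USER                                   # watch memory (RES) and CPU usage of your processes
  ps -ef | grep $USER                            # alternative one-shot process listing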