What is a cluster?

A cluster is a group of interconnected computers that work together to perform computationally intensive tasks.  In a cluster, each computer is referred to as a "node".  (The term "node" comes from graph theory.)

A cluster has a small number of "head nodes", usually one or two, and a large number of "compute nodes". For example, the della cluster has 225 compute nodes.  The head node is the computer to which you log in, and where you edit scripts, compile code, and submit jobs.  Your jobs are automatically run on the compute nodes by the scheduling program "SLURM" -- see: Introducing SLURM.

Each node contains one or more processors or CPUs (usually two) on which computation takes place.  Each processor has multiple "cores".  For example, each of the newest della nodes contains two processors, and each processor has 16 cores, for a total of 32 cores in each node.  This means that one of these della nodes can perform 32 tasks simultaneously.
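You can see this layout for yourself.  As an illustration (the counts in the comment are only an example, not a promise of what any particular node reports), the standard Linux lscpu command summarizes the topology of the machine you are logged in to:

    # Summarize the CPU topology of the current machine
    lscpu | grep -E "^(Socket\(s\)|Core\(s\) per socket|CPU\(s\))"
    # On a two-socket node with 16 cores per socket this prints lines such as
    #   CPU(s):              32
    #   Core(s) per socket:  16
    #   Socket(s):           2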

A job can be run on a single core.  Assuming the software is written to support parallel operations, a job may also run on multiple cores on one or more processors, and even on multiple nodes at one time.  See: Compiling and Running MPI Jobs.
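As a concrete sketch of the multi-node case, a Slurm job script for an MPI program might request two whole nodes.  The module name and the program my_mpi_program are placeholders, and the task counts assume 32-core nodes like those described above:

    #!/bin/bash
    #SBATCH --job-name=mpi-example    # name shown in the queue
    #SBATCH --nodes=2                 # run across two compute nodes
    #SBATCH --ntasks-per-node=32      # one MPI task per core on each node
    #SBATCH --time=01:00:00           # wall-clock limit (hh:mm:ss)

    module load openmpi               # placeholder; load the MPI your site provides
    srun ./my_mpi_program             # srun launches one task per requested core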

The nodes in each of our clusters are connected to one another by high-speed, low-latency networks, either InfiniBand or Omni-Path.  This helps to support parallel operations across nodes.

The tiger cluster has nodes that, in addition to their CPUs, also have GPUs.  See: What is a GPU?
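When a cluster's nodes have GPUs, you ask Slurm for one in the job script as well.  A minimal sketch, assuming GPUs are exposed through Slurm's generic-resource (gres) mechanism; the program name is a placeholder:

    #!/bin/bash
    #SBATCH --job-name=gpu-example
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1              # request one GPU on the node
    #SBATCH --time=00:30:00

    nvidia-smi                        # show which GPU was assigned
    ./my_gpu_program                  # placeholder for your GPU-enabled code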

Our clusters run the Linux operating system.  More specifically, they run Springdale Linux, which is a customized version of Red Hat Enterprise Linux.  Springdale is a Linux distribution maintained by members of the computing staffs of Princeton University and the Institute for Advanced Study.

The software that decides when and where your program will run is called SLURM, the cluster’s job scheduler. The computer on which you’re going to run is actually a collection of computers, called a cluster. Since there are more people who use the cluster than there are computers, the work needs to be scheduled. This scheduling is done by SLURM.

The main one, the one you reach when you connect to the cluster by name, is called the head node. This computer’s job is to allow users to compose work for the other computers in the cluster and to hand that work to SLURM. Most of the other computers in the cluster are for doing actual work. They’re called compute nodes.
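Reaching the head node is just an ssh to the cluster’s hostname. The address below is purely illustrative; substitute your own username and your cluster’s actual name:

    # Log in to the head node (username and hostname are placeholders)
    ssh <YourNetID>@<cluster-name>.princeton.edu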

For this to be successful, SLURM needs to know how much of the cluster you need and how long you intend to use that portion. How much you need is described in terms of the number of computers (“nodes”) and the number of threads of execution (“cores”). For the first script, let’s assume that you will use only one node and one core.
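A first script along those lines might look like the sketch below. The job name, time limit, and the program being run are placeholders for your own choices:

    #!/bin/bash
    #SBATCH --job-name=first-job      # name shown in the queue
    #SBATCH --nodes=1                 # one computer
    #SBATCH --ntasks=1                # one core (one thread of execution)
    #SBATCH --time=00:10:00           # ask for ten minutes of run time

    ./my_program                      # placeholder for the work you want done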

What is this thing called a “cluster”?

It seems complicated at first, but with a little understanding it all makes sense.  Your first exposure will be when you obtain an account and first log in to the head node, which is sometimes called a login node.  That’s the public entry point to a bunch of other computers, called nodes or compute nodes.  Sometimes there are only a handful or two of them; sometimes there are hundreds.  And each contains a number of resources which you can use.

Let’s look at one of these nodes.  Inside you will find the hardware, which typically consists of two sockets, each holding a CPU.  Each CPU (which the scheduler sometimes refers to as a socket) is connected to the memory and to other devices such as GPUs or networking cards.  These CPUs contain cores, or compute cores, and how many there are varies with which CPU is installed.

Take a cluster like adroit: there are two CPUs per node and each CPU contains 16 cores, so that is 32 compute cores per node.  Most of the time it doesn’t matter which core you are running on, as they are all generic.  For this conversation assume that it doesn’t matter.
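You can also ask Slurm itself how a node is laid out.  A sketch with a made-up node name; Sockets, CoresPerSocket, and CPUTot are fields in scontrol’s normal node report:

    # Describe one compute node (the node name here is illustrative)
    scontrol show node adroit-01 | grep -E "Sockets|CoresPerSocket|CPUTot"
    # A two-socket, 16-cores-per-socket node reports Sockets=2,
    # CoresPerSocket=16, and CPUTot=32.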

In order to keep track of who is running where, on which node and core, we need a scheduler to allocate these resources.  For all of our clusters today we use Slurm to do this.  In your job script you define exactly how many cores are required, as well as how many nodes if your job can span multiple nodes.  If resources are not available immediately, Slurm will start reserving resources for you and launch your job when they are ready.
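In practice that whole exchange comes down to two commands.  A sketch, assuming you have saved your script as job.slurm:

    sbatch job.slurm        # hand the script to Slurm; it replies with a job ID
    squeue -u $USER         # list your jobs; the ST column shows PD (pending) or R (running)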