This is a partial list of common research computing terms:

Terms

Definitions

Bash

Bash is a shell: a program that lets the user interact with the operating system by typing commands. It is the default command-line interface on most Linux systems. Besides interactive use, it can run scripts to automate sequences of commands.
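For example, a short Bash script (the loop and names here are just illustrative) automates a repeated command:

```shell
#!/bin/bash
# Print a greeting for each name in a list -- a simple
# automated sequence of commands.
for name in alpha beta gamma; do
    echo "Hello, $name"
done
```

Saved to a file, this runs the same `echo` command once per name without any further typing.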

Cluster

A cluster is a group of computers connected by a network so that they function as if they were one large computer with a lot of resources.

Command line

A typed command initiating an operation or program from an operating system's command prompt. 

Compiler

*A program that translates programs written in a high-level language like C or Fortran into a lower-level form like assembly language.

Compute Node

A computer that is used for running computations, sometimes managed by a scheduler.

Core

An individual processing unit within a CPU. The core count of a node usually refers to the number of physical cores across its CPUs.

CPU

Stands for Central Processing Unit; also known as the computer’s processor.

Directory

A folder.

Embarrassingly (Pleasingly) Parallel

A group of operations that are independent of each other and can therefore be performed in parallel with no restrictions on ordering. If there are N pleasingly parallel operations, each taking 1 unit of time, and N processors are available, then they can be done in parallel in 1 unit of time.
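A sketch of the idea in Bash: three independent tasks are launched in the background so they run at the same time, and `wait` blocks until all of them finish (the `sleep` stands in for real work, so the whole batch takes about 1 unit of time rather than 3):

```shell
#!/bin/bash
# Launch three independent tasks in the background; since none of them
# depends on another, they can all run simultaneously.
for i in 1 2 3; do
    ( echo "task $i starting"; sleep 1; echo "task $i done" ) &
done
wait   # block until every background task has finished
echo "all tasks complete"
```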

Executable

An executable is a file whose name can be typed in a shell to run (execute) the commands within it. An executable can be a script file containing readable text in the language as written, or a compiled binary containing the encoded program.
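For example, a text script becomes executable once it has the execute permission (hello.sh is a hypothetical file name):

```shell
# Create a small script, mark it executable, and run it by name.
printf '#!/bin/bash\necho "Hello"\n' > hello.sh
chmod +x hello.sh      # add the execute permission
./hello.sh             # prints: Hello
```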

GPU

The graphics processing unit (GPU) is the computer chip that processes the data to be shown on a display. Unlike CPUs, GPUs are capable of processing large blocks of data in parallel. GPUs are now also used for general-purpose computing with algorithms whose input can be processed in parallel.

Head Node

The main computer that connects a cluster of computers to an outside network. On Princeton’s clusters only the head node has access to the internet.

Linux

*An open-source Unix-like operating system. An alternative to, for example, Microsoft’s Windows or Apple’s OSX

Local machine

One typically connects to an HPC cluster using SSH from a laptop or desktop. In this scenario the laptop or desktop is the local machine and the login node of the HPC cluster is the remote machine.

Memory

Computers store data in two places: Random Access Memory (RAM) and hard drives. Reads and writes to RAM are much faster than to a file on a hard drive, so computer programs store their data in RAM as much as possible. The amount of space needed to store data in RAM is called memory. For instance, a program that stores 1 billion floating point numbers will need 8 GB of memory since each number requires 8 bytes. You will need to request a certain amount of memory in your Slurm scripts when running batch jobs on the HPC clusters.
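The arithmetic in that example can be checked directly in the shell:

```shell
# 1 billion double-precision numbers at 8 bytes each:
bytes=$((1000000000 * 8))
gb=$((bytes / 1000000000))       # decimal gigabytes
echo "$bytes bytes = $gb GB"     # prints: 8000000000 bytes = 8 GB
```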

Module

The HPC clusters have lots of pre-installed software. Each piece of software requires a different configuration so it is not possible to have all of it available at once. Instead, environment modules are used which allow the user to specify which software to load. You can see the available modules by running the "module avail" command. To activate specific software see the "module load" command on the Environment Modules page.
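A typical session on a cluster that uses environment modules might look like the following; the package name is illustrative, and the modules actually available vary by cluster:

```shell
module avail              # list the software modules that can be loaded
module load anaconda3     # make a specific package available (example name)
module list               # show which modules are currently loaded
module purge              # unload all modules
```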

Node

Synonym for a computer or server. Within the cluster a node is one of the standalone computers. Each node on the Princeton Research Computing clusters has at least two CPUs, and each CPU is composed of multiple cores. A serial job will run on one of the cores within one of the CPUs of a node.

Parallel Job

A parallel job is one that can use more than one CPU-core simultaneously. This encompasses multithreaded codes written using OpenMP as well as multiprocess jobs written using the Message Passing Interface (MPI). The difference between a job composed of parallel processes and one composed of concurrent processes is that the parallel processes must run simultaneously while the concurrent processes do not have to (but may).
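As an illustration, a Slurm batch script for an MPI job requests several tasks; the job name, counts, and program name below are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=mpi-job      # name shown in the queue
#SBATCH --nodes=1               # number of nodes
#SBATCH --ntasks=4              # number of MPI processes
#SBATCH --time=00:10:00         # wall-time limit (HH:MM:SS)

srun ./my_mpi_program           # srun launches one process per task
```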

RAM

*Stands for Random Access Memory. The primary memory in a computer. See Memory above.

Remote machine

In the high-performance computing (HPC) world one typically connects to the login node of an HPC cluster using SSH from a laptop or desktop machine. In this scenario the login node of the cluster is the remote machine and the user's laptop or desktop is the local machine. An example of a remote machine is adroit.princeton.edu.

Serial Job

A serial job is composed of a set of instructions that must be executed serially or in sequence. Serial jobs can only use one CPU-core.
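A minimal Slurm batch script for a serial job might look like the following; the job name, memory, time limit, and program name are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=serial-job   # name shown in the queue
#SBATCH --ntasks=1              # a serial job uses a single task...
#SBATCH --cpus-per-task=1       # ...and a single CPU-core
#SBATCH --mem=4G                # memory requested for the job
#SBATCH --time=01:00:00         # wall-time limit (HH:MM:SS)

./my_serial_program             # placeholder for the actual executable
```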

Server

*Computer or computers that provide access to data upon request from a client

Shell

The shell is the program that allows you to run commands on the command line or execute commands in a script. When you first connect to the login node of an HPC cluster you land on the shell. There are different shells such as bash, tcsh and zsh. Each shell uses a different syntax.

Slurm

A system that helps to manage and schedule jobs on a cluster. Used by the larger Princeton clusters. The name comes from “Simple Linux Utility for Resource Management.”
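The basic Slurm commands for working with jobs are sketched below; the script name and job ID are examples:

```shell
sbatch job.slurm     # submit a batch script to the scheduler
squeue -u $USER      # list your jobs that are queued or running
scancel 123456       # cancel a job by its job ID (example ID)
```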

Source Code

*A program written in a language comprehensible by programmers, to be compiled into code in binary form (0’s and 1’s).

ssh

Secure shell (ssh) is a secure protocol for running commands on a remote machine.
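For example, connecting to a cluster login node looks like this, where <YourNetID> is a placeholder for the user's account name:

```shell
# Open a shell on a remote machine (host name as in the example above):
ssh <YourNetID>@adroit.princeton.edu

# Run a single command remotely and print its output locally:
ssh <YourNetID>@adroit.princeton.edu hostname
```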

Task

A task is a generic term used to describe a set of computational work to be done. It typically means process.

Thread

This is a sequence of instructions that are scheduled by the operating system for processing.

Hardware

The actual computer and associated components. The parts that can be touched.

Software

Computer programs which run on the hardware.

Processor

The component of a computer that does arithmetic and logic, and also controls the rest of the computer.

Shared memory parallelization

A parallel program where each of the processes (or threads) have access to a shared memory space in addition to their own private memory.

Distributed memory parallelization

A parallel program whose processes each have their own private memory, possibly on different nodes, and must communicate with each other by passing messages (for example, using MPI).

Array job 

A job that runs many copies of the same code, each with different inputs.
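A sketch of a Slurm array job: each copy receives its own value of SLURM_ARRAY_TASK_ID, which it can use to select a different input (the input file naming scheme here is hypothetical):

```shell
#!/bin/bash
#SBATCH --job-name=array-job    # name shown in the queue
#SBATCH --array=0-4             # run 5 copies, with task IDs 0 through 4
#SBATCH --ntasks=1              # each copy is a serial job

# Each copy of the job picks its own input file based on its task ID,
# e.g. input_0.txt through input_4.txt.
./my_program "input_${SLURM_ARRAY_TASK_ID}.txt"
```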

* copied from Understanding the Digital World by Brian W. Kernighan