Learning resources: Hardware Essentials & Vectorization

How to Use These Resources

Within each category below (Videos, Online courses, etc.), the curated resources are listed in an approximate progression, with some overlap, since the content is pulled from a potpourri of sources.  First come high-level overviews of the essentials of computer architecture that computational scientists and engineers should know.  Next come vectorization and instruction-level parallelism on a single core, along with performance tuning for serial code.  These are followed by multi-core architectures, and the list ends with a general outlook on the landscape of hardware for high-performance computing.

As the resources progress, they begin to presume some basic familiarity with a compiled language like C, C++, or Fortran in order to follow examples.

Some of the material listed under Overview of HPC & Parallel programming also gives a high-level overview of hardware and its relevance to computational scientists.


Videos


Computer Architecture Essentials -- video lecture (~90 mins) and slides delivered at Argonne National Lab (ANL) during the Argonne Training Program on Extreme-Scale Computing (ATPESC) 2016.

The Tyranny of the Storage Hierarchy (Part 1 and Part 2) -- two videos (~1 hr each) covering a single set of slides that give an overview of the distinctions among registers, cache, RAM, and hard disks, and how they are all related.  From the Supercomputing in Plain English (SIPE) 2018 workshop series run by the OU (University of Oklahoma) Supercomputing Center for Education & Research (OSCER).

Vectorization and Performance Tuning -- video lecture (~50 mins) and slides delivered at ATPESC 2016. Overview of different techniques to leverage vectorization (SIMD), common pitfalls, and how to measure performance.

Instruction-Level Parallelism -- video (~1 hr) and slides that give an overview of how to get a single CPU core to execute multiple instructions at once.  From SIPE 2018.

Parallel Computing Architectures -- overview of the hierarchy of architectures found in large clusters (single-processor architecture, shared-memory systems, distributed-memory systems, and accelerators like GPUs).  From Parallel Programming in Computational Engineering and Science (PPCES) 2014, an HPC workshop held at RWTH Aachen in Germany.

Multi-core Architectures -- video (~50 mins) with slides from ATPESC 2018.  A primer on multi-core architecture and how programmers can leverage it for performance.

Multicore Madness -- video (~1 hr) and slides from SIPE 2018 with a different presentation on multicore architecture. In particular, it gives nice background on why the last ~15 years have seen a move toward multicore architectures and what that means for programmers.

An Introduction to Parallel Supercomputing -- video (~50 mins) with slides from ATPESC 2018. Survey of the current landscape of supercomputing hardware and its future outlook.


Self-paced online courses

Vectorization course -- self-paced online course from the Cornell University Center for Advanced Computing (CAC)

Introduction to Multicore Performance -- a self-paced online course from Cyberinfrastructure Tutor at the National Center for Supercomputing Applications (NCSA)


Web pages / written online tutorials

Serial Tuning Basics -- a video recording of a lecture from PPCES 2014, along with slides.  The video lecture is followed by a hands-on lab session.  A tarball of files for the hands-on exercises, with instructions for how to use them, was once available.  Note that some references in the exercises are specific to the Aachen computing system used at the workshop, but the substitutions to make should be fairly evident when working on your local laptop or one of the Princeton systems.

Computer Architecture: Concepts & Terminology -- overview of jargon that arises in discussions of computer architecture as it relates to HPC.  From Lawrence Livermore National Lab (LLNL).  Part of a longer document offering a comprehensive introduction to parallel computing.

Parallel Computer Memory Architectures -- another chapter from the LLNL online document.



Introduction to High-Performance Scientific Computing -- book by Victor Eijkhout from the Texas Advanced Computing Center (TACC).  Available in online and print versions.  Chapters 1 & 2 give a comprehensive theoretical summary of architectures and programming models underlying serial computing and parallel computing, respectively.