How to Use These Resources
Within each category below (Videos, Online courses, etc.), the curated resources follow an approximate progression, with some overlap since the content is pulled from a potpourri of sources. First come high-level overviews of the essentials of computer architecture that computational scientists and engineers should know. Next come vectorization (SIMD parallelism at the instruction level) on a single core and performance tuning for serial code, followed by multi-core architectures. The progression ends with a general outlook on the landscape of hardware for high-performance computing.
As the resources progress, they begin to presume some basic familiarity with a compiled language like C, C++, or Fortran in order to follow examples.
Some of the material listed under Overview of HPC & Parallel programming also gives some high-level overview of hardware and its pertinence to computational scientists.
Videos
The Tyranny of the Storage Hierarchy (Part 1 and Part 2) -- two videos (~1 hr each) covering a single set of slides that give an overview of the distinctions among registers, cache, RAM, and hard disks, and how they are all related. From the Supercomputing in Plain English (SIPE) 2018 workshop series from the OU (University of Oklahoma) Supercomputing Center for Education & Research (OSCER).
Vectorization and Performance Tuning -- video lecture (~50 mins) and slides delivered at ATPESC 2016. Overview of different techniques to leverage vectorization (SIMD), common pitfalls, and how to measure performance.
Parallel Computing Architectures -- overview of the hierarchy of architectures found in large clusters (single processor architecture, shared-memory systems, distributed memory systems, and accelerators like GPUs). From the Parallel Programming in Computational Engineering and Science (PPCES) 2014, an HPC workshop held at RWTH Aachen in Germany.
Multicore Madness -- video (~1 hr) and slides from SIPE 2018 with a different presentation on multicore architecture. In particular, it gives nice background on why the last ~15 years have seen a move toward multicore architectures and what that means for programmers.
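For readers who have not yet seen what "vectorizable" code looks like, the following is a minimal sketch in C of the kind of loop the vectorization material discusses. It is not taken from any of the lectures above; the function names `saxpy` and `prefix_sum` are illustrative choices. The first loop has independent iterations and is a typical candidate for compiler auto-vectorization. The second has a loop-carried dependence, a common pitfall. Whether a given compiler actually emits SIMD instructions depends on the compiler, its optimization flags, and the target hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: each y[i] depends only on x[i] and y[i],
   so the compiler may process several elements per SIMD instruction.
   `restrict` promises that x and y do not overlap in memory. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i-1, so the loop cannot be vectorized in this simple form. */
void prefix_sum(size_t n, float *v)
{
    assert(v != NULL);
    for (size_t i = 1; i < n; ++i) {
        v[i] += v[i - 1];
    }
}
```

Most compilers can report which loops they vectorized. Consult your compiler's documentation for the relevant reporting options.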
Self-paced online courses
Vectorization course -- self-paced online course from the Cornell University Center for Advanced Computing (CAC)
Web pages / written online tutorials
Serial Tuning Basics -- a video recording of a lecture from PPCES 2014, along with slides. The video lecture is followed by a hands-on lab session. A tarball of files for the hands-on exercises, along with instructions for using them, was once available. Some references in the exercises are specific to the Aachen computing system used at the workshop, but the substitutions to make should be fairly evident when working on your local laptop or one of the Princeton systems.
Computer Architecture: Concepts & Terminology -- overview of jargon that arises in discussions of computer architecture as it relates to HPC. From Lawrence Livermore National Lab (LLNL). Part of a longer document offering a comprehensive introduction to parallel computing.
Parallel Computer Memory Architectures -- another chapter from the LLNL online document.
Introduction to High-Performance Scientific Computing -- book by Victor Eijkhout from the Texas Advanced Computing Center (TACC). Available in online and print versions. Chapters 1 & 2 give a comprehensive theoretical summary of architectures and programming models underlying serial computing and parallel computing, respectively.
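Several of the resources above explain how the memory hierarchy affects serial performance. As a small illustration (a sketch, not code from any of the listed materials), the two hypothetical C functions below compute the same sum over an n-by-n matrix stored as a flat, row-major array. The first walks memory contiguously. The second strides through it by n elements at a time, which tends to make poorer use of cache lines for large n. The actual difference in run time depends on the matrix size, the hardware, and the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Row order: the inner loop touches adjacent elements a[i*n + j],
   a[i*n + j + 1], ..., so consecutive accesses share cache lines. */
double sum_row_order(size_t n, const double *a)
{
    assert(a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            s += a[i * n + j];
        }
    }
    return s;
}

/* Column order: the inner loop jumps n elements between accesses,
   so for large n each access may land on a different cache line. */
double sum_col_order(size_t n, const double *a)
{
    assert(a != NULL);
    double s = 0.0;
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            s += a[i * n + j];
        }
    }
    return s;
}
```

Timing both functions on a matrix much larger than the last-level cache is a simple way to observe the effect on your own machine.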