Registration

Registration is open to all current Princeton University students, researchers, faculty, and staff. A single registration covers all sessions.

Register by Wednesday, October 9 to attend the bootcamp. For questions, please email [email protected].

This two-day bootcamp will provide an introduction to parallel programming for high-performance computing (HPC). Participants will learn about:

  • computer architecture pertinent to programming for HPC,
  • code optimization to take advantage of vectorized math on modern processors, and
  • parallel programming paradigms for CPUs and GPUs.

This event is designed for students and researchers with substantial programming experience who are looking to make the transition from running serial codes on their laptops to running parallel jobs on an HPC cluster. Each session builds on the previous ones, so attendees are strongly encouraged to attend all sessions.

Most exercises will be conducted in compiled languages. Therefore, prior experience with Linux and C, C++ or Fortran is REQUIRED in order to participate in this bootcamp.

Organized and Sponsored by PICSciE and Research Computing

 

Location

This in-person bootcamp will take place in 138 Lewis Library.

 

Agenda

The agenda for the 2-day bootcamp is shown below:

Day 1: Monday, October 14: Fundamentals & Shared-Memory Parallelism

Session | Time | Instructor
Welcome and Setup | 10:00-10:15 AM | PICSciE Staff
What Every Computational Researcher Should Know About Computer Architecture | 10:15-11:15 AM | Stephane Ethier
Performance and Vectorization: Part 1 (Hands-on) | 11:15 AM-12:00 PM | Jonathan Halverson
Lunch Break | 12:00-12:45 PM |
Performance and Vectorization: Part 2 (Hands-on) | 12:45-1:30 PM | Jonathan Halverson
Introduction to OpenMP (Hands-on) | 1:30-2:45 PM | Stephane Ethier
Break | 2:45-3:00 PM |
Introduction to Parallel Python (Hands-on) | 3:00-4:00 PM | Jonathan Halverson

Day 2: Tuesday, October 15: Distributed-Memory Parallelism and GPUs

Session | Time | Instructor
Introduction to Parallel Programming with MPI (Hands-on) | 10:00-11:30 AM | Jonathan Gorard
MPI for Python (Hands-on) | 11:30 AM-12:00 PM | Mattie Niznik
Lunch Break | 12:00-12:45 PM |
What is a GPU? | 12:45-1:30 PM | Rohit Kakodkar
Python on GPUs: CuPy and Numba (Hands-on) | 1:30-2:15 PM | Henry Schreiner
Break | 2:15-2:30 PM |
Introduction to Directive-Based Programming Models for GPUs | 2:30-3:15 PM | Stephane Ethier
Introduction to Kokkos (Hands-on) | 3:15-4:00 PM | Rohit Kakodkar

 

What Every Computational Researcher Should Know About Computer Architecture

Monday, October 14 at 10:15-11:15 AM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Materials: slides

Description: To demystify the black-box approach to computing, we will start with an overview of computer architectures from a cluster down to microprocessor design. Topics such as vector registers and cache hierarchy will be discussed. Emerging architectures and accelerators such as GPUs will be introduced. Performance metrics such as FLOPs that are frequently used in the HPC community will be defined. Finally, cloud computing and its advantages and disadvantages will be presented.

Learning objectives: Attendees will leave with a basic understanding of computer architecture and why awareness of it is important when writing code for high-performance computing.

Session format: Presentation.

 

Performance and Vectorization

Monday, October 14 at 11:15 AM-1:30 PM

Jonathan Halverson
Research Software and Computing Training Lead
PICSciE, Princeton University

Materials: Slides (Bei Wang), Slides (Steve Lantz), Slides and Exercises, Video (roofline analysis), Python profiling, MAP profiler

Description: The past decade has seen a rapid evolution of computing architectures in order to increase performance despite inherent speed limitations that arise from power constraints. One growing trend involves wider vector units, which allow more data elements to be processed simultaneously in a single instruction. To leverage this hardware-level vectorization, programmers need to know how to identify potentially vectorizable loops and how to optimize them for a given processor architecture.

This session provides a practical guide on how to make your code run faster on modern processor architecture through vectorization. After a brief introduction to the hardware, we will use Intel Advisor – a powerful profiling tool – to identify and then exploit vectorization opportunities in code. Hands-on examples will allow attendees to gain some familiarity using Advisor in a simple yet realistic setting.

Learning objectives: This workshop is geared toward computational researchers looking to leverage performance features of Intel hardware to improve the performance of C/C++ codes. Attendees will leave with a better understanding of the performance-boosting features of different computer architectures and learn techniques for tweaking their codes to take maximum advantage of them.

Knowledge prerequisites: Basic Linux, experience with C/C++, and a basic familiarity with the Princeton research computing clusters.

 

Introduction to OpenMP

Monday, October 14 at 1:30-2:45 PM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Materials: slides, exercises and how to compile

Description: This session uses OpenMP to introduce the fundamental concepts behind parallel programming. Hands-on exercises will explore the common core of OpenMP, in addition to more advanced OpenMP features and fundamental parallel design patterns.

Knowledge prerequisites: Participants should be familiar with a compiled programming language like C, C++ or Fortran. Familiarity with the bash command-line is also helpful.

 

Introduction to Parallel Python

Monday, October 14 at 3:00-4:00 PM

Jonathan Halverson
Research Software and Computing Training Lead
PICSciE, Princeton University


Description: Learn about Slurm job arrays, the Python multiprocessing module and the built-in parallelism in the linear algebra routines of NumPy. Other approaches and libraries will be mentioned.

Knowledge prerequisites: Some experience with Python and Slurm is required.

 

Introduction to Parallel Programming with MPI

Tuesday, October 15 at 10:00-11:30 AM

Jonathan Gorard
Research Software Engineer II
Research Computing and Princeton Plasma Physics Laboratory

Materials: slides and exercises, download exercises

Description: This session covers the basics of distributed-memory parallel computing with the Message Passing Interface (MPI). After introducing environment management, point-to-point communication, and collective communication routines, hands-on exercises will reinforce the ideas and provide a few simple examples that can function as building blocks for your future parallel codes.
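As an illustrative sketch of point-to-point communication (not from the session materials; the build and launch commands shown are typical examples and require an MPI installation):

```cpp
#include <mpi.h>
#include <cstdio>

// Minimal point-to-point example: rank 0 sends an integer to rank 1.
// Build and run with 2 processes, e.g.:
//   mpicxx send_recv.cpp -o send_recv && srun -n 2 ./send_recv
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload = 0;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```

Each rank runs the same program; the `rank` value returned by `MPI_Comm_rank` is what differentiates their behavior.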

Learning objectives: Participants will learn the essentials of distributed-memory parallel computing using MPI.

Knowledge prerequisites: Basic facility with the bash command-line is required (including understanding what environment variables are and how to set their values). Programming experience with C, C++, or Fortran is also required.

 

MPI for Python

Tuesday, October 15 at 11:30 AM-12:00 PM

Mattie Niznik
Research Software & Programming Analyst
Princeton Institute for Computational Science and Engineering (PICSciE), Princeton University

Materials: KB page, exercises

Description: This session will introduce participants to the Python interface to MPI called "MPI for Python" or mpi4py.

Learning objectives: Attendees will learn how to write parallel Python code for distributed-memory systems (i.e., multiple nodes).

 

What is a GPU?

Tuesday, October 15 at 12:45-1:30 PM

Rohit Kakodkar
Research Software Engineer
Research Computing and Geosciences, Princeton University

Materials: slides

Description: This session will provide an overview of the structure and terminology associated with GPU hardware, specifically NVIDIA GPUs. The parallel programming paradigms to which GPUs are best suited will be discussed, along with GPU-accelerated math libraries.

Learning objectives: Participants will get a high-level overview of what GPUs are, how they work, and what some different approaches are to programming them (later sessions will elaborate on these approaches).

Knowledge prerequisites: None.

 

Python for GPUs: CuPy and Numba

Tuesday, October 15 at 1:30-2:15 PM

Henry Schreiner
Computational Physicist and Lecturer
PICSciE and PACM, Princeton University

Materials: GitHub repo, iscinumpy.dev

Description: This session will introduce CuPy and other libraries as mechanisms to leverage GPUs using Python. Participants will see pragmatic hands-on examples of how the CuPy library can be used to accelerate Python code with a low barrier to entry. The Numba compiler for Python will also be demonstrated.

Learning objectives: Participants will leave with exposure to different use-cases for CuPy and Numba.

Knowledge prerequisites: No previous experience with GPU programming in general is required. However, programming experience with Python is expected.

 

Introduction to Directive-Based Programming Models for GPUs

Tuesday, October 15 at 2:30-3:15 PM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Materials: slides, exercises

Description: This session will give participants an introduction to OpenACC and OpenMP, which are directive-based approaches for programming GPUs.

Learning objectives: Participants will leave with an overview of how to accelerate code in a portable way with minimal code changes.

Knowledge prerequisites: No previous experience with OpenACC directives or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.

 

Introduction to Kokkos

Tuesday, October 15 at 3:15-4:00 PM

Rohit Kakodkar
Research Software Engineer II
Research Computing and Geosciences, Princeton University

Materials: GitHub repo

Description: This session will give participants a hands-on introduction to Kokkos, a high-level library and programming model for GPUs (and CPUs).
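As an illustrative sketch of the Kokkos style (not from the session materials; building it requires an installed Kokkos library), here is a sum computed with `parallel_reduce`. The same source runs on CPUs or GPUs depending on how Kokkos was configured at build time:

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

// Sum of 0..n-1 expressed as a Kokkos parallel reduction. Each work
// item adds its index into a thread-private partial sum, and Kokkos
// combines the partials on whatever backend it was built for.
int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1000;
        long sum = 0;
        Kokkos::parallel_reduce(
            "sum_indices", n,
            KOKKOS_LAMBDA(const int i, long& partial) { partial += i; },
            sum);
        std::printf("sum = %ld\n", sum);  // 999*1000/2 = 499500
    }
    Kokkos::finalize();
    return 0;
}
```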

Learning objectives: Participants will leave with an overview of how to use Kokkos to accelerate code in a portable way.

Knowledge prerequisites: No previous experience with Kokkos or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.

 

Questions

For any questions, or for more information, please email [email protected].