Registration

Registration is open to all current Princeton University students, researchers, faculty, and staff. A single registration covers all sessions (participants should plan to attend the entire two-day bootcamp).

CLICK HERE TO REGISTER by Monday, October 9 to attend the bootcamp. For questions, please email [email protected].

This two-day bootcamp will provide an introduction to parallel programming for high-performance computing (HPC). Participants will learn about:

  • computer architecture pertinent to programming for HPC
  • code optimization to take advantage of vectorized math on modern processors
  • parallel programming paradigms for CPUs and GPUs

This workshop is aimed at students and researchers with a fair amount of programming experience, to help them make the transition from running serial codes on their laptops to running parallel jobs on an HPC cluster. Each session builds on the previous ones, so attendees are strongly encouraged to attend all sessions.

Most exercises will be conducted in compiled languages. Therefore, prior experience with Linux and C, C++ or Fortran is REQUIRED in order to participate in this workshop.

Organized and Sponsored by PICSciE and Research Computing

 

Location

The bootcamp takes place in 120 Lewis Library.

 

Agenda

The agenda for the 2-day bootcamp is shown below:

Day 1: Monday, October 16: Fundamentals & Shared-Memory Parallelism

  • 10:00-10:15 AM: Welcome and Setup (PICSciE Staff)
  • 10:15-11:15 AM: What Every Computational Researcher Should Know About Computer Architecture (Stephane Ethier) [slides]
  • 11:15 AM-12:00 PM: Performance and Vectorization: Part 1, hands-on (Jonathan Halverson) [slides] [GitHub] [slides by S. Lantz] [roofline]
  • 12:00-12:45 PM: Lunch Break
  • 12:45-1:30 PM: Performance and Vectorization: Part 2, hands-on (Jonathan Halverson) [see slides for part 1]
  • 1:30-3:00 PM: Introduction to OpenMP, hands-on (Stephane Ethier) [slides] [GitHub]
  • 3:00-3:15 PM: Break
  • 3:15-4:00 PM: Parallel Python, hands-on (Mattie Niznik) [see links below]

Day 2: Tuesday, October 17: MPI and GPUs

  • 10:00-11:30 AM: Introduction to MPI, hands-on (Stephane Ethier) [slides]
  • 11:30 AM-12:00 PM: MPI for Python, hands-on (Mattie Niznik) [see links below]
  • 12:00-12:45 PM: Lunch Break
  • 12:45-1:30 PM: What is a GPU? (Rohit Kakodkar) [slides]
  • 1:30-2:15 PM: CuPy and Python GPU Libraries, hands-on (Jonathan Halverson) [GitHub]
  • 2:15-2:30 PM: Break
  • 2:30-3:15 PM: Introduction to OpenACC, hands-on (Stephane Ethier) [slides]
  • 3:15-4:00 PM: Introduction to Kokkos, hands-on (Rohit Kakodkar) [slides] [GitHub]

 

 

What Every Computational Researcher Should Know About Computer Architecture

Monday, October 16 at 10:15-11:15 AM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Description: To demystify the black-box approach to computing, we will start with an overview of computer architectures, from a cluster down to microprocessor design. Topics such as vector registers and the cache hierarchy will be discussed. Emerging architectures and accelerators such as GPUs will be introduced. Performance metrics frequently used in the HPC community, such as FLOP/s (floating-point operations per second), will be defined. Finally, cloud computing and its advantages and disadvantages will be presented.
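A common metric of this kind is a processor's theoretical peak FLOP/s, usually estimated as cores × clock rate × FLOPs per cycle per core, where the last factor grows with the vector width and the number of fused multiply-add (FMA) units. The Python sketch below shows only that arithmetic. The function name and the processor figures (32 cores, 2.0 GHz, 512-bit vectors holding 8 doubles, two FMA units per core) are hypothetical, and real sustained clock rates under heavy vector load are often lower than the nominal value.

```python
def peak_gflops(cores: int, clock_ghz: float, vector_width: int,
                fma_units: int) -> float:
    """Theoretical peak double-precision GFLOP/s of one processor.

    vector_width: doubles per vector register (8 for 512-bit AVX-512)
    fma_units:    FMA units per core; one FMA counts as 2 FLOPs
    """
    flops_per_cycle = vector_width * 2 * fma_units
    return cores * clock_ghz * flops_per_cycle


# Hypothetical 32-core CPU at 2.0 GHz with two FMA units per core
print(peak_gflops(32, 2.0, vector_width=8, fma_units=2))  # 2048.0 (vectorized)
print(peak_gflops(32, 2.0, vector_width=1, fma_units=2))  # 256.0 (scalar)
```

Under these assumptions, the 8× gap between the two results is the headroom that vectorized code can use and purely scalar code leaves idle.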

Learning objectives: Attendees will leave with a basic understanding of computer architecture and why awareness of it is important when writing code for high-performance computing.

Session format: Presentation.

 

Performance and Vectorization

Monday, October 16 at 11:15 AM-1:30 PM

Jonathan Halverson
Research Software and Computing Training Lead
Research Computing & PICSciE, Princeton University

Description: The past decade has seen a rapid evolution of computing architectures in order to increase performance despite inherent speed limitations that arise from power constraints. One growing trend involves wider vector units, which allow more data elements to be processed simultaneously in a single instruction. To leverage this hardware-level vectorization, programmers need to know how to identify potentially vectorizable loops and how to optimize them for a given processor architecture.

This session provides a practical guide to making your code run faster on modern processor architectures through vectorization. After a brief introduction to the hardware, we will use Intel Advisor, a powerful profiling tool, to identify and then exploit vectorization opportunities in code. Hands-on examples will give attendees some familiarity with Advisor in a simple yet realistic setting.
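The agenda links a roofline resource for this session, and Intel Advisor can produce a roofline chart. In the roofline model, a kernel's attainable performance is the smaller of the processor's peak FLOP/s and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). A minimal Python sketch, assuming hypothetical machine numbers and a byte count that ignores cache effects such as write-allocate traffic:

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      intensity: float) -> float:
    """Roofline model: performance is capped either by the compute peak
    or by memory bandwidth times arithmetic intensity (FLOPs per byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Arithmetic intensity of y[i] = a * x[i] + y[i] in double precision:
# 2 FLOPs (multiply + add) per element; 24 bytes moved (read x, read y, write y)
daxpy_intensity = 2 / 24

# Hypothetical machine: 2048 GFLOP/s peak, 200 GB/s memory bandwidth
print(attainable_gflops(2048.0, 200.0, daxpy_intensity))  # roughly 16.7: memory-bound
```

With these numbers the loop is memory-bound, so wider vectors alone would not speed it up; kernels with high arithmetic intensity, such as dense matrix multiplication, are where vectorization pays off most.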

Learning objectives: This session is geared toward computational researchers looking to leverage the performance features of Intel hardware to speed up C/C++ codes. Attendees will leave with a better understanding of the performance-boosting features of different computer architectures and will learn techniques for tweaking their codes to take maximum advantage of them.

Knowledge prerequisites: Basic Linux, experience with C/C++, and a basic familiarity with the Princeton research computing clusters.

 

Introduction to OpenMP

Monday, October 16 at 1:30-3:00 PM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Description: This session uses OpenMP to introduce the fundamental concepts behind parallel programming. Hands-on exercises will explore the common core of OpenMP, in addition to more advanced OpenMP features and fundamental parallel design patterns.

Knowledge prerequisites: Participants should be familiar with a compiled programming language like C, C++ or Fortran. Familiarity with the bash command line is also helpful.

 

Parallel Python

Monday, October 16 at 3:15-4:00 PM

Mattie Niznik
Research Software & Programming Analyst
Research Computing & PICSciE, Princeton University

Description: Learn about Slurm job arrays, the Python multiprocessing module and the built-in parallelism in the linear algebra routines of NumPy.
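A minimal sketch combining the first two ideas: each task of a Slurm job array reads its index from the SLURM_ARRAY_TASK_ID environment variable and then spreads its own share of the work over several processes with multiprocessing.Pool. The work function and the 100-inputs-per-task scheme are hypothetical placeholders; outside Slurm, the script falls back to task 0 and a single worker.

```python
import os
from multiprocessing import Pool


def work(x: int) -> int:
    """Hypothetical stand-in for an expensive, independent computation."""
    return x * x


def main() -> None:
    # Each task of a Slurm job array sees a different SLURM_ARRAY_TASK_ID;
    # default to 0 when the script runs outside Slurm.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    # Hypothetical scheme: each array task handles its own block of 100 inputs.
    inputs = range(task_id * 100, (task_id + 1) * 100)

    # SLURM_CPUS_PER_TASK is set when --cpus-per-task is requested;
    # otherwise fall back to one worker process.
    nworkers = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    with Pool(processes=nworkers) as pool:
        results = pool.map(work, inputs)
    print(f"task {task_id}: sum of results = {sum(results)}")


if __name__ == "__main__":
    main()
```

Submitted with, for example, #SBATCH --array=0-9 and #SBATCH --cpus-per-task=4, this would run ten independent array tasks, each using four worker processes on its own block of inputs.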

Knowledge prerequisites: Some experience with Python and Slurm is required.

Links:

 

Intro to MPI Programming

Tuesday, October 17 at 10:00-11:30 AM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Description: This session covers the basics of distributed-memory parallel computing with MPI. After an introduction to environment management, point-to-point communication, and collective communication routines, hands-on exercises will reinforce these ideas and provide a few simple examples that can serve as building blocks for your future parallel codes.
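One building block found in many MPI codes is splitting N loop iterations among the ranks: each process learns its rank and the communicator size (in C, via MPI_Comm_rank and MPI_Comm_size), computes the block it owns, works on it, and a collective such as MPI_Reduce combines the partial results. The index arithmetic is language-independent, so it is sketched here in Python without any MPI calls; block_range is a hypothetical helper name, and the reduction is only simulated with a serial sum.

```python
def block_range(n: int, nranks: int, rank: int) -> range:
    """Iterations owned by `rank` when n iterations are split as evenly as
    possible over nranks; the first n % nranks ranks get one extra."""
    base, extra = divmod(n, nranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


n, nranks = 10, 4
for rank in range(nranks):
    print(rank, list(block_range(n, nranks, rank)))
# 0 [0, 1, 2]
# 1 [3, 4, 5]
# 2 [6, 7]
# 3 [8, 9]

# Serial stand-in for a sum reduction: each "rank" sums its own block,
# then the partial sums are combined.
partial_sums = [sum(block_range(n, nranks, r)) for r in range(nranks)]
assert sum(partial_sums) == sum(range(n))
```

In a real MPI program, only the loop over rank disappears: each process calls block_range once with its own rank, and the partial sums are combined by the collective instead of by a Python list.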

Learning objectives: Participants will learn the essentials of distributed-memory parallel computing using MPI.

Knowledge prerequisites: Basic facility with the bash command line is required (including an understanding of what environment variables are and how to set their values). Programming experience with C, C++, or Fortran is also required.

 

MPI for Python

Tuesday, October 17 at 11:30 AM-12:00 PM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Description: This session will introduce participants to the Python interface to MPI called "MPI for Python" or mpi4py.

Learning objectives: Attendees will learn how to write parallel Python code for distributed-memory systems (i.e., multiple nodes).

Links:

 

What is a GPU?

Tuesday, October 17 at 12:45-1:30 PM

Rohit Kakodkar
Research Software Engineer
Research Computing & Geosciences, Princeton University

Description: This session will provide an overview of the structure and terminology of GPU hardware, with a focus on NVIDIA GPUs. It will also discuss the kinds of parallel programming paradigms to which GPUs are best suited, as well as GPU-accelerated math libraries.

Learning objectives: Participants will get a high-level overview of what GPUs are, how they work, and what some different approaches are to programming them (later sessions will elaborate on these approaches).

Knowledge prerequisites: None.

 

CuPy and Python GPU Libraries

Tuesday, October 17 at 1:30-2:15 PM

Jonathan Halverson
Research Software and Computing Training Lead
Research Computing & PICSciE, Princeton University

Description: This session will introduce CuPy and other libraries as mechanisms to leverage GPUs using Python. Participants will see pragmatic hands-on examples of how the CuPy library can be used to accelerate Python code with a low barrier to entry.

Learning objectives: Participants will leave with exposure to different use-cases for CuPy and other Python GPU libraries.

Knowledge prerequisites: No previous experience with GPU programming in general is required. However, programming experience with Python is expected.

 

Introduction to OpenACC

Tuesday, October 17 at 2:30-3:15 PM

Stephane Ethier
Computational Physicist
Princeton Plasma Physics Laboratory (PPPL)

Description: This session will give participants a hands-on introduction to OpenACC, a directive-based tool for programming GPUs.

Learning objectives: Participants will leave with an overview of how to use OpenACC to accelerate code in a portable way with minimal code changes.

Knowledge prerequisites: No previous experience with OpenACC directives or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.

 

Introduction to Kokkos

Tuesday, October 17 at 3:15-4:00 PM

Rohit Kakodkar
Research Software Engineer II
Research Computing & Geosciences, Princeton University

Description: This session will give participants a hands-on introduction to Kokkos, a high-level library and programming model for GPUs (and CPUs).

Learning objectives: Participants will leave with an overview of how to use Kokkos to accelerate code in a portable way.

Knowledge prerequisites: No previous experience with Kokkos or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.

 

Questions

For any questions, or for more information, please email [email protected].