Registration

Registration is open to all current Princeton University students, researchers, faculty, and staff. A single registration covers all sessions (participants should plan to attend the entire three-day bootcamp). CLICK HERE TO REGISTER by Monday, October 10 to attend the bootcamp. For questions, please email [email protected].

This three-day workshop will provide an introduction to high-performance computing (HPC) on the Princeton University clusters. Participants will learn about the aspects of computer architecture pertinent to programming for HPC, how to optimize their codes to take advantage of the vectorized math capabilities of modern processors, and parallel programming tools and paradigms for CPUs and GPUs.

This workshop is aimed at researchers with a fair amount of programming experience, to help them make the transition from running serial codes on their laptops or workstations to running parallel jobs on a cluster. After a high-level overview of parallel programming models, of computer architecture, and of the interplay between the two, the workshop will have sessions on more specialized topics: OpenMP, MPI, and GPU programming. Each session builds on the previous ones, focuses on a different aspect of high-performance computing, and includes its own set of hands-on activities. Attendees are therefore strongly encouraged to attend all sessions.

All exercises will be conducted in compiled languages, primarily C and C++. Therefore, prior experience with Linux and with C, C++, or Fortran is REQUIRED in order to participate in this workshop.

Organized and sponsored by PICSciE and OIT Research Computing.

Location

The entire bootcamp takes place in 120 Lewis Library.

Agenda

The agenda for the three-day bootcamp is shown below.

Day 1: Monday, October 17: Background & Fundamentals

Session | Time | Instructor
Welcome and Set-up | 10:00-10:15 AM | PICSciE Staff
What Every Computational Researcher Should Know About Computer Architecture [Slides] | 10:15-11:15 AM | Stephane Ethier
Performance and Vectorization: Part 1 (Hands-on) [Slides] [GitHub Repo] | 11:15 AM-12:00 PM | Abhishek Biswas
Lunch Break | 12:00-1:00 PM |
Performance and Vectorization: Part 2 (Hands-on) | 1:00-2:00 PM | Abhishek Biswas
A Primer on Parallel Programming (Hands-on) [Slides] [HPC repo] [R repo] | 2:00-3:15 PM | Jonathan Halverson

Day 2: Tuesday, October 18: OpenMP

Session | Time | Instructor
Introduction to OpenMP (Hands-on) [GitHub Repo] | 10:00-11:30 AM | Tim Mattson, Intel
Lunch Break | 11:30 AM-12:30 PM |
Working with Threads (Hands-on) | 12:30-2:00 PM | Tim Mattson, Intel
Break | 2:00-2:15 PM |
The OpenMP Data Environment (Hands-on) | 2:15-3:30 PM | Tim Mattson, Intel
Break | 3:30-3:45 PM |
Tasks & Advanced Topics (Hands-on) | 3:45-4:45 PM | Tim Mattson, Intel

Day 3: Wednesday, October 19: MPI & GPUs

Session | Time | Instructor
Introduction to MPI (Hands-on) [Slides] | 10:00-11:30 AM | Stephane Ethier
Parallel Python & Parallel R (Hands-on) [mpi4py repo] [Multiprocessing] [NumPy] [HPC R] | 11:30 AM-12:00 PM | Jonathan Halverson
Lunch Break | 12:00-1:00 PM |
What is a GPU? [Slides] | 1:00-1:30 PM | Stephane Ethier
Introduction to CuPy and Numba [Materials] | 1:30-2:15 PM | Henry Schreiner
GPU Libraries (Hands-on) [GitHub repo] | 2:15-2:45 PM | Jonathan Halverson
Break | 2:45-3:00 PM |
Introduction to OpenACC (Hands-on) [Slides] | 3:00-4:30 PM | Stephane Ethier
Introduction to Kokkos [Slides and Code] | 4:30-5:00 PM | Rohit Kakodkar

What Every Computational Researcher Should Know About Computer Architecture
Monday, October 17 at 10:15-11:15 AM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: To demystify the black-box approach to computing, we will start with an overview of computer architectures, from a full cluster down to microprocessor design. Topics such as vector registers and the cache hierarchy will be discussed. Emerging architectures and accelerators such as GPUs will be introduced. Performance metrics frequently used in the HPC community, such as FLOPs, will be defined. Finally, cloud computing and its advantages and disadvantages will be presented.

Learning objectives: Attendees will leave with a basic understanding of computer architecture and why awareness of it is important when writing code for high-performance computing.

Session format: Lecture

Performance and Vectorization
Monday, October 17 at 11:15 AM-2:00 PM
Abhishek Biswas, Senior Research Software Engineer, Research Computing & Molecular Biology, Princeton University

Description: The past decade has seen a rapid evolution of computing architectures in order to increase performance despite the inherent speed limitations that arise from power constraints. One growing trend is wider vector units, which allow more data elements to be processed simultaneously by a single instruction. To leverage this hardware-level vectorization, programmers need to know how to identify potentially vectorizable loops and how to optimize them for a given processor architecture. This session provides a practical guide to making your code run faster on modern processor architectures through vectorization. After a brief introduction to the hardware, we will use Intel Advisor, a powerful profiling tool, to identify and then exploit vectorization opportunities in code. Hands-on examples will allow attendees to gain familiarity with Advisor in a simple yet realistic setting.

Learning objectives: This session is geared toward computational researchers looking to leverage the performance features of Intel hardware to improve the performance of C/C++ codes. Attendees will leave with a better understanding of the performance-boosting features of different computer architectures and will learn techniques for tweaking their codes to take maximum advantage of them.

Knowledge prerequisites: Basic Linux, experience with C/C++, and basic familiarity with the Princeton research computing clusters.
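To give a concrete flavor of what a vectorizable loop looks like, below is a minimal sketch (illustrative only, not taken from the workshop materials) of a SAXPY kernel. Its iterations are independent and access memory with unit stride, which is the pattern auto-vectorizers handle best. The build command in the comment assumes GCC; other compilers use different reporting flags, and Intel Advisor can be pointed at the resulting binary for a much more detailed analysis.

```c
/* saxpy.c - illustrative example of a loop a compiler can auto-vectorize.
 * Assumed build (GCC): gcc -O3 -march=native -fopt-info-vec saxpy.c -o saxpy
 * The -fopt-info-vec flag asks GCC to report which loops were vectorized.
 */
#include <stdio.h>
#include <stdlib.h>

#define N 10000000

/* y = a*x + y: independent iterations, unit-stride access, and no aliasing
 * (promised by 'restrict'), so the loop is a good vectorization candidate. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    float *x = malloc(N * sizeof *x);
    float *y = malloc(N * sizeof *y);
    if (!x || !y) return 1;

    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(N, 3.0f, x, y);
    printf("y[0] = %.1f\n", y[0]);   /* expect 5.0 */

    free(x);
    free(y);
    return 0;
}
```

Timing a loop like this with and without optimization flags (for example, -O1 versus -O3 -march=native) is a quick way to see the payoff of vectorization before moving on to a full profiling tool.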
A Primer on Parallel Programming
Monday, October 17 at 2:00-3:15 PM
Jonathan Halverson, Research Software and Computing Training Lead, Research Computing & PICSciE, Princeton University

Description: This session will provide a high-level overview of different parallel programming paradigms, including those that will be discussed in detail in later sessions of the bootcamp.

Learning objectives: Attendees will leave with a basic understanding of the different modalities for parallel computing.

Session format: Lecture and Hands-on

Introduction to OpenMP
Tuesday, October 18 at 10:00 AM-5:00 PM
Tim Mattson, Senior Principal Engineer, Intel

Description: This day-long workshop uses OpenMP to introduce the fundamental concepts behind parallel programming. Hands-on exercises will explore the common core of OpenMP as well as more advanced OpenMP features and fundamental parallel design patterns.

Learning objectives: In addition to hands-on experience using OpenMP, participants will walk away knowing the basic history of parallel computing, the fundamental concepts behind parallel programming, and some of the fundamental design patterns from which most parallel algorithms are constructed.

Knowledge prerequisites: The tutorial is taught in C, so prior experience with C/C++ is required. We use a simple subset of C (basic control structures, static arrays, simple pointers) that any experienced programmer should know, so you do not need expert-level C/C++; the basics are fine. Familiarity with the bash command line is also helpful.

Hardware/software prerequisites: The exercises can be performed on a modern multi-core laptop. Users who choose to do so will need a recent version of an OpenMP-aware C compiler (e.g., gcc) installed and access to a Linux/Unix command line locally on their laptops.
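As a preview of the style of exercise used throughout the day, here is a minimal sketch (illustrative only, not the tutorial's own code) of the classic numerical-integration estimate of pi, parallelized with a single OpenMP directive:

```c
/* omp_pi.c - estimating pi by numerical integration with OpenMP.
 * Assumed build: gcc -fopenmp omp_pi.c -o omp_pi
 * Assumed run:   OMP_NUM_THREADS=4 ./omp_pi
 */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const long num_steps = 100000000;
    const double step = 1.0 / (double) num_steps;
    double sum = 0.0;

    double t0 = omp_get_wtime();

    /* Iterations are divided among threads; 'x' is private to each thread
     * because it is declared inside the loop, and the reduction clause
     * combines the per-thread partial sums without a data race. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi ~= %.12f  (%.3f s with up to %d threads)\n",
           step * sum, omp_get_wtime() - t0, omp_get_max_threads());
    return 0;
}
```

Re-running with different values of OMP_NUM_THREADS is a simple way to see how the wall-clock time scales with the thread count.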
Intro to MPI Programming
Wednesday, October 19 at 10:00-11:30 AM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: This session covers the basics of distributed-memory parallel computing with MPI. After introducing environment management, point-to-point communication, and collective communication routines, hands-on exercises will reinforce the ideas and provide a few simple examples that can serve as building blocks for your future parallel codes.

Learning objectives: Participants will learn the essentials of distributed-memory parallel computing using MPI.

Knowledge prerequisites: Basic facility with the bash command line is required (including understanding what environment variables are and how to set their values). Programming experience with C, C++, or Fortran is also required.
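For orientation, the sketch below (illustrative only) shows the skeleton that nearly every MPI program shares: initialize the environment, query the process rank and communicator size, communicate, and finalize. The launch command is cluster-dependent; on a Slurm-based system such as the Princeton clusters, something like srun -n 4 ./mpi_hello inside a job would be typical, while mpirun -n 4 ./mpi_hello is common elsewhere.

```c
/* mpi_hello.c - the basic MPI program skeleton plus one collective call.
 * Assumed build: mpicc mpi_hello.c -o mpi_hello
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);                 /* environment management */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes?    */

    printf("Hello from rank %d of %d\n", rank, size);

    /* Collective communication: sum one integer per rank onto rank 0. */
    int local = rank;
    int total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of ranks 0..%d = %d\n", size - 1, total);

    MPI_Finalize();
    return 0;
}
```

Point-to-point routines such as MPI_Send and MPI_Recv follow the same pattern of rank-addressed messages within a communicator.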
Parallel Python & Parallel R
Wednesday, October 19 at 11:30 AM-12:00 PM
Jonathan Halverson, Research Software and Computing Training Lead, Research Computing & PICSciE, Princeton University

Description: This session will introduce participants to practical approaches to parallelism, including Slurm job arrays, MPI for Python (mpi4py), Python multiprocessing, and parallel packages and approaches in R.

Learning objectives: Attendees will learn about the common approaches researchers use to parallelize their code.

Session format: Demonstration and hands-on

What is a GPU?
Wednesday, October 19 at 1:00-1:30 PM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: This session will provide an overview of the structure and terminology associated with GPU hardware, specifically NVIDIA GPUs. The kinds of parallel programming paradigms to which GPUs are best suited will also be discussed. After this general overview of GPU programming, attendees will get a hands-on introduction to different GPU programming models in the sessions that follow: leveraging existing libraries, accelerating Python code on GPUs with CuPy and Numba, an introduction to the OpenACC programming model, and an overview of how to use CUDA to write your own GPU kernels.

Learning objectives: Participants will get a high-level overview of what GPUs are, how they work, and what some of the different approaches to programming them are (later sessions will elaborate on these approaches).

Knowledge prerequisites: None. Prior exposure to parallel programming methodologies, though not strictly required, is helpful.

Introduction to CuPy and Numba
Wednesday, October 19 at 1:30-2:15 PM
Henry Schreiner, Computational Physicist & Research Software Engineer, Research Computing & IRIS-HEP Software Institute, Princeton University

Description: This session will introduce CuPy and Numba as mechanisms for leveraging GPUs from Python. Participants will see pragmatic, hands-on examples of how the CuPy library and the Numba package can be used to accelerate Python code with a low barrier to entry.

Learning objectives: Participants will leave with exposure to different use cases for CuPy and Numba.

Knowledge prerequisites: No previous experience with GPU programming is required. However, programming experience with Python is expected. Prior exposure to parallel programming methodologies, though not strictly required, is also helpful.

GPU Libraries
Wednesday, October 19 at 2:15-2:45 PM
Jonathan Halverson, Research Software and Computing Training Lead, Research Computing & PICSciE, Princeton University

Description: This session will give participants a tour of GPU-ready numerical libraries and demonstrate how to call them from their own codes.

Learning objectives: Participants will leave with an overview of the available libraries, how to call them from their own code, which parallel programming models they support, and caveats about compatibility with different GPU hardware.

Knowledge prerequisites: No previous experience with GPU programming is required. However, programming experience with C, C++, or Python is expected. Prior exposure to parallel programming methodologies, though not strictly required, is also helpful.

Introduction to OpenACC
Wednesday, October 19 at 3:00-4:30 PM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: This session will give participants a hands-on introduction to OpenACC, a directive-based tool for programming NVIDIA GPUs.

Learning objectives: Participants will leave with a thorough overview of how to use OpenACC to accelerate their code in a portable way with minimal code changes.

Knowledge prerequisites: No previous experience with OpenACC directives or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected. Prior exposure to parallel programming methodologies, though not strictly required, is also helpful.

Introduction to Kokkos
Wednesday, October 19 at 4:30-5:00 PM
Rohit Kakodkar, Research Software Engineer, Research Computing & Geosciences, Princeton University

Description: This session will give participants a hands-on introduction to Kokkos, a high-level library and programming model for GPUs.

Learning objectives: Participants will leave with a thorough overview of how to use Kokkos to accelerate their code in a portable way with minimal code changes.

Knowledge prerequisites: No previous experience with Kokkos or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected. Prior exposure to parallel programming methodologies, though not strictly required, is also helpful.

Questions

For any questions, or for more information, please email [email protected].