Registration

Registration is open to all current Princeton University students, researchers, faculty, and staff. A single registration covers all sessions; participants should plan to attend the entire two-day bootcamp. CLICK HERE TO REGISTER by Monday, October 9 to attend the bootcamp. For questions, please email [email protected].

Overview

This two-day bootcamp provides an introduction to parallel programming for high-performance computing (HPC). Participants will learn about:
- computer architecture pertinent to programming for HPC
- code optimization to take advantage of the vectorized math units on modern processors
- parallel programming paradigms for CPUs and GPUs

This workshop is aimed at students and researchers with a fair amount of programming experience, to help them make the transition from running serial codes on their laptops to running parallel jobs on an HPC cluster. Each session builds on the previous ones, so attendees are strongly encouraged to attend all sessions.

Most exercises will be conducted in compiled languages. Prior experience with Linux and C, C++, or Fortran is therefore REQUIRED in order to participate in this workshop.

Organized and sponsored by PICSciE and Research Computing.

Location

The bootcamp takes place in 120 Lewis Library.

Agenda

Day 1: Monday, October 16: Fundamentals & Shared-Memory Parallelism

Time | Session | Instructor
10:00-10:15 AM | Welcome and Setup | PICSciE Staff
10:15-11:15 AM | What Every Computational Researcher Should Know About Computer Architecture [slides] | Stephane Ethier
11:15 AM-12:00 PM | Performance and Vectorization: Part 1 (Hands-on) [slides] [GitHub] [slides by S. Lantz] [roofline] | Jonathan Halverson
12:00-12:45 PM | Lunch Break
12:45-1:30 PM | Performance and Vectorization: Part 2 (Hands-on) [see slides for Part 1] | Jonathan Halverson
1:30-3:00 PM | Introduction to OpenMP (Hands-on) [slides] [GitHub] | Stephane Ethier
3:00-3:15 PM | Break
3:15-4:00 PM | Parallel Python (Hands-on) [see links below] | Mattie Niznik

Day 2: Tuesday, October 17: MPI and GPUs

Time | Session | Instructor
10:00-11:30 AM | Introduction to MPI (Hands-on) [slides] | Stephane Ethier
11:30 AM-12:00 PM | MPI for Python (Hands-on) [see links below] | Mattie Niznik
12:00-12:45 PM | Lunch Break
12:45-1:30 PM | What is a GPU? [slides] | Rohit Kakodkar
1:30-2:15 PM | CuPy and Python GPU Libraries (Hands-on) [GitHub] | Jonathan Halverson
2:15-2:30 PM | Break
2:30-3:15 PM | Introduction to OpenACC (Hands-on) [slides] | Stephane Ethier
3:15-4:00 PM | Introduction to Kokkos (Hands-on) [slides] [GitHub] | Rohit Kakodkar

What Every Computational Researcher Should Know About Computer Architecture
Monday, October 16 at 10:15-11:15 AM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: To demystify the black-box approach to computing, we will start with an overview of computer architectures, from a full cluster down to microprocessor design. Topics such as vector registers and the cache hierarchy will be discussed. Emerging architectures and accelerators such as GPUs will be introduced. Performance metrics frequently used in the HPC community, such as FLOPS, will be defined. Finally, cloud computing and its advantages and disadvantages will be presented.

Learning objectives: Attendees will leave with a basic understanding of computer architecture and why awareness of it is important when writing code for high-performance computing.

Session format: Presentation.
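As a concrete illustration of the FLOPS metric mentioned above, a processor's theoretical peak can be estimated from its clock speed, core count, vector width, and fused multiply-add (FMA) capability. All hardware figures in the sketch below are assumptions chosen for the arithmetic, not the specifications of any particular Princeton cluster node:

```python
# Illustrative estimate of theoretical peak FLOPS for a hypothetical CPU node.
# Every figure below is an assumption for the sake of the arithmetic.
clock_ghz = 2.4      # clock speed in GHz
cores = 32           # physical cores per node
vector_lanes = 8     # float64 elements per 512-bit (AVX-512) vector register
ops_per_fma = 2      # a fused multiply-add counts as 2 floating-point ops
fma_units = 2        # vector FMA units per core

peak_gflops = clock_ghz * cores * vector_lanes * ops_per_fma * fma_units
print(f"Theoretical peak: {peak_gflops:.1f} GFLOPS")  # prints 2457.6 GFLOPS
```

Real codes rarely approach this peak, because memory bandwidth usually limits performance first; the roofline model discussed in the vectorization session relates the two.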
Performance and Vectorization
Monday, October 16 at 11:15 AM-1:30 PM
Jonathan Halverson, Research Software and Computing Training Lead, Research Computing & PICSciE, Princeton University

Description: The past decade has seen a rapid evolution of computing architectures designed to increase performance despite the inherent speed limitations that arise from power constraints. One growing trend involves wider vector units, which allow more data elements to be processed simultaneously in a single instruction. To leverage this hardware-level vectorization, programmers need to know how to identify potentially vectorizable loops and how to optimize them for a given processor architecture.

This session provides a practical guide to making your code run faster on modern processor architectures through vectorization. After a brief introduction to the hardware, we will use Intel Advisor, a powerful profiling tool, to identify and then exploit vectorization opportunities in code. Hands-on examples will allow attendees to gain some familiarity with Advisor in a simple yet realistic setting.

Learning objectives: This session is geared toward computational researchers looking to leverage the performance features of Intel hardware to improve the performance of C/C++ codes. Attendees will leave with a better understanding of the performance-boosting features of different computer architectures and will learn techniques for tweaking their codes to take maximum advantage of them.

Knowledge prerequisites: Basic Linux, experience with C/C++, and basic familiarity with the Princeton research computing clusters.

Introduction to OpenMP
Monday, October 16 at 1:30-3:00 PM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: This session uses OpenMP to introduce the fundamental concepts behind parallel programming.
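The most fundamental of those concepts, dividing the iterations of a loop among threads that share memory, can be sketched with Python's standard-library thread pool. This is a conceptual analogue only, not OpenMP itself, which is used from C, C++, or Fortran via compiler directives:

```python
# Conceptual analogue of a shared-memory "parallel for": divide loop
# iterations among worker threads. This is illustrative Python, not OpenMP.
from concurrent.futures import ThreadPoolExecutor

def work(i):
    # stand-in for the computation done in one loop iteration
    return i * i

n = 1000
with ThreadPoolExecutor(max_workers=4) as pool:
    # iterations are distributed across 4 threads sharing the same memory
    results = list(pool.map(work, range(n)))

# the parallel result matches the serial loop
print(sum(results))
```

In OpenMP proper, the equivalent is a single `#pragma omp parallel for` directive placed above the loop; the hands-on exercises work through that syntax directly.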
Hands-on exercises will explore the common core of OpenMP, in addition to more advanced OpenMP features and fundamental parallel design patterns.

Knowledge prerequisites: Participants should be familiar with a compiled programming language such as C, C++, or Fortran. Familiarity with the bash command line is also helpful.

Parallel Python
Monday, October 16 at 3:15-4:00 PM
Mattie Niznik, Research Software & Programming Analyst, Research Computing & PICSciE, Princeton University

Description: Learn about Slurm job arrays, the Python multiprocessing module, and the built-in parallelism in the linear algebra routines of NumPy.

Knowledge prerequisites: Some experience with Python and Slurm is required.

Links:
Job arrays
https://researchcomputing.princeton.edu/support/knowledge-base/slurm#arrays
https://github.com/PrincetonUniversity/hpc_beginning_workshop/tree/main/job_array/python
multiprocessing
https://researchcomputing.princeton.edu/support/knowledge-base/python#multiprocessing
https://github.com/PrincetonUniversity/hpc_beginning_workshop/tree/main/multiprocessing
NumPy (linear algebra)
https://github.com/PrincetonUniversity/hpc_beginning_workshop/tree/main/python/cpu/numpy

Intro to MPI Programming
Tuesday, October 17 at 10:00-11:30 AM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: This session covers the basics of distributed-memory parallel computing with MPI. After introducing environment management, point-to-point communication, and collective communication routines, hands-on exercises will reinforce the ideas and provide a few simple examples that can serve as building blocks for your future parallel codes.

Learning objectives: Participants will learn the essentials of distributed-memory parallel computing using MPI.

Knowledge prerequisites: Basic facility with the bash command line is required (including understanding what environment variables are and how to set their values).
Programming experience with C, C++, or Fortran is also required.

MPI for Python
Tuesday, October 17 at 11:30 AM-12:00 PM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: This session will introduce participants to the Python interface to MPI, called "MPI for Python" or mpi4py.

Learning objectives: Attendees will learn how to write parallel Python code for distributed-memory systems (i.e., multiple nodes).

Links:
https://researchcomputing.princeton.edu/support/knowledge-base/mpi4py
https://github.com/PrincetonUniversity/hpc_beginning_workshop/tree/main/mpi4py
https://mpi4py.readthedocs.io/en/stable/intro.html

What is a GPU?
Tuesday, October 17 at 12:45-1:30 PM
Rohit Kakodkar, Research Software Engineer, Research Computing & Geosciences, Princeton University

Description: This session will provide an overview of the structure and terminology associated with GPU hardware, focusing on NVIDIA GPUs. We will also discuss the kinds of parallel programming paradigms to which GPUs are best suited, along with the available GPU math libraries.

Learning objectives: Participants will get a high-level overview of what GPUs are, how they work, and what some different approaches to programming them are (later sessions will elaborate on these approaches).

Knowledge prerequisites: None.

CuPy and Python GPU Libraries
Tuesday, October 17 at 1:30-2:15 PM
Jonathan Halverson, Research Software and Computing Training Lead, Research Computing & PICSciE, Princeton University

Description: This session will introduce CuPy and other libraries as mechanisms for leveraging GPUs from Python. Participants will see pragmatic hands-on examples of how the CuPy library can be used to accelerate Python code with a low barrier to entry.

Learning objectives: Participants will leave with exposure to different use cases for CuPy and other Python GPU libraries.

Knowledge prerequisites: No previous experience with GPU programming is required.
However, programming experience with Python is expected.

Introduction to OpenACC
Tuesday, October 17 at 2:30-3:15 PM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)

Description: This session will give participants a hands-on introduction to OpenACC, a directive-based tool for programming GPUs.

Learning objectives: Participants will leave with an overview of how to use OpenACC to accelerate code in a portable way with minimal code changes.

Knowledge prerequisites: No previous experience with OpenACC directives or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.

Introduction to Kokkos
Tuesday, October 17 at 3:15-4:00 PM
Rohit Kakodkar, Research Software Engineer II, Research Computing & Geosciences, Princeton University

Description: This session will give participants a hands-on introduction to Kokkos, a high-level library and programming model for GPUs (and CPUs).

Learning objectives: Participants will leave with an overview of how to use Kokkos to accelerate code in a portable way.

Knowledge prerequisites: No previous experience with Kokkos or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.

Questions

For any questions, or for more information, please email [email protected].