Registration

Registration is open to all current Princeton University students, researchers, faculty, and staff. A single registration covers all sessions. CLICK HERE TO REGISTER by Wednesday, October 9 to attend the bootcamp. For questions, please email [email protected].

This two-day bootcamp will provide an introduction to parallel programming for high-performance computing (HPC). Participants will learn about:

- computer architecture pertinent to programming for HPC
- code optimization to take advantage of the vectorized math on modern processors
- parallel programming paradigms for CPUs and GPUs

This event is designed for students and researchers with a fair amount of programming experience who are looking to make the transition from running single serial codes on their laptops to running parallel jobs on an HPC cluster. Each session builds on the previous ones, so attendees are strongly encouraged to attend all sessions.

Most exercises will be conducted in compiled languages. Therefore, prior experience with Linux and C, C++, or Fortran is REQUIRED in order to participate in this bootcamp.

Organized and Sponsored by PICSciE and Research Computing

Location

This in-person bootcamp will take place in 138 Lewis Library.
Agenda

The agenda for the 2-day bootcamp is shown below:

Day 1: Monday, October 14: Fundamentals & Shared-Memory Parallelism

Session | Time | Instructor
Welcome and Setup | 10:00-10:15 AM | PICSciE Staff
What Every Computational Researcher Should Know About Computer Architecture | 10:15-11:15 AM | Stephane Ethier
Performance and Vectorization: Part 1 (Hands-on) | 11:15 AM-12:00 PM | Jonathan Halverson
Lunch Break | 12:00-12:45 PM |
Performance and Vectorization: Part 2 (Hands-on) | 12:45-1:30 PM | Jonathan Halverson
Introduction to OpenMP (Hands-on) | 1:30-2:45 PM | Stephane Ethier
Break | 2:45-3:00 PM |
Introduction to Parallel Python (Hands-on) | 3:00-4:00 PM | Jonathan Halverson

Day 2: Tuesday, October 15: Distributed-Memory Parallelism and GPUs

Session | Time | Instructor
Introduction to Parallel Programming with MPI (Hands-on) | 10:00-11:30 AM | Jonathan Gorard
MPI for Python (Hands-on) | 11:30 AM-12:00 PM | Mattie Niznik
Lunch Break | 12:00-12:45 PM |
What is a GPU? | 12:45-1:30 PM | Rohit Kakodkar
Python for GPUs: CuPy and Numba (Hands-on) | 1:30-2:15 PM | Henry Schreiner
Break | 2:15-2:30 PM |
Introduction to Directive-Based Programming Models for GPUs | 2:30-3:15 PM | Stephane Ethier
Introduction to Kokkos (Hands-on) | 3:15-4:00 PM | Rohit Kakodkar

What Every Computational Researcher Should Know About Computer Architecture
Monday, October 14 at 10:15-11:15 AM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)
Materials: slides
Description: To demystify the black-box approach to computing, we will start with an overview of computer architectures, from a cluster down to microprocessor design. Topics such as vector registers and cache hierarchy will be discussed. Emerging architectures and accelerators such as GPUs will be introduced. Performance metrics such as FLOPs that are frequently used in the HPC community will be defined.
Finally, cloud computing and its advantages and disadvantages will be presented.
Learning objectives: Attendees will leave with a basic understanding of computer architecture and why awareness of it is important when writing code for high-performance computing.
Session format: Presentation.

Performance and Vectorization
Monday, October 14 at 11:15 AM-1:30 PM
Jonathan Halverson, Research Software and Computing Training Lead, PICSciE, Princeton University
Materials: Slides (Bei Wang), Slides (Steve Lantz), Slides and Exercises, Video (roofline analysis), Python profiling, MAP profiler
Description: The past decade has seen a rapid evolution of computing architectures to increase performance despite the inherent speed limitations that arise from power constraints. One growing trend involves wider vector units, which allow more data elements to be processed simultaneously in a single instruction. To leverage this hardware-level vectorization, programmers need to know how to identify potentially vectorizable loops and how to optimize them for a given processor architecture. This session provides a practical guide to making your code run faster on modern processor architectures through vectorization. After a brief introduction to the hardware, we will use Intel Advisor, a powerful profiling tool, to identify and then exploit vectorization opportunities in code. Hands-on examples will allow attendees to gain some familiarity with Advisor in a simple yet realistic setting.
Learning objectives: This workshop is geared toward computational researchers looking to leverage performance features of Intel hardware to improve the performance of C/C++ codes.
Attendees will leave with a better understanding of the performance-boosting features of different computer architectures and will learn techniques for tweaking their codes to take maximum advantage of them.
Knowledge prerequisites: Basic Linux, experience with C/C++, and a basic familiarity with the Princeton research computing clusters.

Introduction to OpenMP
Monday, October 14 at 1:30-2:45 PM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)
Materials: slides, exercises and how to compile
Description: This session uses OpenMP to introduce the fundamental concepts behind parallel programming. Hands-on exercises will explore the common core of OpenMP, in addition to more advanced OpenMP features and fundamental parallel design patterns.
Knowledge prerequisites: Participants should be familiar with a compiled programming language like C, C++, or Fortran. Familiarity with the bash command line is also helpful.

Introduction to Parallel Python
Monday, October 14 at 3:00-4:00 PM
Jonathan Halverson, Research Software and Computing Training Lead, PICSciE, Princeton University
Links:
- Job Arrays: https://researchcomputing.princeton.edu/support/knowledge-base/slurm#arrays and https://github.com/PrincetonUniversity/hpc_beginning_workshop/tree/main/job_array/python
- multiprocessing: https://researchcomputing.princeton.edu/support/knowledge-base/python#multiprocessing and https://github.com/PrincetonUniversity/hpc_beginning_workshop/tree/main/multiprocessing
- NumPy (linear algebra): https://github.com/PrincetonUniversity/hpc_beginning_workshop/tree/main/python/cpu/numpy
- ray.io: https://docs.ray.io/en/latest/ray-core/examples/highly_parallel.html
- dask: https://docs.dask.org/en/stable/
- Python 3.13 (free threading): Slides by Henry Schreiner
- OpenMP for Python
Description: Learn about Slurm job arrays, the Python multiprocessing module, and the built-in parallelism in the linear algebra routines of NumPy.
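As a small taste of the multiprocessing approach covered in this session, the sketch below spreads independent function calls across a pool of worker processes. This is a minimal illustration, not material from the session itself; the pool size and the factorial workload are illustrative placeholders.

```python
# Minimal sketch of process-based parallelism with the standard-library
# multiprocessing module. The workload (math.factorial) and the pool
# size (4) are illustrative placeholders.
from math import factorial
from multiprocessing import Pool

if __name__ == "__main__":
    # Distribute independent calls to factorial() across 4 worker processes.
    with Pool(processes=4) as pool:
        results = pool.map(factorial, range(6))
    print(results)  # [1, 1, 2, 6, 24, 120]
```

On an HPC cluster, the pool size would typically be matched to the CPU cores allocated by Slurm (for example, by reading the SLURM_CPUS_PER_TASK environment variable) rather than hard-coded.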
Other approaches and libraries will be mentioned.
Knowledge prerequisites: Some experience with Python and Slurm is required.

Introduction to Parallel Programming with MPI
Tuesday, October 15 at 10:00-11:30 AM
Jonathan Gorard, Research Software Engineer II, Research Computing and Princeton Plasma Physics Laboratory
Materials: slides and exercises, download exercises
Description: This session covers the basics of distributed-memory parallel computing with the Message Passing Interface (MPI). After introducing environment management, point-to-point communication, and collective communication routines, hands-on exercises will reinforce the ideas and provide a few simple examples that can serve as building blocks for your future parallel codes.
Learning objectives: Participants will learn the essentials of distributed-memory parallel computing using MPI.
Knowledge prerequisites: Basic facility with the bash command line is required (including understanding what environment variables are and how to set their values). Programming experience with C, C++, or Fortran is also required.

MPI for Python
Tuesday, October 15 at 11:30 AM-12:00 PM
Mattie Niznik, Research Software & Programming Analyst, Princeton Institute for Computational Science and Engineering (PICSciE), Princeton University
Materials: KB page, exercises
Description: This session will introduce participants to the Python interface to MPI, called "MPI for Python" or mpi4py.
Learning objectives: Attendees will learn how to write parallel Python code for distributed-memory systems (i.e., multiple nodes).

What is a GPU?
Tuesday, October 15 at 12:45-1:30 PM
Rohit Kakodkar, Research Software Engineer, Research Computing and Geosciences, Princeton University
Materials: slides
Description: This session will provide an overview of the structure and terminology associated with GPU hardware, specifically NVIDIA GPUs.
The sorts of parallel programming paradigms to which GPUs are best suited will also be discussed, along with the relevant math libraries.
Learning objectives: Participants will get a high-level overview of what GPUs are, how they work, and what some of the different approaches to programming them are (later sessions will elaborate on these approaches).
Knowledge prerequisites: None.

Python for GPUs: CuPy and Numba
Tuesday, October 15 at 1:30-2:15 PM
Henry Schreiner, Computational Physicist and Lecturer, PICSciE and PACM, Princeton University
Materials: GitHub repo, iscinumpy.dev
Description: This session will introduce CuPy and other libraries as mechanisms for leveraging GPUs from Python. Participants will see pragmatic, hands-on examples of how the CuPy library can be used to accelerate Python code with a low barrier to entry. The Numba compiler for Python will also be demonstrated.
Learning objectives: Participants will leave with exposure to different use cases for CuPy and Numba.
Knowledge prerequisites: No previous experience with GPU programming is required. However, programming experience with Python is expected.

Introduction to Directive-Based Programming Models for GPUs
Tuesday, October 15 at 2:30-3:15 PM
Stephane Ethier, Computational Physicist, Princeton Plasma Physics Laboratory (PPPL)
Materials: slides, exercises
Description: This session will give participants an introduction to OpenACC and OpenMP, which are directive-based approaches for programming GPUs.
Learning objectives: Participants will leave with an overview of how to accelerate code in a portable way with minimal code changes.
Knowledge prerequisites: No previous experience with OpenACC directives or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.
Introduction to Kokkos
Tuesday, October 15 at 3:15-4:00 PM
Rohit Kakodkar, Research Software Engineer II, Research Computing and Geosciences, Princeton University
Materials: GitHub repo
Description: This session will give participants a hands-on introduction to Kokkos, a high-level library and programming model for GPUs (and CPUs).
Learning objectives: Participants will leave with an overview of how to use Kokkos to accelerate code in a portable way.
Knowledge prerequisites: No previous experience with Kokkos or GPU programming in general is required. However, programming experience with C, C++, or Fortran is expected.

Questions
For any questions, or for more information, please email [email protected].