Over the past decade, computing architectures have evolved rapidly to keep increasing performance despite the speed limitations imposed by power constraints. One growing trend is wider vector units, which allow a single instruction to process more data elements simultaneously. To leverage this hardware-level vectorization, programmers need to know how to identify potentially vectorizable loops and how to optimize them for a given processor architecture.
This workshop provides a practical guide to making your code run faster on modern processor architectures through vectorization. After a brief introduction to the hardware, we will use Intel Advisor, a powerful profiling tool, to identify and then exploit vectorization opportunities in code. Hands-on examples will allow attendees to gain some familiarity with Advisor in a simple yet realistic setting.
Workshop format: Lecture, demonstration, and hands-on exercises
Target audience: This workshop is geared toward computational researchers who want to use the performance features of Intel hardware to speed up their C/C++ codes.
Knowledge prerequisites: Basic Linux, experience with C/C++, and basic familiarity with the Princeton research computing clusters.
Hardware/software prerequisites: Overarching requirements for all PICSciE virtual workshops are listed at https://researchcomputing.princeton.edu/education/training/virtual-work…. Participants should ensure they have met these requirements in advance, as there will be no technical troubleshooting during the workshop itself. In particular, participants should have an account on Adroit (the cluster we will use for demonstration purposes) and be able to SSH into that machine.
Learning objectives: Attendees will leave with a better understanding of the performance-boosting features of different computer architectures and with techniques for tuning their codes to take full advantage of those features.