GPU Hackathon Participants Tackle Code Optimization for Increased Performance
In today’s competitive research environment, the compute capabilities of scientific applications are critical to the success of many academic research programs. To enable new advances, institutions are turning to specialized teams that help researchers create the most efficient, scalable, and sustainable research codes possible by applying cross-disciplinary computational techniques to new and emerging areas of science.
At Princeton University, the Research Software Engineering Group of the Research Computing department works closely with the academic departments to accelerate faculty-driven computational research. Together with the Princeton Institute for Computational Science and Engineering (PICSciE), they hosted Princeton’s first GPU Hackathon during the week of June 24th, 2019. Eight teams of developers, representing a myriad of scientific domains, attended the five-day event to optimize and port their codes to GPUs.
“By combining direct mentoring, agile programming practices, and a concentrated team effort, events like the hackathon have the ability to enable researchers to make considerable gains porting their code to GPUs,” said Ian Cosden, manager of the Research Software Engineering Group at Princeton. “Porting code to the GPU can be daunting to those who don’t have the experience, and this event is specifically designed to quickly bring research software developers up to speed by working with experts on their code.”
From magnetic confinement fusion to climate and weather, hackathon participants across disciplines worked diligently with mentors to identify bottlenecks and collaborate on solutions to accelerate code performance.
Fusion energy holds the promise of a clean and sustainable energy source. Produced by “fusing together” light atoms, such as hydrogen, at an extremely high pressure and temperature until the gas becomes a plasma, fusion power offers the prospect of an almost inexhaustible source of energy for future generations, but it also presents engineering challenges. The tokamak is an experimental machine designed to harness the energy of fusion: the energy produced through the fusion of atoms is absorbed as heat in the walls of the vessel and used to produce steam, which is then converted to electricity by way of turbines and generators.
Team GTS from the Princeton Plasma Physics Laboratory (PPPL) focused on the Global Tokamak Simulation (GTS), a code developed to study the influence of microturbulence on particle and energy confinement during the process of fusion. Since microturbulence is believed to be responsible for the unacceptably large leakage of energy and particles out of the hot plasma core, understanding and controlling this process is crucial as it ultimately determines the efficiency and viability of tokamaks.
Written in FORTRAN 90 with some rewrites in C, GTS is a Particle-in-Cell (PIC) code that spends more than 60 percent of its computational work on particle-related operations. Using OpenACC directives, Team GTS learned how to handle global variables that are declared in FORTRAN modules and used in the electron and ion “push” particle loops. The team achieved a 25X speedup for a single kernel and a 3X overall speedup for the application.
Figure 1: Team GTS Kernel Speedup
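The gap between a 25X kernel speedup and a 3X application speedup is what Amdahl's law predicts when only part of the runtime is accelerated. The sketch below is a back-of-the-envelope illustration, not the team's analysis: it assumes, hypothetically, that the single 25X kernel is the only accelerated portion of the run, and asks what share of the original runtime that portion must have occupied. The function names are illustrative.

```python
def amdahl_overall_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the runtime is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def implied_fraction(overall_speedup, kernel_speedup):
    """Invert Amdahl's law: the runtime share the accelerated part must have had."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


# Figures reported in the article; treating them as one accelerated region
# is an assumption made only for this illustration.
kernel_speedup = 25.0
overall_speedup = 3.0

fraction = implied_fraction(overall_speedup, kernel_speedup)
ceiling = 1.0 / (1.0 - fraction)  # limit if the accelerated part took zero time

print(f"Implied accelerated share of runtime: {fraction:.1%}")
print(f"Check, overall speedup from that share: {amdahl_overall_speedup(fraction, kernel_speedup):.2f}X")
print(f"Upper bound without touching the rest: {ceiling:.2f}X")
```

Under that assumption the accelerated portion works out to roughly 69 percent of the original runtime, which is in line with the article's statement that more than 60 percent of the work is particle-related. It also shows why further gains would have to come from the remaining, unaccelerated code.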
Weather and Climate
With members from the National Oceanic and Atmospheric Administration (NOAA) and the Geophysical Fluid Dynamics Laboratory (GFDL), Team GFDL spent the week focusing on the Finite-Volume Cubed-Sphere Dynamical Core (FV3). This GFDL-developed dynamical core is the engine of a numerical weather prediction model that powers NOAA's Unified Forecast System (UFS), including the new national weather forecast model at the National Weather Service.
Hoping to extend initial work to GPU-accelerate the code, the team targeted three specific kernels within FV3 that needed modifications to adapt to the GPU architecture. They chose OpenACC to keep the code readable for the scientists who will continue to work on it. The results at the end of the week showed promising progress. Optimization work on one kernel cut its run time by 25 percent relative to the previous GPU implementation (2.7 seconds versus 3.6 seconds on a single GPU). Another kernel ran in 1.2 seconds on the GPU, or 5 seconds including data movement, versus 79 seconds on a single IBM Power9 CPU core. Finally, the team made significant progress on a code rewrite for the third kernel that they had identified.
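For readers who want to compare these timings on a common scale, the short sketch below converts them into speedup factors. It only restates arithmetic on the numbers quoted above. It assumes that 1.2 seconds is the GPU kernel time alone and 5 seconds is the GPU time including data movement, both compared against 79 seconds on a single Power9 core. The helper names are illustrative.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


def time_reduction(baseline_seconds, new_seconds):
    """Fraction of the baseline run time that was eliminated."""
    return 1.0 - new_seconds / baseline_seconds


# First kernel: previous GPU implementation versus optimized GPU implementation.
first_reduction = time_reduction(3.6, 2.7)
first_speedup = speedup(3.6, 2.7)

# Second kernel: single Power9 core versus GPU, without and with data movement.
second_kernel_only = speedup(79.0, 1.2)
second_with_transfers = speedup(79.0, 5.0)

print(f"Kernel 1: {first_reduction:.0%} less time, {first_speedup:.2f}X faster")
print(f"Kernel 2: {second_kernel_only:.1f}X faster (compute only)")
print(f"Kernel 2: {second_with_transfers:.1f}X faster (including data movement)")
```

The difference between the last two figures shows how much host-to-device data movement can erode a kernel's raw gain, which is one reason GPU ports often try to keep data resident on the device across kernels.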
Team SPECFEM-X, composed of members from Princeton’s Theoretical & Computational Seismology Research Group, used the week to focus on implementing a geodynamics code, SPECFEM3D, on a multi-GPU platform. As Center for Accelerated Application Readiness (CAAR) program codes, SPECFEM3D and SPECFEM3D Globe provide a versatile geoscientific software package that simulates acoustic, elastic, coupled acoustic/elastic, poroelastic or seismic wave propagation, enabling researchers to simulate and analyze geodynamics problems such as coseismic and post-earthquake deformation, earthquake-induced gravity perturbation, and gravity anomalies, among others.
By converting the FORTRAN code to C and adding CUDA, Team SPECFEM-X achieved a 250X speedup on a single GPU compared to a single CPU core.
“I'm a firm believer in collaborative science, and it was wonderful to see several of my graduate students and postdocs deeply engaged,” said Jeroen Tromp, Blair Professor of Geology and Professor of Applied & Computational Mathematics at Princeton University. “The team made tremendous progress during the week, and that momentum has carried the project forward ever since. In four days they managed to obtain a two-orders-of-magnitude increase in performance, and since then they've made further optimizations that gained them another order of magnitude. This speedup has opened up an entirely new class of problems in quasistatic global geophysics. From my perspective, this was a very successful event!”
Added Cosden, “Given the current architectural trends in HPC, this kind of event provides a huge opportunity to accelerate and advance science in ways that research groups alone sometimes can’t achieve.”
Additional GPU Hackathons are scheduled throughout the year. For further information and to apply, visit www.openacc.org/events.