Second Bootcamp Prepares Graduate Students, Postdocs for Computationally Driven Research

Written by
Sharon Adarlo
Nov. 26, 2019

Over five days earlier this fall, graduate students and postdocs at Princeton University hunkered down for a series of workshops and exercises exploring foundational topics, tools and techniques in modern computational research, from Python to the intricacies of computer architecture.

This week-long event, the Research Computing Bootcamp, drew 130 registrants, mostly graduate students and postdocs, from 39 departments to the Lewis Science Library, where the bootcamp was held from October 28th to November 1st. The Princeton Institute for Computational Science & Engineering (PICSciE) and OIT Research Computing oversaw the training sessions, with co-sponsorship from the Graduate School, the Center for Statistics and Machine Learning, the Princeton Neuroscience Institute, the Program in Applied & Computational Mathematics, and the School of Engineering & Applied Science.

This was the second year the bootcamp has been held at the University. Last year's event was spearheaded by Ian Cosden, manager for HPC Software Engineering and Performance Tuning. That inaugural event attracted about 80 graduate students, postdocs and staff from 21 departments.

Gabe Perez-Giz, a research software and computing training specialist at PICSciE and lead organizer of this year's bootcamp, said the growth in participation reflects the reality that graduate students and postdocs who do research – whether in science, engineering or social science – increasingly encounter intensive computing work as part of their research.

 

PICSciE Training Specialist Gabe Perez-Giz helped participants troubleshoot during a hands-on session.

“Unfortunately, even though this knowledge is expected, researchers rarely get formal training in the computing-related skills they need,” said Perez-Giz. “There are almost no formal courses in the registrar targeted at individuals without a formal computer science background. Events like the bootcamp fill this gap so that researchers can be more productive without feeling inhibited by a lack of know-how about tools and practices.”

“Our expectation is that this bootcamp will become a staple of our education and training efforts,” said Jeroen Tromp, director of PICSciE, Blair Professor of Geology, and professor of geosciences and applied and computational mathematics.

The first three days of the bootcamp, aimed at researchers with less computing experience, covered introductory topics such as the Linux command line, good practices for research software engineering, how to navigate Princeton’s high-performance-computing (HPC) clusters, how to make effective plots, Python, and Git and GitHub. The last two days were geared toward researchers with more programming experience looking for a hands-on introduction to high-performance computing. More specifically, this mini-workshop within the bootcamp covered parallel programming, a type of coding that allows many calculations or processes to be carried out simultaneously on supercomputers that consist of clusters of CPUs and GPUs.

Last year, foundational topics were mixed in with more advanced subjects. To better serve distinct audiences, the organizers restructured the workshop so that more advanced, intense topics, such as parallel programming, came in the later part of the week. Other sessions covered topics that people had suggested during the semester.

The instructors, Perez-Giz said, volunteered their time to teach at the bootcamp. Bei Wang, one of the bootcamp instructors and an HPC software engineer with OIT Research Computing, taught a workshop on vectorization, a component of HPC, in the parallel programming portion.

“I think the students did pretty well. My feeling is that the students have some knowledge of HPC but not in that area yet (vectorization). I think this tutorial gave them a good opportunity to improve their skills and learn the latest technology,” said Wang. 

Carolina Roe-Raymond, a visualization analyst at PICSciE and one of the instructors, taught a session on making effective plots, transforming data into graphics and visuals that communicate research effectively. 

“The students were great. I was impressed with their stamina because we provided a lot of technical information in a short period of time. They remained engaged and I got some good questions from them,” she said.

PICSciE Visualization Analyst Carolina Roe-Raymond details the principles behind effective scientific plots.

Hannah Waight, a doctoral student in sociology, attended the Thursday session on parallel computing. Her research focuses on media manipulation in China, which involves a lot of quantitative text analysis.

“I found it really helpful. Even though I don’t use the programs they were teaching in, I found it to be a helpful theoretical introduction to the mechanics of it,” said Waight, who wants to use the campus cluster computers more effectively.

HPC has become more important to sociologists in the last couple of years, said Waight. Sociologists have become interested in using Big Data, mostly data generated from apps or other nontraditional sources, she said. Traditional quantitative research in sociology usually involves survey data. Waight herself uses newspapers for her text analysis research.

Amina Kurbidaeva, a postdoc in molecular biology, attended the bootcamp and found the tutorials helpful.

“I have recently started learning programming and I wanted to get more knowledge on it. The workshop was really productive and I learned a lot,” she said.

Kurbidaeva, who studies gene regulation in fruit flies, said high-performance computing has become important in molecular biology. 

“I am a biologist. I mostly use genetics and biochemistry in my work. But you need programming to analyze data. Even if you do benchwork, things are a lot easier if you know how to code and program,” said Kurbidaeva, who is studying Python.

Caoxiang Zhu, a postdoc at the Princeton Plasma Physics Laboratory, focuses his research on optimizing the magnetic confinement field for plasma in fusion energy reactors. 

“Since the main focus of my research involves using numerical tools to compute and calculate, I am really interested in numerical technologies in these courses that can help in my research,” said Zhu on why he participated in the bootcamp. “My favorite part of the week has been practicing simultaneously on your laptop while the lecturer introduces a concept. You are getting hands-on knowledge.”

Greg Chan, a graduate student in computer science, decided to attend the workshops because they introduced some topics with which he wasn’t too familiar. His research focuses on compilers and computer architecture, with an emphasis on parallelism, so the last two days of the workshop were meaningful to him, he said.

“It gave me some food for thought,” he said about the workshop. “Like how maybe I can change my research direction to target some of these applications.”