Written by
Sharon Adarlo
April 15, 2021

Since he arrived at Princeton University in late 2017, Vineet Bansal has been very busy. Bansal is a senior research software engineer jointly appointed in the Center for Statistics and Machine Learning (CSML) and the Princeton Institute for Computational Science and Engineering (PICSciE).

He has worked on a wide range of complex computational projects, from developing web applications for analyzing groundwater and protein sequences to creating a sophisticated software package that analyzes data from electron microscope images of molecules held in ice.

"Every day, I am doing something new. I am always learning a new field of research when I work with different faculty and students. I enjoy working with my collaborators; they are passionate people," Bansal said.

Bansal's job is to collaborate with CSML-affiliated faculty members to develop computational tools that enhance their research. Bansal has worked with faculty in various departments such as astrophysics, computer science, sociology, mathematics, and molecular biology. With his software engineering skills and his aptitude for interdisciplinary work, Bansal has become an indispensable resource on campus.

"Vineet has helped many research teams fast-track their research," said Peter Ramadge, CSML director. "His involvement and contributions are significantly enhancing scientific discovery at Princeton."

The process to start working with Bansal begins with CSML putting out a call for proposals. After collecting software engineering proposals, Bansal and Ian Cosden, director of Research Software Engineering for Computational & Data Science at PICSciE, meet with different faculty members and go over their projects. With input from Ramadge and CSML staff, Bansal and Cosden scope the projects to see what is feasible and where Bansal can have the most impact.

The next window for submitting proposals is coming up in May, said Bansal. The team will decide which projects to support around the end of June.

Currently, Bansal is working on two major initiatives: a large-scale analysis of groundwater in the United States and a project at the intersection of data science and genomics.

In the first project, Bansal is involved in building a dedicated computer cluster that houses, processes, and models groundwater data throughout the United States. A significant part of the project is a massive three-dimensional map of the continental US, which has measurements at one-kilometer increments in latitude and longitude, and one-meter increments depth-wise.

"Vineet has been amazing. He's been a critical piece of the whole project," said Reed Maxwell, professor of civil and environmental engineering and the High Meadows Environmental Institute and one of the research project's co-principal investigators.

The computing cluster Bansal has been working on is a vital part of a $1 million federally funded, multidisciplinary project called "Hydroframe ML." The project seeks to use machine learning to model and understand the country's groundwater, an increasingly essential need in light of climate change. The National Science Foundation chose this project and 28 others nationwide to receive a total of $27 million as part of the institution's Convergence Accelerator program. Besides Maxwell, the second Princeton co-principal investigator is Peter Melchior, a faculty member co-appointed in astrophysical sciences and CSML.

Bansal, along with Calla Chennault, a research software engineer in civil and environmental engineering, has been developing the foundation for a web-based application that uses the 3D map. Users will be able to work with this map and run machine learning models to analyze hydrological data in a region of interest, said Bansal.

In the second project, Bansal is working with Ben Raphael, professor of computer science, on computational biology and bioinformatics research. Raphael's lab concerns itself with next-generation DNA sequencing, genome rearrangements in cancer and evolution, and network analysis of somatic cancer mutations.

For Raphael, Bansal is currently restructuring a performance-critical component of the HATCHet pipeline. HATCHet, which stands for Holistic Allele-specific Tumor Copy-number Heterogeneity, is an algorithm for analyzing tumor samples. Bansal said that improving this component will make the software package easier for other researchers to adopt and deploy.

"In the past few months, I have also worked on implementing a 'cloud mode' in HATCHet, where the entire computational pipeline can be deployed and run on a commercial cloud. This approach makes it easier for researchers to run an end-to-end analysis of their tumor datasets without having to worry about software installation or data transfer issues," said Bansal.

Besides developing software applications for faculty, Bansal is active in campus education. In January, as part of the two-week Princeton University Research Computing Bootcamp, he taught a workshop on the basics of NumPy, the package that underlies most scientific computing done in Python. CSML lecturer Daisy Huang and DataX data scientists Brian Arnold and Andrzej Zuranski also participated in the bootcamp as course instructors.

"I enjoy working at CSML and with the different faculty, departments, and centers," said Bansal. "I have been able to do many things in a relatively short period. It's never boring for me. It's truly gratifying work."