UPDATE: Applications for the 2023 RSE Fellows Program closed on April 16, 2023. Applicants will be notified of their application status in the coming weeks.
Are you enthusiastic about software & programming and interested in applying your skills to exciting academic research problems? Then the Princeton University Research Software Engineering Summer Fellows program could be for you. As a summer fellow, you will work under the mentorship of a professional Research Software Engineer (RSE) to build, develop, and optimize software used in cutting-edge Princeton research projects.
The RSE Fellows Program lasts 10-12 weeks during the summer, depending on academic-year schedules. Fellows are expected to be available full-time and may not hold other employment (including graduate assistantships) or pursue significant coursework during the program. Relocation to Princeton is not required; in most cases, fellows will remain in their home location and work remotely with their mentor. Summer fellows will receive a training stipend of $600/week, paid monthly, over the course of the program, assuming satisfactory progress. Funds are also available for an optional, but encouraged, one-week visit to Princeton to work with their mentor and meet other RSEs and students.
Prior knowledge of the research domain is helpful, but not required. Dedicated training activities will be offered to help fellows improve specific software skills. Applications from women and members of groups underrepresented in STEM are particularly encouraged.
Eligibility: You must be enrolled as a student at an accredited university or college and have completed at least one academic year by the start of the summer fellowship. US citizenship is not required, but if you are in the US on a student visa you must be eligible to participate in Optional Practical Training (OPT).
Application: Interested students should apply via this Google Form. You will need to provide:
- Your full name, email address, the name of your university or college and your current or planned major and/or area of study.
- A resume/CV (in pdf format) with contact information.
- An academic transcript - this can be unofficial, but should include course titles and overall GPA.
- A short essay describing your interest in the RSE Fellows program (maximum 1 page, pdf format). For example, you may wish to expand on 3 or 4 topics from the following list: your background, skills, and strengths; what software, computing, or scientific topics appeal to you; previous research experience, if any; what you may want to pursue as a future career; and what benefits you would like to gain from this program. If a potential mentor or project from the list below already interests you, you may also mention that here; however, this is not required to submit an application. Selected applicants will be connected with potential mentors in a second step after the application (see below).
- [Optional] The full name and email address of a reference. Ideally this would be someone with whom you have interacted in a STEM context (e.g., a course or a previous research activity). You should contact this person in advance to confirm that they are willing to write a letter for you, and simply provide their name and email address in the application form. After you submit the form, we will contact them to request the letter.
Final Deadline: Sunday, 16 April 2023.
Selection Process: Applications will be evaluated as they arrive. Selected applicants will be matched with a potential mentor, based on their skills and interests, for a short interview and a discussion of a possible project suited to the applicant’s skill level. Based on this discussion, the applicant will then write and submit a short two-page proposal with a plan of work and a timeline for the summer. Acceptance into the RSE Summer Fellows program will be based on this proposal.
Mentors and Projects for Summer 2023
Garrett Wright (ASPIRE)
ASPIRE is an open-source Python framework for computational cryo-EM image processing and algorithm development targeting ab initio single-particle reconstruction pipelines. There are several projects that could be matched to the backgrounds and interests of candidates with varying amounts of Python and mathematical experience. We could use help modernizing and parameterizing our continuous integration test suite, adding and experimenting with preprocessing for a recently published reconstruction algorithm, implementing an image bandpass metric, and optimizing performance by leveraging caching in distributed computations. Completing any of these projects should foster demonstrable experience in the modern Python scientific computing ecosystem and result in public-facing GitHub contributions.
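One of the listed tasks, an image bandpass metric, amounts to measuring how much of a signal's spectral energy falls within a chosen frequency band. As a rough, hypothetical illustration (in 1-D, with a naive DFT; ASPIRE's actual metric would operate on 2-D images with FFTs), it might look like:

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform (for illustration only)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def bandpass_energy_fraction(signal, lo, hi):
    """Fraction of total spectral energy in frequency bins [lo, hi)."""
    spectrum = [abs(c) ** 2 for c in dft(signal)]
    total = sum(spectrum)
    return sum(spectrum[lo:hi]) / total if total else 0.0
```

For example, an alternating signal concentrates all of its energy at the Nyquist bin, so its bandpass fraction there is 1.0.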
Colin Swaney (Project #1)
Help build open-source software to process high-frequency trade data for finance, economics, and engineering researchers. Fellows will assist in implementing logic for the latest version of the data, add enhancements to current processing capabilities (e.g., process currently unsupported fields), and explore alternative large-scale data storage solutions. Fellows will have the opportunity to gain experience with distributed computation and package development in Julia.
Colin Swaney (Project #2)
Help build computer vision models for the New Jersey Families Study. Fellows will modify existing research code to train state-of-the-art re-identification models, set up pipelines to train networks on secure research infrastructure, and assist in testing data annotation software. Fellows will have the opportunity to gain experience with technologies such as PyTorch and Docker.
Vineet Bansal (Guidescan)
Guidescan 2 (https://guidescan.com/) is a state-of-the-art CRISPR tool written in C++ that helps researchers involved in genome editing find targets for gene perturbation. A novel and efficient algorithm developed as part of the project greatly speeds up this search process. A web-based frontend makes it easy for practitioners in the field to identify potential targets and quantify their efficacy, without any prior technical setup.
We're in the process of polishing up the Guidescan 2 code to make it more easily deployable and usable by researchers across the world. We plan to run analyses on new datasets to uncover novel targets for gene editing across several organisms and their genomic variants. We also hope to reanalyze and interpret previously published CRISPR screening data. Finally, we plan on building efficient and fully-automated and reproducible data analysis pipelines along the way to make it seamless to get from raw genomic data to actionable research output.
The technology stack currently used in Guidescan includes C++, Python, Clojure, ReactJS, and PostgreSQL. The Guidescan approach and the associated code were developed under the supervision of Dr. Yuri Pritykin, Assistant Professor in the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics at Princeton University.
CRISPR is a groundbreaking research approach, yet one fundamentally understandable to students with little prior training in the field, and it offers exciting potential for candidates interested in real-world computational biology. You will have the opportunity to interact with Dr. Pritykin’s research team (“PritykinLab”) and get a hands-on feel for this Nobel Prize-winning genome-editing technique.
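To give a feel for the underlying problem (this is not Guidescan's actual algorithm, which is far more sophisticated and also scores off-target matches genome-wide), a CRISPR guide search starts by enumerating candidate 20-nt spacer sequences that sit immediately upstream of an NGG PAM site. A minimal forward-strand sketch:

```python
import re

def find_candidate_guides(genome, guide_len=20, pam=r"[ACGT]GG"):
    """Scan a genome string for guide_len-nt spacers followed by an NGG PAM.

    Returns (position, spacer, pam_site) tuples. Uses a lookahead so that
    overlapping candidates are all reported.
    """
    pattern = r"(?=([ACGT]{%d})(%s))" % (guide_len, pam)
    return [(m.start(), m.group(1), m.group(2))
            for m in re.finditer(pattern, genome)]
```

A real tool must also search the reverse strand and rank each candidate by predicted on-target efficacy and off-target risk.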
Posfai Lab
The Posfai Lab is interested in understanding cell-fate decisions in mouse embryos and the lineage of cells in the embryo as they divide, up to the 64-cell stage. Figure 1 illustrates the general idea of imaging the dividing cells of an embryo and constructing a lineage tree.
The lab collects large-scale time-lapse imaging data of live embryos using long-term, multicolor light-sheet imaging to visualize and quantify the dividing cells. Figure 2 shows a snapshot of such an image, segmented with the cells identified as inner and outer cells.
The lab, in collaboration with Stas Shvartsman’s group at the Flatiron Institute, has developed image analysis tools, including:
- convolutional neural network-based image segmentation (3D StarDist and Cellpose) trained with hand-annotated ground-truth data;
- an image registration method that uses Coherent Point Drift to correct for rotational movement of embryos throughout the time series;
- a semi-automated tracking method that uses a lightweight, low-latency visualization tool to manually review and correct errors in the tracking pipeline; and
- code for extracting and visualizing various quantitative measurements and features from segmented and tracked imaging data.
The manual review and error correction largely involve correcting the segmentation errors of the CNN and fixing incorrect tracking of cells through time. In many cases, the tracking is incorrect because of segmentation errors that could not be fixed before the images were rotated and registered. The fellowship will involve understanding the manual correction steps and making improvements that allow users to correct the segmentation in registered images and easily re-run the tracking.
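As a toy illustration of the linking step in such a tracking pipeline (not the lab's actual method, which handles division, registration, and manual review), cells in consecutive frames can be matched greedily by centroid distance:

```python
import math

def link_frames(prev_centroids, next_centroids, max_dist=10.0):
    """Greedily link each cell centroid in one frame to its nearest
    unclaimed neighbor in the next frame.

    Returns {prev_index: next_index}; cells moving farther than
    max_dist are left unlinked (e.g., lost or newly divided cells).
    """
    # Consider candidate pairs from closest to farthest.
    pairs = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(prev_centroids)
        for j, q in enumerate(next_centroids)
    )
    links, taken = {}, set()
    for d, i, j in pairs:
        if d <= max_dist and i not in links and j not in taken:
            links[i] = j
            taken.add(j)
    return links
```

Real pipelines replace this greedy step with globally optimal assignment and must cope with cell divisions, which is exactly where manual correction tools become necessary.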
Henry Schreiner (Project #1)
Scikit-build-core is a modern, standards-based rewrite of scikit-build, a build backend that connects a compiled-code build system (CMake) with Python packaging. This is a three-year project to build the tool, tutorials, and documentation, and to assist at least a dozen packages in transitioning from older, arcane build systems to scikit-build-core.
An ideal project would be developing a user-facing CLI tool to simplify common developer tasks, such as starting a new project, making multiple builds, rebuilding, or publishing packages. This could initially be developed as a stand-alone tool, but it could eventually be integrated into the original scikit-build package. There are other possible projects as well, including developing new dynamic metadata plugins, adding CMake helpers for common situations, moving an existing package over to scikit-build, and building tutorials, depending on a fellow’s interests and initial knowledge.
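As a sketch of what such a CLI might look like, here is a minimal argparse skeleton; the tool name and subcommands are illustrative guesses for the proposed project, not an existing scikit-build interface:

```python
import argparse

def build_parser():
    """Skeleton for a hypothetical developer-facing CLI around
    scikit-build-core. Subcommand names are illustrative only."""
    parser = argparse.ArgumentParser(
        prog="skbuild",
        description="Common developer tasks for scikit-build projects.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    new = sub.add_parser("new", help="start a new project from a template")
    new.add_argument("name")

    build = sub.add_parser("build", help="run one or more CMake builds")
    build.add_argument("--config", default="Release",
                       help="CMake build configuration")

    sub.add_parser("publish", help="upload built packages")
    return parser
```

The interesting design work would be underneath each subcommand: driving CMake configuration, caching rebuilds, and wrapping the standard packaging tools.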
Henry Schreiner (Project #2)
Scikit-HEP is a collection of packages initially designed to meet High Energy Physics needs that were not met elsewhere in the Python ecosystem, and it now contains several powerful, general tools used by multiple communities. One such need, histogramming, is addressed by the “Hist” package.
There are several possible projects with Hist, based on an applicant’s interests and skill level: adding a statistical-tools module, improving plotting and visualization, developing a common serialization specification, and adding support for Numba just-in-time compilation for some common histogram fills.
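To illustrate what a histogram fill does (Hist itself builds on boost-histogram and offers far richer axis types and much faster fills), a pure-Python stand-in:

```python
def fill_histogram(edges, values):
    """Fill a 1-D histogram given its bin edges.

    A pure-Python stand-in for a Hist-style fill; bins are
    half-open intervals [edges[i], edges[i+1]).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Linear scan is fine for a sketch; real fills use O(1)
        # arithmetic or binary-search bin lookup.
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts
```

Speeding up precisely this kind of inner loop with just-in-time compilation is what the Numba project idea above refers to.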
GNN Tracking
We use graph neural networks (GNNs) to reconstruct trajectories (“tracks”) of elementary particles traveling through a detector. The task can be described as a combinatorially very challenging “connect-the-dots” problem, essentially turning a cloud of points (hits) in 3D space into a set of O(1000) trajectories. Expressed differently, each hit (containing not much more than its x/y/z coordinates) must be assigned to the particle/track it belongs to.
The project code, together with documentation and a reading list, is available at github.com/gnn-tracking/ and uses PyTorch Geometric. More details are available in our GSoC proposal (https://hepsoftwarefoundation.org/gsoc/2023/proposal_GNN_tracking_object_condensation.html) for the same project.
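In the supervised setting, ground-truth tracks are simply hits grouped by the particle that produced them; the learning problem is recovering this grouping when the particle id is hidden. A minimal sketch of the ground-truth grouping, assuming a hypothetical `(x, y, z, particle_id)` hit format:

```python
from collections import defaultdict

def hits_to_tracks(hits):
    """Group detector hits into tracks by their (ground-truth) particle id.

    Each hit is (x, y, z, particle_id); the GNN's job is to produce
    this assignment from the coordinates alone.
    """
    tracks = defaultdict(list)
    for x, y, z, pid in hits:
        tracks[pid].append((x, y, z))
    return dict(tracks)
```

Comparing a model's predicted grouping against this ground truth is also the basis for the tracking metrics used to evaluate such pipelines.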