Princeton University held its second annual research data management workshop for graduate students from Jan. 27 to 29. Some 40 graduate students attended lectures and breakout sessions on topics such as creating data management plans, preserving and sharing data, analysis tools, and open research practices, as well as legal and ethical considerations of data management.
The program was organized by the Princeton Research Data Service (PRDS), the Princeton Institute for Computational Science and Engineering (PICSciE), and OIT Research Computing. Co-sponsors included the Center for Digital Humanities, the Center for Statistics and Machine Learning, the Data-Driven Social Science Initiative, the Graduate School, Office of the Dean for Research, and Princeton University Library (PUL).
Wind Cowles, PRDS director, worked closely with Ma. Florevel (Floe) Fusin-Wischusen, institute manager at PICSciE, and 34 campus experts to develop the workshop’s curriculum reflecting the research data life cycle: planning, acquisition, and sharing results. After reviewing feedback from the first workshop in 2019, organizers condensed the program from five to three days and introduced discipline-specific breakout sessions.
In her introduction, Cowles told graduate students that beyond publishing research findings, “data are also a result of your work and are just as, if not more important.” Researchers can increase their impact by making data available to colleagues, she said. “The goal of research, ultimately, is to share what we learn with others.”
According to Curt Hillegas, associate chief information officer for Research Computing, OIT and PICSciE, data are growing at an exponential rate, and researchers need to understand how to manage them.
Professor Sebastian Seung discusses working with big data for visual images of rat brains. Photo by Floe Fusin-Wischusen, PICSciE
Sebastian Seung, professor of computer science and neuroscience, discussed his research to share the possibilities and challenges of working with big data. Seung uses techniques from machine learning and social computing to extract brain structure from light and electron microscopic images. His visualizations of neurons and synapses involve many terabytes of data. Seung explained that several people in his lab simultaneously compute different properties of the data. In this work, he asks: “How do we make our analysis more collaborative? How do we share our data with others?” People at the forefront of data science are grappling with these issues, he commented.
Within the research community, there’s an increasing commitment to data stewardship and open accessibility, said Cowles. Researchers are expected to make data available so others can reproduce results, build upon initial findings, and trust the legitimacy of the work. Reflecting this movement towards open access, “many journals now require that you make data publicly available as part of the publication process,” Cowles said.
Additionally, Scholarly Communications Librarian Yuan Li said, “Many funders require data management plans in grant applications.” Funders want to know how researchers will manage their data and make them accessible to others. “Huge datasets will easily get messy if you don’t have a good plan,” said Li. “It will slow down the research process.” Li presented an online tool for writing data management plans (DMPs), available to Princeton researchers for free through PUL.
DMPs were new to Xin Sun, a fifth-year graduate student who studies environmental microbiology in the geosciences department. “The workshop helped me know what I didn’t know before,” she said. “It introduced us to so many resources across campus.”
Graduate students reflect on their workshop experience over lunch. Photo by Denise Applewhite, Office of Communications
Christine Murphy, assistant dean for academic affairs in the Graduate School, encouraged graduate students to seek out the people on campus who are trained to help and to take advantage of Princeton’s resources. Behavioral Sciences Librarian Meghan Testerman echoed this sentiment, telling students: “Think of your subject librarians as a resource.”
Dahyun Choi, a second-year graduate student in politics who attended the workshop, appreciated learning about the Library’s databases and plans to use its datasets to explore her research questions on international relations. “I didn’t really know how to manage datasets before,” said Choi. “They gave specific guidelines and ways to share with others.”
Chloe Cavanaugh studies electrobiology as a first-year M.D.-Ph.D. student in a joint program between Princeton and Rutgers Robert Wood Johnson Medical School. She is applying for an NIH grant this April. “The session on data sharing policies for NIH was very pertinent,” she said. “The session told us, here’s what NIH practices are. Here’s what you need to think about. This whole workshop was about thinking ahead. Help yourself out now, to make it easier on yourself later.”
According to Ole Agersnap, a third-year graduate student in economics, “the most helpful sessions were the [breakout] sessions specific to social science, tailored to the data we use.” Agersnap commented, “I agreed with the statement at the beginning of the workshop – all students should have these skills.”
To learn more about PRDS, a joint initiative between the Office of the Dean for Research, the Office of the University Librarian, and the Office of Information Technology, visit the PRDS website.
Written by Emily Judd, Library Communications Coordinator
Media contact: Barbara Valenza, Director, Library Communications