Improving Analysis Workflows with Snakemake

Oct 26, 2022, 2:00 pm to 4:30 pm
Princeton students, graduate students, researchers, faculty, and staff


Event Description
Tired of writing sbatch scripts and complex Bash logic for your work? Does your directory look like 'step_1.slurm, step_2.slurm, step_3.slurm, step_3_final.slurm'? Have you struggled to replicate previous results because some intermediate steps were lost to your shell history? Then you are ready to improve your analysis pipelines with a workflow management system! Snakemake is a concise but descriptive framework for specifying workflows, and it interfaces with HPC systems. Because Snakemake is written in Python, complex relationships between steps can be described with Python code, and any command you can run in a terminal can be executed as part of a workflow. In this workshop, you will take a series of sbatch scripts and develop them into a Snakemake workflow, creating a reproducible, distributable, and efficient analysis pipeline. Several cookie-cutter examples will be provided to help jumpstart your work.

Learning objectives: Attendees will learn how to convert their workflows to Snakemake and run them with the Slurm scheduler.

Knowledge prerequisites: Basic Linux and HPC skills, and some familiarity with Conda.

Hardware/software prerequisites: (1) Bring a laptop that can connect to the eduroam wireless network. You will also need to be able to authenticate with Duo to use campus resources. (2) Have an SSH client installed on your laptop. (3) Register for an account on Adroit, the cluster we will use for demonstration purposes. Make sure you can SSH to Adroit before the workshop. (4) Create a Conda environment and install Snakemake in it.

Workshop format: Demonstration and hands-on

Instructor Biography: Troy is a research software engineer (RSE) working with Joshua Akey’s lab, which investigates human genetic ancestry and mechanisms of evolution. Within the Lewis-Sigler Institute for Integrative Genomics, he applies rigorous software development practices to build new analysis pipelines and improve legacy codebases. His past research areas include 3D bioprinting, single-cell mass spectrometry, and mass spectrometry imaging. Troy has a B.S. in Computer Science, Chemistry, Mathematics, Biochemistry and Cellular Biology and a Ph.D. in Analytical Chemistry.