Learning resources: SLURM

How to Use these Resources

All Research Computing clusters at Princeton rely on a workload manager called SLURM to allocate resources among users' jobs. SLURM is the principal means by which users submit computing jobs to the clusters. While it is best to follow Research Computing's own references (including workshops) for using SLURM on Princeton systems specifically, the materials below offer more general tutorials and documentation for interested users.
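For readers who have not yet seen a SLURM job, the following is a minimal sketch of what a submission looks like. It is written in Python only to keep the example self-contained: the helper `render_batch_script` is hypothetical (it is not part of SLURM or of any Princeton tooling), and the job name, command, and resource values are placeholder assumptions. The `#SBATCH` directives it emits (`--job-name`, `--nodes`, `--ntasks`, `--time`) are standard SLURM options, but the right values for a given cluster may differ, so Research Computing's own documentation takes precedence.

```python
def render_batch_script(job_name, command, nodes=1, ntasks=1, minutes=10):
    """Return the text of a minimal SLURM batch script.

    Hypothetical helper for illustration only; all defaults are assumptions.
    """
    # SLURM accepts a wall-clock limit in HH:MM:SS form.
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",   # name shown in the queue
        f"#SBATCH --nodes={nodes}",         # number of nodes requested
        f"#SBATCH --ntasks={ntasks}",       # total number of tasks
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",  # wall-clock limit
        "",
        command,                            # the work the job performs
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a sample script; "srun hostname" is a placeholder command.
    print(render_batch_script("hello", "srun hostname"), end="")
```

If the printed text were saved to a file (for example `job.slurm`, a name chosen here for illustration), it would typically be submitted with `sbatch job.slurm`.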

 

Videos

Introduction to SLURM -- an 8-part video tutorial (each part under 10 minutes) from SchedMD, the makers of SLURM.

OLCF Seminar on Migrating to SLURM -- this roughly 30-minute video (with accompanying slides) is from a 2019 seminar given at the Oak Ridge Leadership Computing Facility (OLCF) when it migrated its systems to SLURM from another scheduler. Users who are familiar with other workload managers and schedulers (e.g., Moab, PBS/Torque) may find this helpful.

SLURM for Developers -- a more advanced resource, intended for SLURM developers and system administrators.

 

Self-paced online courses

Advanced SLURM -- self-paced online course from the Cornell University Center for Advanced Computing (CAC) for users who have already used SLURM but whose needs go beyond simple batch files or small interactive jobs.

 

Web pages / written online tutorials

SLURM Quick Start -- an online written quick start user guide (from SchedMD).  Pithy but informative, with useful diagrams.

Workload Manager "Rosetta Stone" -- another useful resource (from SchedMD) for users new to SLURM but familiar with other workload managers.

Additional SLURM Documentation -- more complete written documentation on features and applications (from SchedMD).

 

Books

Parallel Programming in MPI and OpenMP -- book by Victor Eijkhout from the Texas Advanced Computing Center (TACC). Available both in print and online (the online version is titled "Parallel Programming for Science & Engineering"). Its appendix on "Batch Systems" describes how clusters are set up and how SLURM works on them, and it includes examples of, and strategies for, scheduling different types of jobs.