Tutorials

Master SLURM through structured, expert-guided tutorials.

From beginner basics to advanced cluster optimization, these step-by-step guides provide the knowledge you need to become proficient in HPC job scheduling and resource management.

8 Tutorials
4 Categories
~230 Minutes
3 Levels

Beginner · 15 min · Getting Started

Introduction to SLURM Workload Manager

Learn the fundamentals of SLURM architecture, key concepts, and basic commands to get started with HPC job scheduling.

Topics covered:
Architecture, Basic Commands, Job Submission

Beginner · 20 min · Getting Started

Your First SLURM Job Script

Write and submit your first job script with proper resource requests, environment setup, and output handling.

Topics covered:
Job Scripts, sbatch, Resource Requests

Intermediate · 25 min · Job Management

Understanding Job States and Queue Management

Master the lifecycle of jobs in SLURM, from pending to completion, and learn effective queue management strategies.

Topics covered:
Job States, Queue Priority, squeue, scancel

Intermediate · 30 min · Job Management

Advanced Job Submission Techniques

Explore job arrays, dependencies, and conditional execution for complex workflow automation.

Topics covered:
Job Arrays, Dependencies, Workflow Automation

Intermediate · 20 min · Resource Management

Optimizing Resource Allocation

Learn how to request CPUs, GPUs, and memory efficiently to maximize cluster utilization and minimize wait times.

Topics covered:
CPU Allocation, GPU Scheduling, Memory Management

Advanced · 35 min · Resource Management

Working with GPU Resources

Deep dive into GPU allocation, CUDA environment setup, and multi-GPU job configurations.

Topics covered:
GPU Allocation, CUDA, Multi-GPU Jobs

Advanced · 40 min · Advanced Topics

SLURM Accounting and Fair Share

Understand how SLURM tracks resource usage and calculates job priorities using the fair share algorithm.

Topics covered:
Accounting, Fair Share, Priority Calculation

Advanced · 45 min · Advanced Topics

High Availability and Job Preemption

Configure preemptable jobs, implement checkpointing, and design resilient workflows for shared clusters.

Topics covered:
Preemption, Checkpointing, High Availability

Recommended Learning Path

1. Foundation

Start with "Introduction to SLURM Workload Manager" and "Your First SLURM Job Script" to build a solid foundation.

2. Practice

Progress through the Job Management and Resource Management tutorials to handle real-world scenarios.

3. Master

Tackle advanced topics like accounting, fair share, and high availability for expert-level knowledge.
