The Lab

Experiments, insights, and hands-on challenges for HPC engineers.

Deep technical explorations of SLURM scheduling, real cluster data analysis, and practical challenges that sharpen your HPC operations skills.

Featured

Deep Dive · 12 min read

How SLURM calculates job priorities using historical usage patterns and why it matters for optimizing cluster throughput.

Fair Share · Scheduling · Priority

Experiment · 8 min read

Real-world data from a 128-GPU cluster showing how job packing strategies affect overall utilization and user wait times.

GPU · Multi-Tenant · Optimization

Deep Dive · 2025-01-28

How SLURM calculates job priorities using historical usage patterns and why it matters for optimizing cluster throughput.

12 min read

Experiment · 2025-01-24

Real-world data from a 128-GPU cluster showing how job packing strategies affect overall utilization and user wait times.

8 min read

Challenge · 2025-01-20

Walk through identifying and resolving a complex queue blockage caused by misconfigured job dependencies.

6 min read

Tutorial · 2025-01-15

Step-by-step guide to implementing fine-grained resource tracking for better cost allocation and cluster analytics.

15 min read

Deep Dive · 2025-01-10

How to leverage Quality of Service settings to create priority tiers and enforce resource limits across different user groups.

10 min read

Experiment · 2025-01-05

Performance comparison of major HPC schedulers under various workload patterns, with real cluster data and insights.

14 min read

Join early access to get notified when new experiments, challenges, and deep-dive articles are published.