SLURM, the "Simple Linux Utility for Resource Management" from Lawrence Livermore National Laboratory, has come a long way. When we started using it for some of the products in our Unified Cluster Portfolio, SLURM had only a simple FIFO scheduler, no job accounting, and scaled to perhaps a thousand nodes.
SLURM also provided a very clean architecture that allowed HP to contribute the first versions of job accounting for Linux clusters, support for multi-threaded and hyperthreaded architectures, advanced scheduling features such as gang scheduling, and fine-grained allocation of system resources ("consumable resources").
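To illustrate the consumable-resources style of allocation mentioned above, a batch script can request CPUs and memory per task rather than whole nodes. The sketch below is hypothetical (the job name, option values, and application binary are assumptions), but `--ntasks`, `--cpus-per-task`, and `--mem-per-cpu` are standard SLURM options:

```shell
#!/bin/bash
#SBATCH --job-name=cr-demo       # hypothetical job name
#SBATCH --ntasks=4               # launch four tasks
#SBATCH --cpus-per-task=2        # 2 CPUs per task (consumable resource)
#SBATCH --mem-per-cpu=1024       # 1 GB of memory per allocated CPU (consumable resource)

# srun runs the tasks inside the allocation granted above
srun ./my_app                    # hypothetical application binary
```

Such a script would be submitted with `sbatch script.sh`; SLURM then packs tasks onto nodes according to the available CPUs and memory rather than dedicating entire nodes to the job.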
SLURM version 2.0 has just been released, and what a powerhouse it is! Heterogeneous clusters, support for up to 65,000 nodes, resource limits, and job prioritization. Moe Jette and Danny Auble, the primary authors of SLURM, discuss it on this podcast.
My compliments to the entire team!