Cheers, cheers for old Notre Dame! News hit the wires this week of a new 490-node cluster at the Center for Research Computing (CRC) at Notre Dame. The nodes are HP ProLiant DL165 G6 servers with six-core AMD processors, connected with Gigabit Ethernet. HP SCI Elite Partner Matrix Integration delivered the HP Cluster Platform 4000.
This continues a wave of new HPC cluster deployments across universities and colleges. Last year, for instance, HP delivered a 1000+ node HP Cluster Platform 3000BL to the Minnesota Supercomputing Institute. Read here for news on that system.
The new Top500 list was released recently, and HP (and BladeSystems) again tops the list as the preferred technology choice by this extremely knowledgeable and demanding community of users! What's the Top500? It's a listing of the world's most powerful supercomputing systems, based on the Linpack benchmark, and is compiled twice a year. HP systems power 212 of these 500 sites (42%). Of those, 207 sites deploy HP c-Class BladeSystems, making it the most popular and fastest growing architecture on the list. Notable trends include: the continued growth in InfiniBand as high performance interconnect, especially for the top tier in the Top500; demonstration of Windows HPC scalability, with that technology being used at the #15 site (in Shanghai); and new countries gaining ground in the top tier, such as Saudi Arabia with the #14 spot.
Today HP announced the new ExSO portfolio. The products are designed for Extreme Scale Out (duh) but are quite applicable for HPC and clusters. After all, the typical HPC user wants the highest performance per dollar, and per watt. Yes, these systems are ideal for big Web 2.0 and cloud data centers, but the economics make a lot of sense for mere mortals with 100+ nodes. As the announcement explains, the new ProLiant SL family uses a "skinless" systems architecture that "replaces the traditional chassis and rack form factors with an extremely lightweight rail and tray design." Multiple nodes share power units and fans, which delivers better power utilization and cooling than is possible in traditional 1U servers, as we found with the HP BladeSystem. The skinless design also means less metal to retain heat, and less weight. Each SL6000 chassis holds two trays. There are currently three basic servers available (you can mix and match in a standard rack): two nodes in a 1U tray for compute-intensive apps, a large-memory node in a 1U tray for memory-intensive apps, and a node with up to six disks. Energy efficiency is further enabled by the Intelligent ExSO Center, including the new Data Center Environmental Edge, which provides a visual map of data center environmental variables.
SLURM, the "Simple Linux Utility for Resource Management" from Lawrence Livermore National Laboratory, has come a long way. When we started using it for some of the products in our Unified Cluster Portfolio, SLURM had only a simple FIFO scheduler, no job accounting, and the ability to support perhaps a thousand nodes.
SLURM also provided a very clean architecture which allowed HP to contribute the first versions of job accounting for Linux clusters, support for multi-threaded and hyperthreaded architectures, complex job scheduling such as gang scheduling, and fine-grained allocation of system resources ("consumable resources").
SLURM version 2.0 has just been released, and what a powerhouse it is! Heterogeneous clusters, up to 65,000 nodes, resource limits, and job prioritization. Moe Jette and Danny Auble, the primary authors of SLURM, discuss it on this podcast.
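For readers who haven't tried SLURM, here is a minimal sketch of a batch job script. The partition name, task counts, and application name are placeholders for illustration, not values from any particular cluster:

```shell
#!/bin/bash
# Minimal SLURM batch script (hypothetical partition and app names).
#SBATCH --job-name=hello_mpi
#SBATCH --nodes=4              # allocate 4 nodes
#SBATCH --ntasks-per-node=6    # e.g. one task per core on a six-core node
#SBATCH --time=00:30:00        # wall-clock limit enforced by the scheduler
#SBATCH --partition=compute    # hypothetical partition name

# srun launches the tasks across the allocated nodes
srun ./my_mpi_app
```

You would submit this with `sbatch job.sh` and watch it in the queue with `squeue -u $USER`; the scheduler handles node selection, prioritization, and resource limits behind the scenes.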
My compliments to the entire team!
So it's not brain surgery, but it is rocket science!
See the article in Supercomputing Online.