I recently attended the 1st annual workshop of the Hybrid Multicore Consortium. The 1st annual of anything shows a certain degree of optimism, and the participants at this consortium are very optimistic about the role of hybrid multicore architectures in meeting the large-scale computing needs of the scientific community. We’ve been working with several ‘early innovators’ in this space for a few years, and this is one more indicator that the industry is moving into the next stage of adoption.
Several national labs - Oak Ridge National Laboratory (ORNL), Lawrence Berkeley National Laboratory (LBNL), and Los Alamos National Laboratory (LANL) - plus Georgia Institute of Technology and the Swiss Federal Institute of Technology (ETH) have established this consortium, sponsored by the Department of Energy, and membership is open to the public. HP is a founding industry member and will be participating in this consortium as part of HP's ongoing efforts to deliver high-performance computing solutions.
Hybrid here means CPU plus accelerator, where the accelerator significantly enhances the computational capabilities of the system. The consortium is developing and maintaining a roadmap of the current state of accelerator technology and gaps that need to be filled to meet the requirements of large-scale production use. A number of different accelerator technologies are relevant and were discussed, and the roadmap is not strongly biased to any technology. The issues and concerns - for example, code portability - span the various technologies.
Members of the consortium are on the leading edge of high-performance computing technology and have a vested interest in pushing this technology forward and directing it to effectively meet the needs of scientists and researchers. If you also see hybrid technology in your future, check out the consortium website: http://computing.ornl.gov/HMC/. Under Events, you can read about the experiences members have had with accelerator technology and, under Roadmap, an evolving analysis of the technology and the directions it needs to take.
Use of GPUs for HPC seems to be reaching a new stage of adoption, based on activity we’re seeing. NVIDIA announced its eagerly anticipated next-generation solution, Fermi, last month. By adding capabilities such as error-correcting (ECC) memory, Fermi provides a level of data integrity that had not been an inherent feature of GPUs, given their prime use in games and graphics. Not a showstopper, as developers could work around it, but it has been a limiting issue for some.
Just this week, Georgia Institute of Technology announced it had received $12M NSF funding for a GPU-integrated HPC system. The participants in the project, called Keeneland, include Georgia Tech, Oak Ridge National Lab, University of Tennessee, NVIDIA and HP. An auspicious name, as the funding is part of NSF’s Track 2 awards. The announced plan is to build the system with the Fermi GPUs.
We’ll be showing some GPU demos at our SC09 booth next month, using the currently shipping NVIDIA Tesla. Each 1U Tesla S1070 has four GPUs (960 cores in all). Our HP ProLiant DL160se has an added PCIe slot relative to the standard DL160, which enables us to support three Teslas with two 1U servers. That’s over 12 TFLOPS peak from the Teslas alone, in 5U. These GPUs are not just for gaming anymore. That baby’s got game.
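The "over 12 TFLOPS peak" figure can be sanity-checked with a quick back-of-envelope calculation. This is a sketch using NVIDIA's published S1070 specifications as assumptions (240 streaming-processor cores per GPU, roughly a 1.44 GHz shader clock, and up to 3 single-precision FLOPs per core per cycle), not measured numbers:

```python
# Back-of-envelope peak single-precision FLOPS for the Tesla setup above.
# Clock rate and FLOPs/cycle are assumptions drawn from NVIDIA's published
# S1070 specifications, not benchmark results.

CORES_PER_S1070 = 960          # 4 GPUs x 240 streaming-processor cores
CLOCK_GHZ = 1.44               # shader clock (high-end S1070 configuration)
FLOPS_PER_CORE_PER_CYCLE = 3   # single precision: dual-issue MAD + MUL

peak_per_unit_tflops = CORES_PER_S1070 * CLOCK_GHZ * FLOPS_PER_CORE_PER_CYCLE / 1000.0
total_tflops = 3 * peak_per_unit_tflops   # three S1070s in the 5U configuration

print(f"Per S1070:     {peak_per_unit_tflops:.2f} TFLOPS peak")
print(f"Three S1070s:  {total_tflops:.2f} TFLOPS peak")
```

Under these assumptions each S1070 peaks at roughly 4.1 TFLOPS single precision, and three of them land just above the 12 TFLOPS quoted above.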
Are you confused about what an HPC accelerator is and if and when to use one? If so, you are not alone. Accelerators fit only small segments of the overall high-performance computing (HPC) application market today, but for a growing list of applications they can be lean, green machines. When they work, they can dramatically improve time to solution while cutting energy costs and saving floor space. Order-of-magnitude improvements in run times are common.
This is reminiscent of the glory days of vector computing when clever programming was necessary for super-fast speed. (I do miss the fun of optimizing supercomputers in the ‘80s.) The big difference is the vector CPUs of yore were hand built and cost a million dollars while today we are driven by teenagers with gaming budgets closer to $1,000.
If you are in the oil and gas, government defense, university research, financial services, or genomics-proteomics computational fields, chances are you are already using HPC accelerators or know someone who is. Hundreds (maybe even thousands) of application types have been accelerated, and those driving the largest volumes are in these listed fields. The HPC accelerator industry is poised for tipping points, with greater penetration into multicore environments across a rapidly expanding range of applications.
HPC accelerators are inexpensive massively parallel computing chips, programmed differently than general-purpose x86 processors. Common types are graphics processing units (GPUs, designed for gaming and visualization) and field-programmable gate arrays (FPGAs, used for flexible circuit design). They tend to deliver hundreds of cores (or functional units) for a thousand or a few thousand dollars. Cheap-fast HPC computing indeed! Applications with a good fit for this heterogeneous programming can run 10 to 100 times faster than on standard multicore servers, which is important since a run-time speedup of at least 10 times is often needed to justify programming heterogeneous architectures.
Accelerators have come a long way and are advancing rapidly. A few years ago they were nearly impossible for mere mortals to program, but advances in heterogeneous programming languages and language standards are lowering that hurdle. In this rapidly changing field, computing speeds are advancing faster than Moore’s law (sometimes 5x in one year), with robust roadmaps for important features yet to come. All the major accelerator vendors are projecting significant improvements in floating-point rates, error protection, and bandwidths. In addition, HPC servers (including those from HP) are improving to better meet the challenging cooling and bandwidth requirements that accelerators tend to drive, further improving density, heat efficiency, and cost.
In this blog I plan to highlight where accelerators are working today, what changes and tipping points are on the horizon, feature improvements likely to expand the application markets, examples of improved green computing and solution times, and how they might be deployed for effective petascale research clusters. (Petascale accelerated clusters exist today, but not yet with significantly improved Linpack price/performance compared to multicore.)
I hope to help you judge if and when to get into this exciting and expanding field of heterogeneous computing so you can time your entry for leading edge, but not bleeding edge, improvements. I welcome comments and questions on this topic. For more information see www.hp.com/go/accelerators.