When the sun set over Waikiki, HP BladeSystem stood as the victor of InfoWorld's 2010 Hawaii Blade Shoot-Out. Editor Paul Venezia blogged about HP's gear sliding off a truck, but other behind-the-scenes pitfalls meant the world's #1 blade architecture nearly missed the Shoot-Out entirely.
Misunderstandings about the test led us to initially decline the event, but by mid-January we'd signed on. Paul's rules were broad: Bring a config geared toward "virtualization readiness" that included at least 4 blades and either Fibre Channel or iSCSI shared storage. Paul also gave us a copy of the tests he would run, which let vendors select and tune their configurations. Each vendor would get a 2- or 3-day timeslot on-site in Hawaii for Paul to run the tests himself, plus play around with the system's management features. HP was scheduled for the first week of March.
In late January we got the OK to bring pre-release equipment. Luckily for Paul, Dell, IBM, and HP all brought similar 2-socket server blades with then-unannounced Intel Xeon X5670 ("Westmere") processors. We scrambled to come up with the CPUs themselves; at the time, HP's limited stock was entirely committed to supporting Intel's March announcement.
HP's final config: One c7000 enclosure; four ProLiant BL460c G6 server blades running VMware ESX, each with six-core Xeon processors and 8GB LVDIMMs; two additional BL460c G6s with StorageWorks SB40c storage blades for shared storage; a Virtual Connect Flex-10 module; and a 4Gb Fibre Channel switch. (We also had a 1U KVM console and an external MSA2000 storage array just in case, but ended up not using them.)
To show off some power-reducing technology, we used solid-state drives in the storage blades and low-voltage memory in the server nodes. HP recently added these Samsung-made "Green" DDR3 DIMMs, which use 2Gb-based DRAMs built on 40nm technology. LV DIMMs can run at 1.35 volts (versus the normal 1.5 volts), letting them "ditch the unnecessary energy drain," as Samsung's Sylvie Kadivar put it recently.
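As a back-of-the-envelope check, and assuming the usual first-order rule that dynamic CMOS power scales with the square of supply voltage (real DIMM savings also depend on process, I/O, and refresh behavior), that 0.15-volt drop is worth roughly 19 percent per DIMM:

```python
# Rough first-order estimate only: dynamic CMOS power scales ~V^2.
# Actual DIMM savings depend on process, I/O, and refresh behavior.
V_STANDARD = 1.50  # volts, standard DDR3
V_LOW = 1.35       # volts, low-voltage DDR3 ("Green" DIMMs)

ratio = (V_LOW / V_STANDARD) ** 2
print(f"dynamic power ratio: {ratio:.2f} (~{(1 - ratio):.0%} lower)")
# -> dynamic power ratio: 0.81 (~19% lower)
```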
Our pre-built system left Houston three days before I did, but it still wasn't there when I landed in Honolulu Sunday afternoon. We had inadvertently packed the enclosure in an extra-large Keal case (a hard-walled shipping container) that was too tall to fit in some aircraft. It apparently didn't make it onto the first cargo flight. Or the second one. Or the third one...
Sunday evening, already stressed about our missing equipment, the four of us from HP met at the home of our Hawaiian host, Brian Chee of the University of Hawaii's Advanced Network Computing Laboratory. Our dinnertime conversation generated additional stress: We realized that I'd misread the lab's specs, and we'd built our c7000 enclosure with 3-phase power inputs that didn't match the lab's PDUs. Crud.
We nevertheless headed to the lab on Monday, where we spotted the rat's nest of cables intended to connect power meters to the equipment. Since our servers still hadn't arrived, two of the HP guys fetched parts from a nearby Home Depot, then built new junction boxes that would both handle the "plug conversion" to the power whips and provide permanent (and much safer) test points for power measurements.
Meanwhile, we let Paul get a true remote management experience with BladeSystem. I VPN'd into HP's corporate network and pointed a browser at the Onboard Administrator of an enclosure back in a Houston lab. Even in Firefox (Paul's browser of choice), controlling an enclosure 3,000 miles away is simple.
Mid-morning on day #2, Paul got a cell call from the lost delivery truck driver. After chasing him down on foot, we hauled the shipping case onto the truck's hydraulic lift...which suddenly lurched under the heavy weight, spilling the wheels off the side and nearly sending the whole thing crashing to the ground. It still took a nasty jolt.
Some pushing and shoving got the gear to the Geophysics building's piston-driven hydraulic elevator, then up to the 5th floor. (I suppose I wouldn't want to be on that elevator when the "Low Oil" light turns on!)
We unpacked and powered up the chassis, but immediately noticed a health warning light on one blade. We quickly spotted the problem: a DIMM had popped partway out of its socket. Perhaps not coincidentally, it was the blade that took the greatest shock when the shipping container slipped from the lift.
With everything running (whew), Paul left the lab for his "control station", an Ubuntu-powered notebook in an adjoining room. Just as he sat down to start deploying CentOS images to some of the blades...wham, internet access for the whole campus blinked out. It didn't affect the testing itself, but it caused other network problems in the lab.
An hour later, those problems were solved, and the performance tests were underway. They went quickly. Next came some network bandwidth tests. Paul even found time to evaluate Intel's new AES-NI instructions, running timed tests with some OpenSSL tools.
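Paul's exact commands weren't published, but the classic OpenSSL comparison pits the generic AES code path against the EVP path, which can pick up AES-NI when the OpenSSL build supports it. A minimal sketch of that kind of timed test (assuming an openssl binary on the PATH):

```python
# Sketch of an AES-NI before/after comparison using OpenSSL's built-in
# benchmark. "speed aes-128-cbc" exercises the generic C implementation;
# "speed -evp aes-128-cbc" goes through EVP, which can use AES-NI when
# the build and CPU support it.
import subprocess

for cmd in (["openssl", "speed", "aes-128-cbc"],
            ["openssl", "speed", "-evp", "aes-128-cbc"]):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)
```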
Day #3 brought us a new problem. HP's Onboard Administrator records actual power use, but Paul wanted independent confirmation of the numbers. (Hence the power meters and junction-box test points.) But the lab's meters couldn't handle redundant three-phase connections. An hour of reconfiguration and recalculation later, we found a way to corroborate the measurements. (In the end, I don't think Paul published power numbers, though he may have factored them into his ratings.)
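The recalculation itself is straightforward once you have clean per-phase readings; the wrinkle is summing across both redundant feeds, since the enclosure draws from each side. A hypothetical sketch (made-up voltages, currents, and power factor, not our lab's actual numbers):

```python
# Hypothetical corroboration of enclosure power from per-phase readings
# on two redundant three-phase feeds. All numbers here are invented.
def feed_watts(v_line_neutral, amps_per_phase, power_factor):
    """Real power for one feed: sum V * I * PF over its three phases."""
    return sum(v_line_neutral * amps * power_factor
               for amps in amps_per_phase)

feed_a = feed_watts(120.0, [4.1, 3.9, 4.0], 0.95)
feed_b = feed_watts(120.0, [4.0, 4.2, 3.8], 0.95)

# The enclosure balances load across both feeds, so total = A + B.
print(f"total measured draw: {feed_a + feed_b:.0f} W")
```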
We rapidly re-packed the equipment at midday on day #3 so that IBM could move into the lab. Paul was already drafting his article as we said "Aloha", and headed for the beach -- err, I mean, back to the office.
Here's my countdown of the top technologies that will have the most impact on servers in 2010.
10. DDR3L - The JEDEC spec for low-voltage DDR3 memory came out last year, but 2010 should mark significant adoption of these 1.35-volt DIMMs. Since the memory in a modern, fully loaded server can consume more power than its processors, DDR3L will play a key role in helping solve data center power consumption and capacity problems.
9. Oracle Fusion Applications - Currently in beta testing, Oracle Fusion Apps is an evolutionary step in Oracle's effort to piece together key technologies from its "standard" products and those it recently acquired, like PeopleSoft and Siebel. In some cases, I expect we'll be learning (and managing) applications that are effectively brand-new.
8. Tukwila and Power7 - The mission-critical, UNIX-oriented processors grow beyond dual-core and get hefty caches shared between cores. Intel expects to bring its Itanium ("Tukwila") into production in the first part of 2010, while published roadmaps from IBM also put Power7 in the 2010 timeframe.
7. RHEL6 - I haven't seen schedules from Red Hat for Red Hat Enterprise Linux futures, but based on its plan to move RHEL5 into "phase 2" of its lifecycle in early 2011 (basically the "no new features, just bug fixes" phase), 2010 would be the logical year for this virtualization-tuned generation of the OS. Fedora 11 and 12 (now released) were the planned "feature previews" for RHEL6, so we'll see.
6. SPEC virtualization benchmark - I'm making another guess at roadmaps in predicting that the SPEC Virtualization committee might reveal its plans for a benchmark in 2010. (HP is a committee member, though I'm not personally involved; as always on this blog, I'm speaking for myself and not for HP.) VMmark is a great tool, but a SPEC benchmark should boost our ability to make vendor-agnostic comparisons of virtualization systems.
5. SAS SSDs - Solid-state drives with a SATA interface have been available in servers for a couple of years. (I think IBM was the first to use them as internal drives on blades.) However, servers have traditionally relied on the performance and reliability advantages of the SAS protocol, so SAS SSDs will really help bring SSDs into everyday use inside servers.
4. Nehalem-EX - The benefits of an integrated memory controller and hyper-threading that arrived with the Intel Xeon 5500 processors will become available in servers with more than 2 processors. Plus, with bigger caches and a beefier memory subsystem, performance should be impressive -- Intel says Nehalem-EX will bring the "largest performance leap in Xeon history".
3. CEE 'ratified' - Converged Enhanced Ethernet (CEE) is the final piece needed for standardized Fibre Channel over Ethernet (FCoE). FCoE carries the possibility of eliminating an entire fabric from the data center, so there are much-anticipated cost savings and flexibility boosts. Strictly speaking, there is no single "CEE" standard, but the key final pieces (the IEEE 802.1Qbb and 802.1Qaz standards) are targeted for final ratification around mid-2010.
2. Windows Server 2003 End of Mainstream Support - There are really only two reasons to upgrade an OS: You want some new feature, or the old one can't be patched. For those relying on Windows Server 2003, the chance of the latter gets larger when mainstream support ends in 2010, so expect a lot more pressure to upgrade older systems to Server 2008.
1. Magny-Cours processor - Twelve-core x86 processors; enough said. Actually, maybe not: AMD's next-gen Opteron has other performance-boosting features (like additional memory channels), and Magny-Cours will be available for 2-processor and 4+ processor servers at the same time. What else? I'm impressed with John Fruehe's comments about AMD's plans to deliver 4P performance at 2P economics. I predict Magny-Cours will be the big server story of 2010.
Top-ten lists don't seem complete without honorable mentions, so here are my two: Ratification of the PCI Express 3.0 spec, and Microsoft's Madison / Parallel Data Warehouse extension of its SQL server line.
And finally, one new product that almost, but thankfully didn't, appear on this list: the 0.0635-meter hard drive. The EU's Metric Directive, which comes into effect in 2010, originally prohibited publishing specs in anything but metric units. Among other things, that could have led to a renaming of 2.5-inch and 3.5-inch hard drives. Luckily, later modifications to the EU rules mean the "0.0635-meter drive" won't make its appearance -- at least not in 2010.
Last week at IDF, two Intel technologists spoke about different fixes for the same problem: server compute capacity outpacing the rest of the system's ability to keep it busy.
For the past 5 years, x86 CPU makers have boosted performance by adding more cores per processor, enabling servers with ever-increasing CPU horsepower. RK Hiremane (speaking on "I/O Innovations for the Enterprise Cloud") says that I/O subsystems haven't kept pace with this processor capacity, moving the bottleneck for most applications from the CPU to the network and storage subsystems.
He gives the example of virtualized workloads. Quad-core processors can support the compute demands of a bunch of virtual machines, but the typical server I/O subsystem (based on 1Gb Ethernet and SAS hard drives) gets overwhelmed by the combined I/O demands of all those virtual machines. He predicts an imminent evolution (or revolution) in server I/O to fix this problem.
Among other things, he suggests solid-state drives (SSDs) and 10 gigabit Ethernet will be elements of that (r)evolution. So will new virtualization techniques for network devices. (BTW, some of the changes he predicts are already being adopted on ProLiant server blades, like embedded 10GbE controllers with "carvable" Flex-10 NICs. Others, like solid-state drives, are now being widely adopted by many server makers.)
Hold on, said Anwar Ghuloum. The revolution that's needed is actually in programming, not hardware. There are still processor bottlenecks holding back performance; they stem from software's failure to make the shift to the parallelism that x86 multi-core requires.
He cites five challenges to mastering parallel programming for x86 multi-core:
* Learning Curve (programmer skill sets)
* Readability (ability for one programmer to read & maintain another programmer's parallel code)
* Correctness (ability to prove a parallel algorithm generates the right results)
* Scalability (ability to scale beyond 2 and 4 cores to 16+)
* Portability (ability to run code on multiple processor families)
Anwar showed off an upcoming C++ library called Ct, from RapidMind (now part of Intel), that's being built to help programmers tackle these challenges. (Intel has a beta program for this software, if you're interested.)
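I won't guess at Ct's unreleased API here, but the readability and scalability points are easy to see in any data-parallel "parallel map" style, where you describe the per-element operation and let the runtime split it across however many cores exist. A generic sketch in Python:

```python
# Generic data-parallel "map" illustration (not Ct's actual API):
# describe the per-element work, let the runtime spread it over cores.
from multiprocessing import Pool

def score(x):
    """Per-element computation; no shared state, so no locks to reason about."""
    return x * x + 1

if __name__ == "__main__":
    with Pool() as pool:        # pool sizes itself to the core count
        results = pool.map(score, range(1_000_000))
    print(sum(results))
```

Because the code says what to compute rather than how to schedule threads, it reads almost like the serial version and scales with the core count -- which is exactly what the readability and scalability challenges above are about.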
To me, it's obvious that the "solution" is a mix of both. Server I/O subsystems must improve (and are improving), and ISVs are getting better at making applications scale with core count.
Finally, I ran across this website and its great tech brief on the good and bad of SSD technology; it should help you make up your own mind about whether SSD is right for you.