Over the last year or so, I’ve had the opportunity to speak with a number of customers about the emerging Fibre Channel over Ethernet standard. The idea of converging Ethernet and Fibre Channel sounds instantly appealing, but large-scale implementations come with many caveats. Last week Gartner published a paper on FCoE titled “Myth: Single FCoE Data Center Network = Fewer Ports, Less Complexity and Lower Costs”.
Given the release of this paper, I thought it would be a good time for me to share some thoughts on FCoE as well. From my experience, not all customers are aware of some important considerations when evaluating FCoE deployments. In no particular order:
1. While FCoE can converge ports at the server edge, the total bandwidth requirement remains the same. So at the aggregation and core layers, the number of ports required to run 8Gb FC or 10Gb Ethernet will likely remain roughly the same or even grow. Replacing relatively inexpensive lossy Ethernet switches with 10Gb lossless ones may actually increase costs substantially.
2. FCoE is literally Fibre Channel running on top of enhanced Ethernet, so both protocols need to be managed. This may reduce expected management savings.
3. For IT organizations with both a LAN and SAN team, FCoE will likely add a new dependency of the SAN team on the LAN team. This could slow down IT change events, as more teams become involved in routine provisioning and maintenance operations.
4. FCoE requires lossless Ethernet to operate properly. Most switches deployed in customer data centers today lack the hardware to support this. As a result, implementing FCoE will typically require a large-scale replacement of switch hardware. Most SAN arrays do not support FCoE natively and would also need to be replaced to support end-to-end FCoE.
5. The lossless Ethernet (DCB) standards that FCoE depends on have not yet been ratified by the IEEE. One new protocol for congestion notification across multiple hops (QCN) requires new silicon to fully implement. Even the newer FCoE-enabled switches lack the necessary silicon to support QCN, so full implementations will require next-generation switches.
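To make point 1 concrete, here is a toy back-of-the-envelope sketch. All of the numbers (server count, link speeds, NIC/HBA counts) are hypothetical and chosen only for illustration; the point is that convergence halves edge ports while leaving the aggregate bandwidth, and therefore upstream port count, unchanged.

```python
# Toy model: converging at the server edge reduces edge ports, but the
# aggregation/core layers still have to carry the same total bandwidth.
# All figures below are made up for illustration.

def uplinks_needed(total_gbps, link_gbps):
    """Minimum number of links of link_gbps required to carry total_gbps."""
    return -(-total_gbps // link_gbps)  # ceiling division

servers = 100

# Separate fabrics: each server has 2 x 10Gb Ethernet + 2 x 8Gb FC ports
edge_ports_separate = servers * (2 + 2)          # 400 edge ports
total_bw = servers * (2 * 10 + 2 * 8)            # 3600 Gbps offered load

# Converged edge: each server has 2 x 10Gb CNA ports carrying both protocols
edge_ports_converged = servers * 2               # 200 edge ports

# Either way, the aggregation layer must absorb the same offered load
agg_links = uplinks_needed(total_bw, 10)         # 360 x 10Gb links

print(edge_ports_separate, edge_ports_converged, agg_links)
```

The edge port count drops from 400 to 200, but the 360 upstream links (now on pricier lossless hardware) are needed in both designs.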
FCoE is an interesting concept, but multi-hop FCoE is a bit premature. The standards that take this from a concept to a reality are being fleshed out as we speak by the IEEE DCB Workgroup. For more information on the status of DCB, see: http://www.ieee802.org/1/pages/dcbridges.html.
How do the application you are running and what it is doing affect the power consumption of a system?
The first thing everyone looks at when talking about power consumption is CPU utilization. Unfortunately, CPU utilization is not a good proxy for power consumption, and the reason why goes right down to the instruction level. Modern CPUs like the Intel Nehalem and AMD Istanbul processors have hundreds of millions of transistors on the die. What really drives power consumption is how many of those transistors are actually active. At the most basic level, an instruction activates a number of transistors on the CPU, and that number depends on what the instruction is actually doing. A simple register add, for example, might add the integer values in two registers and place the result in a third register; a relatively small number of transistors will be active during this sequence. At the opposite extreme is a complex instruction that streams data from memory to the cache and feeds it to the floating-point unit, activating millions of transistors simultaneously.
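The idea above can be sketched as a toy model. The per-instruction "active fraction" numbers here are entirely made up; the sketch only illustrates how two workloads can both show 100% utilization while exercising very different amounts of the die.

```python
# Toy model of instruction-level power: hypothetical fraction of the die
# that each instruction type activates. These numbers are invented.
ACTIVE_FRACTION = {
    "reg_add": 0.05,    # simple integer add between registers
    "fp_stream": 0.80,  # streaming loads from memory feeding the FPU
}

def relative_power(instruction_mix):
    """Activity-weighted average for an instruction mix (shares sum to 1)."""
    return sum(ACTIVE_FRACTION[op] * share
               for op, share in instruction_mix.items())

# Two workloads, both "100% CPU utilization" on the same core:
integer_workload = {"reg_add": 1.0}
hpc_workload = {"reg_add": 0.3, "fp_stream": 0.7}

print(relative_power(integer_workload))  # low die activity
print(relative_power(hpc_workload))      # roughly 10x higher activity
```

Both workloads keep the pipeline busy, yet in this model the floating-point streaming mix lights up over ten times as much of the chip.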
Further to this, modern CPU architectures allow some instruction-level parallelism, so you can, if the code sequence supports it, run multiple operations simultaneously. On top of that we have multiple threads and multiple cores. So depending on how the code is written, you can have a single linear sequence of instructions running, or multiple parallel streams running on multiple ALUs and FPUs in the processor simultaneously.
Add to that the fact that in modern CPUs the power load drops dramatically when the CPU is not actively working: idle circuitry is placed in sleep or standby modes, or switched off entirely, to reduce power consumption. So if you're not running any floating-point code, for example, huge numbers of transistors are inactive and consuming very little power.
This means that an application's power utilization varies depending on what the application is actually doing and how it is written. Therefore, depending on the application you run, you will see massively different power consumption even if they all report 100% CPU utilization. You can even see differences running the same benchmark, depending on which compiler is used, whether the benchmark was optimized for a specific platform, and the exact instruction sequence that is run.
The graph below shows the relative power consumption of an HP BladeSystem c7000 Enclosure with 32 BL2x220c servers. We ran a number of applications, and a couple of customers with the same configuration were also able to give us power measurements from their enclosures. One key thing to note is that the CPU was pegged at 100% for all of these tests (except the idle measurement, obviously).
As you can see, there is a significant difference between idle and the highest-power application, Linpack running across 8 cores in each blade. Another point to note is that the two customer applications, Rendering and Monte Carlo, don't get anywhere close to the Prime95 and Linpack benchmarks in terms of power consumption.
It is therefore impossible to state the power consumption of server X and compare it to server Y unless both are running the same application under the same conditions. This is why both SPEC and the TPC have been developing power consumption benchmarks that look at both the workload and the power consumed, to give a comparable value across different systems.
In fact, SPEC just added power consumption metrics to the new SPECweb2009, and interestingly enough the two results published so far have the same performance-per-watt number, but wildly different configurations, absolute performance numbers, and absolute wattages. So there's more to performance per watt than meets the eye.
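A quick sketch with made-up numbers shows how two very different systems can land on an identical performance-per-watt figure (the 1000/200 and 5000/1000 values below are hypothetical, not actual SPECweb2009 results):

```python
# Performance per watt hides absolute scale: two hypothetical systems
# with 5x different performance and 5x different power draw tie exactly.
def perf_per_watt(performance, watts):
    return performance / watts

small_box = perf_per_watt(performance=1000, watts=200)   # 5.0
big_box = perf_per_watt(performance=5000, watts=1000)    # 5.0

assert small_box == big_box  # same efficiency metric...
# ...yet one system does 5x the work and burns 5x the power.
```

So when comparing published efficiency numbers, the absolute performance and wattage behind the ratio matter as much as the ratio itself.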
The first part of this series was Configuration Matters.
We recently learned that Aaron Delp closed down his BladeVault blog and is focusing on creating more useful information to share with the greater community by contributing to Scott Lowe's blog. For those of you who don't know Aaron, he's a senior engineer who is literally on the front lines of the blade and virtualization revolution. No, he doesn't work for HP or IBM. But he does know just about everything there is to know about us both. The good, the bad and the ugly.
We not only like Aaron because he's a smart guy who shoots it straight, but also because he likes to share what he knows with the community. Like I said, he knows a lot.
Well, here's what he's up to now. In a series titled "Blades and Virtualization Aren't Mutually Exclusive", Aaron is sharing a ton of personal research and experience with blades. In the first two articles in the series, he takes an insider's look at the power advantages of blade versus rack servers - covering both HP and IBM. I know we've told you before that blades use a lot less power, but you still think we're full of crap. Fine. Take it from Aaron.
In his next article, Aaron has promised to focus on the expansion capabilities of both the IBM and HP blade servers. We'll be reading and linking to his thoughts here.