By Chuck Klein
Now it was time for the bloggers to head to the Insight Software lab to see what HP does for managing power and cooling in data centers. John Schmitz, Ute Albert, and Tom Turicchi went over the Systems Insight Manager (SIM) software all the way up the management stack to Insight Dynamics. This is the software stack that allows system administrators to install, configure, monitor, and plan for BladeSystem chassis in the data center.
Tom then gave a demonstration of how the Data Center Power Control part of Insight Control allows data center managers to plan, monitor, cap and control the amount of energy and cooling that is used by their infrastructure. Tom set up policies and rules to manage events that may happen in a data center, from utility brown-outs to loss of cooling units. He also went over how you can monitor energy usage for the data center all the way down to each blade. This would allow you to better plan for capacity and decide where to install new blades.
The attendees wanted to know what couldn't be managed, as they thought that list would be much shorter than a review of everything the software could do. So Tom explained that it presently manages only HP servers, and that scripts could be used to manage or shut down multi-tiered applications, network devices, and storage. Those devices do not have the iLO2 ASIC in them, and that is a foundational element that needs to be there.
Tom also went over a demo of how the event manager can be set up to respond to utility policies and help save companies money. He used an example from PG&E in California. That's all for now.
How do the application you are using and what it is doing affect the power consumption of a system?
The first thing that everyone looks at when talking about power consumption is CPU utilization. Unfortunately, CPU utilization is not a good proxy for power consumption, and the reason why goes right down to the instruction level. Modern CPUs like the Intel Nehalem and AMD Istanbul processors have hundreds of millions of transistors on the die. What really drives power consumption is how many of those transistors are actually active. At the most basic level an instruction will activate a number of transistors on the CPU, and that number depends on what the instruction is actually doing. A simple integer add, for example, might add the values in two registers and place the result in a third register. A relatively small number of transistors will be active during this sequence. The opposite would be a complex instruction that streams data from memory to the cache and feeds it to the floating point unit, activating millions of transistors simultaneously.
Further to this, modern CPU architectures allow some instruction level parallelization so you can, if the code sequence supports it, run multiple operations simultaneously. Then on top of that we have multiple threads and multiple cores. So depending on how the code is written you can have a single linear sequence of instructions running, or multiple parallel streams running on multiple ALUs and FPUs in the processor simultaneously.
Add to that the fact that in modern CPUs the power load drops dramatically when the CPU is not actively working: idle circuitry in the CPU is placed in sleep modes, put on standby or switched off to reduce power consumption. So if you're not running any floating point code, for example, huge numbers of transistors are not active and not consuming much power.
This means that application power utilization varies depending on what the application is actually doing and how it is written. Therefore, depending on the applications you run, you will see massively different power consumption even if they all report 100% CPU utilization. You can even see differences running the same benchmark, depending on which compiler is used, whether the benchmark was optimized for a specific platform, and the exact instruction sequence that is run.
The data in the graph below shows the relative power consumption of an HP BladeSystem c7000 Enclosure with 32 BL2x220c Servers. We ran a bunch of applications and also had a couple of customers with the same configuration who were able to give us power measurements off their enclosures. One key thing to note is that the CPU was pegged at 100% for all of these tests (except the idle measurement, obviously).
As you can see there is a significant difference between idle and the highest power application, Linpack running across 8 cores in each blade. Another point to look at is that two customer applications, Rendering and Monte Carlo, don't get anywhere close to the Prime95 and Linpack benchmarks in terms of power consumption.
It is therefore impossible to state the power consumption of server X and compare it to server Y unless they are both running the same application under the same conditions. This is why both SPEC and the TPC have been developing power consumption benchmarks that look at both the workload and power consumption to give a comparable value between different systems.
SPEC in fact just added power consumption metrics to the new SPECweb2009 and, interestingly enough, the two results that are up there have the same performance per watt number, but they have wildly different configurations, absolute performance numbers and absolute wattage numbers. So there's more to performance per watt than meets the eye.
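To make that last point concrete, here is a minimal sketch using made-up numbers. The scores and wattages are invented for illustration and are not the published SPECweb2009 results; the system names are hypothetical too.

```python
# Hypothetical systems: scores and wattages are invented for illustration
# and are NOT the published SPECweb2009 results.
systems = {
    "small_box": {"score": 12_000, "watts": 400},
    "big_box": {"score": 48_000, "watts": 1_600},
}

def perf_per_watt(score, watts):
    """Performance per watt: workload score divided by average power draw."""
    return score / watts

ratios = {name: perf_per_watt(s["score"], s["watts"]) for name, s in systems.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} per watt")
```

Both systems report the same ratio even though one draws four times the power of the other, which is why the ratio alone tells you little about what a given system will need from your facility.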
The first part of this series was Configuration Matters
Mike Manos responded to my post about power capping being
ready for prime time with a very well thought out and argued post that really
looks at this from a datacenter manager's perspective, rather than just my
technology focused perspective.
I'm going to try and summarize some of the key issues that
he brings up and try to respond as best I can.
This one spans a number of points that Mike brings up, but I think the key thing here is that you
must have a critical mass of devices in the datacenter that support power
capping otherwise there is no compelling value.
I don't believe it is necessary, however, to have 100% of devices in the
datacenter that support power capping. There
are two reasons why:
In most Enterprise datacenters the vast majority
of the power for the IT load is going to the servers. I've seen numbers around 66% servers, 22%
storage and 12% networking. This is a
limited sample, so if you have other numbers let me know; I would be interested.
Most of the power variation comes from the
server load. A server at full load can use 2x - 3x the power of a server at
idle. Network switch load variation is
minimal based on some quick Web research (see Extreme Networks power consumption test or Miercom power consumption testing). Storage power consumption variation also seems to be
fairly light at no more than 30% more than idle. See Power Provisioning for a
Warehouse-sized Computer by Google.
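As a rough check on those two reasons, the sketch below combines the quoted shares with the quoted load variation. It assumes the 66/22/12 split describes the datacenter at full load and treats network power as flat; both are simplifying assumptions, not measurements.

```python
# Share of IT power by device class, from the (limited) sample quoted above.
# Assumption: these shares describe the datacenter at full load.
peak_share = {"servers": 66.0, "storage": 22.0, "network": 12.0}

def swing(peak, peak_to_idle_ratio):
    """Power that disappears when a device class drops from full load to idle."""
    return peak - peak / peak_to_idle_ratio

def server_fraction_of_swing(server_ratio):
    """Fraction of the total load-dependent power that comes from servers."""
    swings = {
        "servers": swing(peak_share["servers"], server_ratio),  # 2x - 3x idle
        "storage": swing(peak_share["storage"], 1.3),           # at most 30% over idle
        "network": swing(peak_share["network"], 1.0),           # treated as flat
    }
    return swings["servers"] / sum(swings.values())

for ratio in (2.0, 3.0):
    fraction = server_fraction_of_swing(ratio)
    print(f"servers at {ratio:.0f}x idle: {fraction:.0%} of the variable power")
```

Under these assumptions the servers account for well over 80% of the power that actually moves with load.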
So if our Datacenter manager, Howard, can power cap the
servers then he's got control of the largest and most variable chunk of IT
power. Would he like to have control of
everything? Absolutely yes, but being able to control the servers is more than
half of the problem.
Been there done that,
got the T-Shirt
The other thing that we get told by the many Howards that
are out there is that they're stuck.
They've been round and round the loop Mike describes and they've hit the
wall. They don't dare decrease the
budgeted power per server any more as they have to allow for the fact the
servers could spike up in load, and if that blows a breaker taking down a rack
then all hell is going to break loose.
With a server power cap in place Howard can safely drop the budgeted
power per server and fit more into his existing datacenter. Will this cost him? Sure, both time to
install and configure and money for the licenses to enable the feature. But I
guarantee you that when you compare this to the cost of new datacenter facilities
or leasing space in another DC this will be trivial.
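The arithmetic behind fitting more into the existing datacenter can be sketched as follows. The circuit capacity and both per-server budgets are hypothetical numbers chosen for illustration, not HP guidance.

```python
def servers_per_circuit(circuit_watts, budget_per_server_watts):
    """How many servers fit on one circuit at a given per-server power budget."""
    return int(circuit_watts // budget_per_server_watts)

# Hypothetical numbers: usable circuit capacity and per-server budgets.
CIRCUIT_WATTS = 5_000
UNCAPPED_BUDGET = 500  # conservative allowance for load spikes
CAPPED_BUDGET = 350    # enforced by a power cap, so spikes cannot exceed it

before = servers_per_circuit(CIRCUIT_WATTS, UNCAPPED_BUDGET)
after = servers_per_circuit(CIRCUIT_WATTS, CAPPED_BUDGET)
print(f"uncapped: {before} servers, capped: {after} servers")
```

The gain comes entirely from being able to trust the lower budget; the breaker and the servers are unchanged.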
I agree that most datacenters are in fact heterogeneous at the
server level; they will have a mix of server generations and
manufacturers. This again comes down to
critical mass, so what we did was enable this feature on two of the best
selling servers of the previous generation, DL360 G5 and DL380 G5, and pretty
much all of the BladeSystem blades to help create that critical mass of servers
that are already out there, then add on with the new G6 servers. We would of course love for everyone with
other manufacturer's product to upgrade immediately to HP G6 ProLiant Servers
and Blades, but it's probably not going to happen. This will delay the point at which power
capping can be enabled, and for those customers that use other vendors' systems
they may not be able to enable power capping until those vendors support it.
Power Cap Management
There's a bunch of issues around power cap management that definitely
do need to get sorted out. The HP
products do come from an IT perspective and they are not the same tools that facilities
managers typically use. Clearly there
needs to be some kind of convergence between these two toolsets even if it's just
the ability to transfer data between them.
Wouldn't it be great if something like the Systems Insight Manager/Insight
Power Manager combination that collects power and server data could feed into something
like, say, Aperture (http://www.aperture.com/)?
Then you'd have the same information in both sets of tools.
The other question that we have had from customers is who
owns and therefore can change the power cap on the server: the
facility/datacenter team or the IT Server Admin team? This is more of a political question than
anything else, and I don't have a simple answer, but if you are really using
power caps to their full potential changing the power cap on a server is
something that both teams will need to be involved in.
I would like to know what are the other barriers you see to
implementing power capping - let me know in the comments and be assured that
your feedback is going into the development teams.
Just to make Mike happy I thought I'd let you know that we
do have SNMP access to the enclosure power consumption.
If you collect all six SNMP MIB power supply current output
power values and add them together, you will have calculated the Enclosure
Input Power.
In the CPQRACK.MIB file, which you can get from here http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?swItem=MTX-a7f532d82b3847188d6a7fc60b&lang=en&cc=us&mode=3&
there are some values named
cpqRackPowerSupplyCurPwrOutput, which is MIB item:
enterprises.220.127.116.11.18.104.22.168.1 through enterprises.22.214.171.124.126.96.36.199.6
These give you the Input Power of each Power Supply. I know the MIB name says output,
but it's actually input. Sum these together and you have the Enclosure Input Power.
Power supplies placed in standby for Dynamic Power Savings
will be reporting 0 watts.
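The summing step can be sketched like this. The `read_supply_watts` callable is a hypothetical stand-in for whatever SNMP client you already use to fetch cpqRackPowerSupplyCurPwrOutput for one supply, and the readings are invented so the example runs on its own.

```python
from typing import Callable

SUPPLY_COUNT = 6  # the enclosure exposes six power supply entries

def enclosure_input_power(read_supply_watts: Callable[[int], int]) -> int:
    """Sum the per-supply readings to get the enclosure input power in watts.

    read_supply_watts is a hypothetical callable: given a supply index (1-6)
    it returns the cpqRackPowerSupplyCurPwrOutput value for that supply.
    """
    return sum(read_supply_watts(i) for i in range(1, SUPPLY_COUNT + 1))

# Invented readings for illustration; supplies 5 and 6 are in standby (0 W).
fake_readings = {1: 610, 2: 598, 3: 605, 4: 602, 5: 0, 6: 0}
total = enclosure_input_power(fake_readings.__getitem__)
print(f"Enclosure input power: {total} W")
```

Because standby supplies report 0 watts, they can be left in the sum without any special handling.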
And for enclosure ambient temp - read:
I was going to add some more details to Chuck's post on how a blade server powers on, but I got sidetracked by a brilliant post from Mike Manos of Digital Realty on the real basics of what is going on with power in your datacenter.
What Mike is explaining, far better than I could, is how power gets used up and reserved in your datacenter by breaker sizes, redundancy and the natural tendency of facility management to be conservative when allocating power to servers, and as he says they have good reason to be. If they plug in a device that causes a breaker to trip, taking down multiple servers, it's their butts that are on the line.
He raised a good question about why the faceplate label, the label on the power supply that indicates the max power input, is so high that most facilities managers are comfortable de-rating it by 20% - 30%. Well, the reason is explained in part by my post on how configuration affects power consumption: the power supply is designed to deal with the maximum configured load. The range from a minimum configured load for a 2 socket server (e.g. 1 Low Power CPU, 1 or 2 DIMMs, 1 x SSD drive and no PCI cards) to a maximum configured load (e.g. 2 x 120W or 130W CPUs, 12 or 18 DIMMs, 8 x 15K RPM Drives, 3 x PCI Cards including a 200W graphics card) is huge, and that’s just one server. The example I use in the Configuration Matters post shows a difference of over 1kW across an enclosure. Talk to any power supply designer and you'll find out that they are just as conservative as any facility manager (and unappreciated), and for pretty much the same reasons. Who gets blamed when you run a high power program like Prime95 or Linpack and the server shuts down because the power supply couldn’t deliver enough juice?
That’s why HP came up with the common slot power supply design for rack mount servers. It allows you to size the power supply for the actual configuration you will be using rather than just stuffing a 1200W power supply in every server.
This has two great consequences:
It reduces the amount of trapped or stranded power by reducing the amount of power that the facility manager has to allocate to a given server.
It increases your power supply efficiency, reducing energy wasted. All power supplies have an efficiency curve that, for servers, is low at low outputs and reaches peak efficiency at about 65% load (I originally wrote 35% - 50% and got corrected by one of the engineering team on this. Must remember in future to check my numbers). Remember most servers have redundant power supplies, and in the HP case they load share, so the PSU can only ever exceed 50% load in the event of a redundancy failure.
This does add complexity to your buying decision: now you have to pick the power supply you need based on your configuration. That's why we created the HP Power Advisor to help with that decision. Of course you can still just use a 750W or 1200W PSU for every server if you want to, but you won't be running as efficiently as you could.
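To see why an oversized supply runs further down its efficiency curve, here is a small sketch of the load each supply carries when two redundant supplies share the load evenly. The 300W server draw is a hypothetical figure, and the sketch ignores conversion losses for simplicity.

```python
def psu_load_fraction(server_watts, psu_rating_watts, supplies_sharing=2):
    """Fraction of its rating each supply carries when the load is shared evenly."""
    return server_watts / supplies_sharing / psu_rating_watts

SERVER_WATTS = 300  # hypothetical draw for a modest configuration

for rating in (750, 1200):
    normal = psu_load_fraction(SERVER_WATTS, rating)
    failed = psu_load_fraction(SERVER_WATTS, rating, supplies_sharing=1)
    print(f"{rating} W PSU: {normal:.1%} each when sharing, {failed:.1%} on one supply")
```

With this draw, each 1200W supply sits at 12.5% of its rating while sharing, against 20% for a 750W supply, so the smaller supply spends its life closer to the efficient part of the curve.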
One area, though, where I must respectfully disagree with Mike is in his comments on Power Capping. I agree that it is a technology that has huge potential in the datacenter to allow your facilities team to recover that trapped capacity, but I disagree that it is not ready for prime-time.
HP delivered our first version of power capping in 2007. This was relatively slow acting and was really only good for controlling the average power consumption of a server. This was great if you had a cooling issue in your datacenter and wanted to control the heat output of your servers, as heat is largely related to the average power of the server, but you couldn’t use it to protect circuit breakers.
In November 2008 HP introduced Dynamic Power Capping with circuit breaker protection. This is a hardware based solution that can respond to changes in power consumption in less than 500ms and because it’s a hardware solution it’s operating system and application independent. This is supported on all G6 servers, most blade servers and selected G5 rack-mount servers. When run on an HP Blade Enclosure you gain additional capabilities; the Onboard Administrator can manage the blade server caps to optimize the performance of the enclosure. It will change the blade level power caps so that busier blades get more power and less busy blades will get less power while maintaining the enclosure level power cap so you can protect your breakers.
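The idea of shifting blade caps while holding the enclosure cap can be illustrated with a simple proportional split. This is not the Onboard Administrator's actual algorithm, which is not described here; the function name, the per-blade floor and all the wattages are assumptions made for the sketch.

```python
def allocate_blade_caps(demands, enclosure_cap, floor):
    """Split an enclosure-level cap across blades in proportion to demand.

    demands: watts each blade currently wants (hypothetical measurements)
    floor: minimum cap every blade keeps so that it can stay powered
    Assumes enclosure_cap >= floor * number of blades.
    """
    if sum(demands.values()) <= enclosure_cap:
        return dict(demands)  # everything fits, so no blade is throttled
    spare = enclosure_cap - floor * len(demands)
    extra_wanted = {b: max(d - floor, 0) for b, d in demands.items()}
    total_extra = sum(extra_wanted.values())
    return {
        b: floor + (spare * extra_wanted[b] / total_extra if total_extra else 0)
        for b in demands
    }

demands = {"blade1": 300, "blade2": 200, "blade3": 100}
caps = allocate_blade_caps(demands, enclosure_cap=450, floor=100)
print(caps)
```

The busiest blade ends up with the largest cap, the quiet blade keeps only its floor, and the caps always add up to the enclosure cap, which is the property that protects the breaker.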
For a demonstration of this on the rack mount servers, showing how we deliver circuit breaker protection, see this video with “Professor” Alan Goodrum, and for more information on Dynamic Power Capping go to http://www.hp.com/go/powercapping
Some of our Blade Specialists give us the workings of the BladeSystem "Power switch". David gives us these words of wisdom.
"I've always referred to the "power switch" on the front of the blade not as a power switch but a "may I?" switch. When actuated, the blade asks the OA if there is enough available power to power on and spin up the drives (the act of powering on takes a higher load than steady state operation). If at that moment there is not enough power, the blade pauses and tries again in a moment (there's an algorithm in there that determines the actual length of the pause so that if several blades had asked for power at the same time, they don't all use the same delays and end up in deadlock). If after requesting power several times (sorry, don't know how many) there still isn't enough power, it will stop trying. If a blade has not been successfully turned on, the command can be issued again.
Net/net: the blade infrastructure is designed to prevent more power demand than is available. The illustration above is equally valid if power-on is initiated from power switch, OA/iLO command, or via Wake-on-LAN..."
And Tony added in his experience.
"David is correct but there are some subtleties about how the server will respond. On a cold boot - insertion, enclosure power-up, etc. if Automatic Power-On is set, then the server will retry power on requests if they get denied.
Power-On commands like the power button or iLO Virtual Power/OA commands are one time things - if the request fails it will not be retried. Therefore as David said the application needs to be able to detect if a server has powered on after the request is issued."
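The behaviour David and Tony describe can be modelled roughly as below. This is only an illustration: the real retry count and pause algorithm are not given above, so `max_attempts`, the 1 to 5 second pause range and the `power_available` callable are all assumptions, and the function records the pauses instead of actually sleeping.

```python
import random

def request_power_on(power_available, max_attempts, auto_power_on, rng=random):
    """Model a blade asking the OA for permission to power on.

    power_available: hypothetical callable, True when the OA grants power
    max_attempts: retry limit (the real number is not stated above)
    auto_power_on: True for a cold boot with Automatic Power-On set, which
        retries; False for a power button or iLO/OA command, which is one-shot
    Returns (granted, delays), where delays are the randomized pauses taken.
    """
    attempts = max_attempts if auto_power_on else 1
    delays = []
    for attempt in range(attempts):
        if power_available():
            return True, delays
        if attempt < attempts - 1:
            # Randomized pause so blades that asked together do not retry in lockstep.
            delays.append(rng.uniform(1.0, 5.0))
    return False, delays

# Cold boot: power frees up on the third request.
answers = iter([False, False, True])
granted, delays = request_power_on(lambda: next(answers), 5, auto_power_on=True)
print(granted, len(delays))

# Power button press: a denied request is not retried.
print(request_power_on(lambda: False, 5, auto_power_on=False))
```

The one-shot case is the reason any tool issuing power-on commands should check afterwards whether the server actually came up.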
Hope this helps.