Mission Critical Computing Blog
Your source for the latest insights on HP Integrity, mission critical computing, and other relevant server and technology topics from the BCS team.

Measuring Uptime

How do you measure uptime? Is it becoming more important in your environment? Is downtime costing your company more or less today than it did a few years ago?

For many customers, the amount of downtime they experience is increasing, often due to the complexity of new systems. At the same time, the cost of downtime is also rising, usually because the business relies more heavily on IT systems. In short, for many customers, downtime is a bigger issue than it was in the past.


Coming from a business that spends a lot of time working with mission-critical customers, I've seen some interesting changes over the past few years, especially where uptime measurement is concerned.


I've seen that with virtualization, many workloads that each have lower uptime requirements are consolidated onto fewer platforms. Often, this means that the uptime requirements for the platform actually increase compared to those of the individual workloads. However, virtualization also provides benefits, such as online workload migration, which allows maintenance to be completed without bringing down the application - a great way to reduce planned downtime.


I've also noticed that as systems get more complex, and vendors build more availability into their applications, the overall uptime of the application increases. However, the uptime of an individual node in a cluster may not be as high as that of a standalone deployment of the application. Why? Because the increased complexity of the cluster delivers higher overall availability, but it can sacrifice the ease of management, configuration, and maintenance of a single-node version, resulting in more downtime for any one node.
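One way to see why both of these things can be true at once: under the simplifying assumption of independent node failures, redundancy multiplies failure probabilities together. A minimal sketch, using hypothetical availability figures:

```python
def availability_parallel(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up."""
    return 1 - (1 - node_availability) ** nodes

# A standalone server that manages 99.5% availability:
single = 0.995
# A two-node cluster where each node, burdened by the extra complexity,
# only achieves 99.0% availability on its own:
cluster = availability_parallel(0.99, 2)
print(f"single: {single:.3%}, cluster: {cluster:.3%}")
# The cluster as a whole (99.99%) still beats the simpler single node,
# even though each cluster node is individually less available.
```

The independence assumption is generous (correlated failures and shared dependencies erode it), but it captures why a cluster of individually less-reliable nodes can still deliver better application uptime.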


So, how do you measure uptime in your environment? Do you measure it based on the uptime of the server? Does that change if you can move a virtual machine workload from one system to another to handle planned downtime?


Do you measure uptime based on OS availability? I can move my virtualized workload from one server to another, and the OS stays running. This is wonderful, and definitely helps reduce planned downtime. However, if you are running a cluster of virtual machines, and the clustering only detects whether a server is running (for unplanned downtime) or relies on the administrator to manually start an online migration (for planned downtime), it is hard to get OS-level or application-level availability measurements.


Do you measure uptime based on application availability? This is easy in a clustered environment when the cluster understands the applications, such as with HP Serviceguard. While this works well for mission-critical applications, it does take some effort to achieve that level of application integration. And then, how do you measure uptime on a multi-node solution, such as Oracle RAC? Do you measure the uptime of each node, any of the nodes, or all of the nodes?
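For a multi-node solution, the definition you pick changes the number you report. A minimal sketch with hypothetical per-node up-intervals, contrasting an "any node up" measure with a stricter "all nodes up" measure over the same day:

```python
def up_hours(intervals, window=24):
    """Expand (start_hour, end_hour) up-intervals into per-hour up flags."""
    hours = [False] * window
    for start, end in intervals:
        for h in range(start, end):
            hours[h] = True
    return hours

# Hypothetical data for a two-node cluster over a 24-hour window:
nodes = [
    up_hours([(0, 10), (12, 24)]),  # node 1: down for hours 10-12
    up_hours([(0, 24)]),            # node 2: up all day
]

# "Any node up": the service answered as long as one node was running.
any_up = sum(any(h) for h in zip(*nodes)) / 24
# "All nodes up": every node healthy (the strictest definition).
all_up = sum(all(h) for h in zip(*nodes)) / 24
print(f"any-node uptime: {any_up:.1%}, all-node uptime: {all_up:.1%}")
# any-node: 100.0%, all-node: ~91.7% - same day, very different numbers.
```

Neither definition is wrong; the point is that a single percentage is meaningless unless you know which one was measured.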


So, how do you measure the uptime of your environment, or do you use different measurements for different systems or parts of your environment? How do you navigate vendor uptime claims, especially since different solutions may make similar claims (e.g., 99.9% uptime) but often measure different things (e.g., application uptime versus physical server or virtual machine uptime)? Do your uptime measurements include planned downtime for maintenance, or just unplanned downtime? Comments or thoughts on how this plays out in the real world are always appreciated.
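When weighing those percentage claims, it helps to translate "nines" into an actual downtime budget. A quick back-of-the-envelope conversion (using a 365-day year):

```python
def annual_downtime_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability."""
    minutes_per_year = 365 * 24 * 60
    return (1 - availability) * minutes_per_year

for a in (0.999, 0.9999, 0.99999):
    print(f"{a:.3%} uptime -> {annual_downtime_minutes(a):7.1f} min/year")
# 99.9% allows roughly 8.8 hours of downtime a year;
# 99.999% allows barely 5 minutes.
```

Whether that budget covers planned maintenance windows, and at which layer it is measured, matters as much as the number of nines itself.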



Are you paying all year for a holiday spike in traffic?


Over the years, I've met with many customers who have spikes in their holiday traffic. I've spoken with a southern-hemisphere beverage company that sees a huge spike in orders on the last Monday morning before Christmas. I've spoken with a customer whose busiest day of the year is the final Friday before Christmas. I've spoken to numerous retailers who have their busiest shopping days at this time of the year. Often, these spikes are ten times or more the average demand.


How do these customers adapt to such high levels of seasonal demand? The first, and most obvious, technical approach is to provision their systems to handle the peak demand. Of course, that means they are paying for excess capacity the rest of the year. Having said that, they meet their business requirements, customers are happy, and the IT staff keep their jobs.


The alternative of sizing the systems below the peak, so that they can't handle all the demand, will save a little money on the IT budget. However, every year there are IT infrastructures that get a surge of demand they weren't designed to handle, and the company ends up losing customers, its reputation, and far more money than the extra capacity would have cost in the first place.


That said, with reduced budgets, more and more customers want the best of both worlds: they need to handle their peak capacity, but also take advantage of lower costs. At the end of the day, there are two ways that virtualization can help in this situation.


First, and perhaps the easiest, is to take advantage of some sort of flexible financing so that you only pay for additional capacity when you actually need it. This is the idea behind offerings ranging from Instant Capacity and Temporary Instant Capacity on HP Integrity servers all the way to truly flexible cloud computing offerings such as Amazon EC2.
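The trade-off is easy to rough out. A sketch with entirely hypothetical prices, comparing owning peak capacity year-round against paying for the extra capacity only during the spike:

```python
# All figures below are hypothetical, for illustration only.
base_servers = 10             # capacity needed for average demand
peak_extra = 90               # extra servers needed for a 10x holiday spike
owned_cost_per_year = 5000.0  # annualized cost of an owned server
temp_cost_per_day = 40.0      # daily rate for temporary/cloud capacity
spike_days = 14               # length of the holiday peak

# Option 1: own enough capacity for the peak, all year long.
own_peak = (base_servers + peak_extra) * owned_cost_per_year
# Option 2: own the base, pay for the peak only when it happens.
flexible = (base_servers * owned_cost_per_year
            + peak_extra * temp_cost_per_day * spike_days)
print(f"own peak year-round: ${own_peak:,.0f}")
print(f"pay-per-use peak:    ${flexible:,.0f}")
# With these made-up numbers, the flexible model costs about a fifth
# as much - the break-even point depends entirely on your real rates.
```

The exact numbers will vary wildly by environment; the exercise is worth doing with your own figures before committing to either model.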


The second way is to run additional workloads on the systems to use up the extra capacity. This works well, as long as those additional workloads can be released to free up resources for the primary workload when the demand spikes arrive. Dynamic hard partitions (nPars), dynamic vPars, virtual machines, and application stacking technologies all make this possible. Freeing up resources can be anything from manually shutting down low-priority workloads, to automatically shifting resources between partitions, to migrating workloads off of a system. I've even come across some unique ways of tackling this problem:

  • locking down the environment for a few months and shutting off all development and test systems;

  • running on a single node of Oracle RAC for most of the year and expanding to multiple nodes for the holiday rush;

  • migrating production workloads to larger or dedicated systems for a period of time;

  • and more.

The good news is that virtualization technologies, such as Insight Dynamics - VSE, whether on HP Integrity servers, HP ProLiant servers, or HP BladeSystem, create an environment where this is not only possible, but relatively easy to do.


Actually, these customers have it relatively easy. They know that they will have a holiday spike. They can even generate a reasonably accurate estimate of the workload their systems will see on those days. They can plan to lock down their environment in advance to free up test or development systems. They can manually resize partitions days or weeks in advance. And since the holiday season is reasonably predictable, they can make their plans well ahead of time.


The nice thing about Insight Dynamics - VSE for HP Integrity is that while it makes it easier to handle predicted fluctuations, it handles unpredictable spikes and troughs in demand equally well. Since it is automated, tools like the HP Global Workload Manager component of Insight Dynamics - VSE for Integrity can observe and react to changes in the environment in seconds - not minutes or hours. It automates the rest of the portfolio, including the partitioning, clustering, and instant capacity products, to react automatically to changing workloads.


At the end of the day, automation of a flexible environment provides the best of both worlds - high levels of utilization (and therefore lower total cost of ownership), but with the ability to handle peak workloads - whether predictable peaks like the holidays, or an unpredictable peak. The best of both worlds - and a less stressful holiday season for all those who work in IT.



