This blog is the third part of the description of the delivery layer within the HP Cloud Functional Reference Architecture. Remember: the demand layer focuses on the user, while the delivery layer focuses on the service and its service elements. We first discussed how a service is provisioned; we then discussed how the service is accessed and how options are changed. Let’s now discuss how we monitor and measure the service, and how we ensure proper quality of service.
To measure usage, identify issues or assess quality of service, we first need to gather data on how the service is performing; that is the purpose of service monitoring. That data is gathered with two mechanisms:
- Information associated with the resources on which the service or service instance runs is provided by the supply layer through the usage metering function. For each resource used, relevant information is collected there and transmitted to the service monitoring function.
- Information associated with what runs on the resource is provided by tools available to the delivery layer. Most often those tools consist of agents that are embedded in the virtual machine during provisioning. Those agents collect data and make it available to service monitoring. The data collected depends on the agent used.
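As a minimal sketch, the two data paths above could be merged into a single view per service instance for the service monitoring function. All names here (the `Metric` record, the metric names, the merge helper) are hypothetical illustrations, not part of the architecture itself:

```python
from dataclasses import dataclass

@dataclass
class Metric:
    source: str   # "supply_layer" (usage metering) or "agent" (in-VM agent)
    name: str     # e.g. "cpu_minutes", "app_response_ms"
    value: float

def merge_monitoring_data(supply_metrics, agent_metrics):
    """Combine supply-layer metering data with agent-collected data
    into one view, keyed by metric name, for service monitoring."""
    merged = {}
    for m in supply_metrics + agent_metrics:
        merged[m.name] = m
    return merged

# The supply layer reports resource usage; the in-VM agent reports
# application-level data; service monitoring sees both sides.
supply = [Metric("supply_layer", "cpu_minutes", 120.0)]
agent = [Metric("agent", "app_response_ms", 45.0)]
view = merge_monitoring_data(supply, agent)
```

The point of the sketch is simply that the two mechanisms produce data of different origins that must land in one consistent structure before anything downstream (metering, health, QoS) can use it.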
Things become more interesting when services are intermediated or aggregated:
- In case of intermediation, the cloud platform and its delivery layer can rely on the service provider to deliver the performance and metering data. But is the service provider making that data available, and is it doing so at a frequency that is in line with the needs of the cloud platform? Also, is the cloud platform able to install agents within the service provider environment if deemed necessary? When intermediating services, assess what is feasible; this may become a criterion in your decision process.
- In case of aggregation, all the intermediation comments above apply. Beyond those, it is now key to ensure the appropriate data is provided so the cloud platform can compile the service monitoring information of the end-to-end service using comparable data. Analyzing what information is provided by the service provider, what data can be captured through other means, and how that compares to the information delivered by the internally sourced service elements is key to ensuring appropriate service monitoring.
If a service consists of a number of service elements, service monitoring will compile the service data by aggregating the information provided by each of the service elements. Service monitoring will also receive information on events reported by a supply layer. These events are transmitted to Service Health Management and Service Usage.
Aggregated supply layer performance data and current status may be provided to the Service Broker and serve as input in the choice of the appropriate supply layer in environments containing multiple supply layers.
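A minimal sketch of the roll-up described above, assuming a simple three-level status scale (the status names and the "worst element wins" rule are my assumptions, not prescribed by the architecture):

```python
def aggregate_service_status(element_statuses):
    """Roll up per-service-element statuses into one end-to-end status.
    Assumption: the service is only as healthy as its weakest element."""
    order = {"ok": 0, "degraded": 1, "failed": 2}
    return max(element_statuses, key=lambda s: order[s])

# A service composed of three service elements, one of them degraded:
status = aggregate_service_status(["ok", "degraded", "ok"])
```

In a real environment the aggregation rule would of course reflect the service topology (redundant elements, for instance, would not propagate a single failure upward).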
Another strategy is to implement network service probes that monitor the service from an end-to-end perspective, thereby compensating for monitoring information missing from third-party providers.
Metering a Service
This topic is probably worth a blog entry in its own right. The level of service metering required depends on the chargeback or billing options chosen. The information is generated in other functions, but may have to be integrated by service management. Let’s first look at popular charging mechanisms:
- If the user is charged a subscription, the only information that needs to be provided is the moment the service is provisioned and the moment it is de-provisioned. That information is directly generated by the service instance lifecycle management function when running the provisioning and de-provisioning workflows.
- If the user is charged using a pay-per-reservation approach, in other words, if the user is charged from the moment a service is provisioned until the service is released, the same applies as in the previous bullet. Alternative schemes, where payment stops when the service is shut down and the counter starts running again as soon as it is restarted, use a similar approach, as in each of those cases automation workflows are involved.
- If the user is charged under a true pay-per-use model, then the actual consumption needs to be provided by either the appropriate supply layer or the external supplier. That information can include the number of CPU minutes consumed, disk space used, network traffic generated, number of executed transactions, etc.
Depending on the billing/chargeback schema used for the service, that information should be consolidated and an appropriate CDR (Charge Data Record) created. This is done by the Service Usage function. When looking at what approaches can be taken it is important to look at software licensing as that can be a barrier to many billing/chargeback models. I’ll address this in a separate blog entry.
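To make the three mechanisms and the CDR step concrete, here is a small sketch. The function names, rate parameters, and CDR fields are hypothetical; the point is only to show how the charging model determines which input the Service Usage function needs:

```python
def compute_charge(model, hours_provisioned=0.0, usage_units=0.0,
                   flat_fee=0.0, hourly_rate=0.0, unit_rate=0.0):
    """Sketch of the three charging mechanisms discussed above."""
    if model == "subscription":
        # Only provision/de-provision timestamps matter; fee is fixed.
        return flat_fee
    if model == "pay_per_reservation":
        # Charged for the time the service is provisioned.
        return hours_provisioned * hourly_rate
    if model == "pay_per_use":
        # Charged on actual consumption (CPU minutes, GB, transactions...).
        return usage_units * unit_rate
    raise ValueError(f"unknown charging model: {model}")

def make_cdr(service_id, model, amount):
    """Consolidate the computed charge into a simple Charge Data Record,
    as the Service Usage function would."""
    return {"service_id": service_id, "model": model, "amount": amount}

cdr = make_cdr("svc-42", "pay_per_use",
               compute_charge("pay_per_use", usage_units=300, unit_rate=0.01))
```

Note how only the pay-per-use branch requires consumption data from the supply layer; the other two are driven entirely by the provisioning workflows.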
Service Health Management
Service management receives events. These events are recorded and then processed by service health management:
- If a resource failure is received, service health management will trigger service instance lifecycle management to take action if required. For example, if a server fails, the virtual machines running on the server are no longer operational. This triggers events that affect the status of the services the VMs are part of. Service health management may then trigger service instance lifecycle management to reload the latest back-up of those VMs on a different resource to keep the service operational.
- But it’s often by correlating multiple events that the true nature of a problem appears, and an incident may be identified. In many situations that incident will require further analysis and potentially human intervention. Once service health management has identified an incident, it may open a ticket with the service desk. If the incident is associated with security, the enterprise security management function may have to be involved. Both these functions are part of the Governance, Business & Operations Management and Security functions I described in the architecture overview.
Service Health Management maintains a view of the status of the service at any given moment in time. Quite advanced tools exist today.
Service Usage and Quality of Service
The data collected and compiled by service management is used by both service usage and quality of service. As already mentioned, service usage will generate the CDRs (charge data records) and pass those on to the billing and rating function in the demand layer. Service Usage also logs the service activity for security and audit purposes. Fraud and risk indicators may also be detected.
Service Usage also correlates events for security or audit/compliance reasons. Here again, actions may be triggered through service instance lifecycle management or incidents may be reported through the service desk. Usage information may be stored for further detailed analysis by demand management, CRM or other analysis tools.
Quality of service processes service and resource health information, together with performance and security data, for continuous assessment of service levels. These assessments are passed to quality of experience in the demand layer to compute customer experience and SLA compliance. QoS consolidates service level information around three key topics: confidentiality, integrity and availability.
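One simple way to picture that consolidation, purely as a sketch and assuming normalized 0.0-1.0 scores per measurement, is to average raw measurements per topic:

```python
def consolidate_qos(measurements):
    """Consolidate raw service-level measurements (topic, score) pairs
    into one averaged score per key topic:
    confidentiality, integrity, availability."""
    topics = {"confidentiality": [], "integrity": [], "availability": []}
    for topic, score in measurements:
        topics[topic].append(score)
    # None where no measurements were reported for a topic.
    return {t: sum(v) / len(v) if v else None for t, v in topics.items()}

qos = consolidate_qos([
    ("availability", 0.99), ("availability", 0.97),
    ("integrity", 1.0), ("confidentiality", 0.95),
])
```

Real SLA computation would weight and window these measurements rather than averaging them flatly, but the output shape, one consolidated figure per topic handed to quality of experience, is the idea.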
This entry completes the description of the delivery layer and how services are constructed, used and monitored. My next blog entry will discuss how we actually design a service and how we link to external service providers. So, stay tuned and tell me what you think of the architecture so far.