OSM is the primary NonStop serviceability application. It detects, in real time, all hardware, firmware, and environmental faults in a NonStop system and displays them in the OSM Service Connection. OSM also creates real-time alerts and sends them to the HP Support Center using HP Insight Remote Support Advanced. In addition, OSM includes the OSM Event Viewer, which lets users view past or real-time NonStop EMS events: either all events or only those that match user-defined criteria (such as subsystem or severity).
Every NonStop system ships with an OSM configuration that works as-is in most customer environments. However, there are cases where OSM needs to be customized for your particular environment. Most of this customization is done through a file called OSMCONF in the $SYSTEM.ZSERVICE subvolume. In this blog, I will describe some use cases in which you can customize the OSMCONF file to suit your environment.
RAS (Reliability, Availability and Serviceability) characteristics are what differentiate mission-critical systems from non-mission-critical systems. NonStop systems sit at the top of the mission-critical spectrum (L4 availability level), so they are naturally expected to excel in RAS characteristics. As the NonStop Manageability and Serviceability Architect, I am responsible for ensuring that NonStop systems provide the best possible serviceability. I am happy to brag about some of the enhancements we have made recently to take the serviceability of NonStop BladeSystems to the next level. To name a few, these include:

- the down-system CLIM firmware update tool
- the DHCP DNS Configuration Wizard
- DHCP and DNS server status monitoring
- IPv6 support in OSM
- the ability for OSM to listen on specific IP addresses
- an alarm for a non-responsive CLIM
- log collection from an individual CLIM
- the CLIM replacement guided procedure
- an alarm for a bad blade system board battery
- an alarm for a degraded DIMM in a CLIM
- suppression of service-induced BladeCluster alarms
- the ability for OSM to bind to a specific IP address when sending indications
- suppression of a specific alarm on a specific resource