Mission Critical Computing Blog
Your source for the latest insights on HP Integrity, mission critical computing, and other relevant server and technology topics from the BCS team.

Disaster Recovery Architecture with HP Serviceguard solutions

Guest blog written by Bhakthavatsala Naidu, Master Architect, HP Servers Research and Development

 

Following my earlier blog post on disaster recovery technologies, the next logical step is to describe the design approaches for Disaster Recovery (DR) architectures. First, let us understand the types of disaster recovery architectures that HP Serviceguard for HP-UX/Linux offers.

 

A disaster tolerant or disaster recovery architecture should protect against failures at the node, network, storage, or site level where the IT infrastructure is located. To protect against multiple points of failure, cluster nodes must be separated geographically. The nodes can be placed in different rooms, on different floors of a building, in separate buildings within a campus, or in separate cities or continents. This geographical separation makes the DR architecture resilient to catastrophic failures such as a network failure on the floor where the nodes are located, a power failure at the site level, or a disaster that strikes the entire site. Once resiliency is achieved at the compute or node level, the next logical step is to safeguard the data itself, and access to it, against multiple points of failure. There are two approaches to making the data generated by applications available at both locations of a disaster tolerant cluster:

1. Host-based data replication

2. Storage-based data replication

 

Let us understand the above technologies in a bit more detail before we dive into the HP Serviceguard DR solutions.

 

Host-based replication: A software stack at the compute node level facilitates the data replication to make the data more resilient to failures. Linux MD or LVM mirroring, VxVM mirroring, and MirrorDisk/UX are examples of such software components available on the Linux and HP-UX platforms. The major advantage of this technology is that it does not require the same storage array at both the primary and DR sites; the arrays can be of different types, from the same or different vendors. The major disadvantage is that it consumes additional compute resources on the nodes to replicate data across the locations. In addition, the DR solution has no ability to check data consistency before activating the applications at the DR site in the event of a disaster.
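To make the idea concrete, here is a minimal Python sketch of what host-based mirroring does conceptually: every write issued by the host goes to two copies, one per site. The file paths are hypothetical stand-ins for the mirrored block devices that MD/LVM, VxVM, or MirrorDisk/UX would actually manage at a much lower level.

```python
# Conceptual sketch of host-based mirroring -- not MD/LVM/VxVM/MirrorDisk/UX code.
# The host writes every block to two devices, one in each data center, so the
# application's data exists at both sites. Plain files stand in for the
# mirrored block devices here.

import os

LOCAL_COPY = "/tmp/primary_site.img"   # assumption: local half of the mirror
REMOTE_COPY = "/tmp/dr_site.img"       # assumption: DR-site half of the mirror

def mirrored_write(offset: int, data: bytes) -> None:
    """Write the same block to both halves of the mirror before returning."""
    for path in (LOCAL_COPY, REMOTE_COPY):
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(offset)
            f.write(data)

mirrored_write(0, b"application data block")
```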

 

Storage-based replication: Storage arrays such as HP 3PAR, XP, EVA, and EMC arrays support replication of data at the storage system controller level. Broadly, storage systems replicate data between the sites in two modes: (1) synchronous and (2) asynchronous. In synchronous mode, each I/O request is sent to both the primary and DR sites, and the outcome of the I/O operation is determined by its successful completion at both locations. In asynchronous mode, the storage systems, typically at preconfigured time intervals, transmit the data accumulated since the last successful transfer cycle to the DR site. The major advantage of this approach is that the cluster nodes are completely relieved of the data transfer responsibility.
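The difference between the two modes can be sketched in a few lines of Python. This is purely conceptual, not array firmware: the synchronous class acknowledges a write only after both copies have it, while the asynchronous class acknowledges locally and ships the accumulated delta at a preconfigured interval (the 30-second default here is an assumption for illustration).

```python
# Conceptual contrast of synchronous vs asynchronous replication modes.

import time

class SyncReplicator:
    def __init__(self):
        self.primary, self.dr = [], []

    def write(self, block):
        self.primary.append(block)
        self.dr.append(block)          # I/O completes only after both sites have it
        return "acknowledged"

class AsyncReplicator:
    def __init__(self, interval_s=30):
        self.primary, self.dr, self.pending = [], [], []
        self.interval_s = interval_s
        self.last_cycle = time.monotonic()

    def write(self, block):
        self.primary.append(block)
        self.pending.append(block)     # acknowledged before it reaches the DR site
        if time.monotonic() - self.last_cycle >= self.interval_s:
            self.dr.extend(self.pending)   # ship the delta accumulated since last cycle
            self.pending.clear()
            self.last_cycle = time.monotonic()
        return "acknowledged"
```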

 

HP Serviceguard offers three disaster tolerant/recovery products that support the above replication models: (1) Extended Distance Cluster, (2) Metrocluster, and (3) Continentalclusters.

 

Extended Distance clusters:

 

Extended Distance Cluster configurations (also known as Extended Campus Cluster configurations) are specialized configurations that allow a single HP Serviceguard cluster to extend across two or three separate data centers for increased disaster tolerance. They provide additional availability protection against the failure of an entire data center, and because they allow significantly increased distances between the data centers, they are referred to as Extended Distance Cluster or Extended HP Serviceguard Cluster configurations.

 

An Extended Distance Cluster is a normal HP Serviceguard cluster that has alternate nodes located in two different data centers separated by distance. Extended distance clusters are connected using a high speed cable that guarantees network access between the nodes as long as all guidelines for disaster recovery architecture are followed. Extended distance clusters were formerly known as campus clusters, but that term is not always appropriate because the supported distance has increased beyond the typical size of a single corporate campus. The maximum distance between nodes in an Extended Distance Cluster is set by the limits of the data replication and networking technologies.

 

An Extended Distance Cluster relies on host-based replication to replicate data between the sites. A node in the cluster takes responsibility for ensuring that data written to a local storage disk is also propagated to the second mirror of that disk at the DR site. One of the major advantages of this approach is that a failure at the local storage level requires no failover of the application to the DR site, because the application can continue to access data from the mirror at the DR site. On the disadvantages front, application I/O performance may suffer, especially as the separation distance between the sites increases.
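A rough back-of-the-envelope sketch of that distance penalty, assuming light travels through fibre at roughly 200,000 km/s and each mirrored write costs one extra round trip (real numbers also depend on switches, protocol overhead, and the application's write pattern):

```python
# Illustrative estimate only -- not a Serviceguard sizing tool.

def mirror_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Approximate extra latency per mirrored write, in milliseconds."""
    fibre_speed_km_per_ms = 200.0  # ~2/3 of the speed of light in vacuum
    one_way_ms = distance_km / fibre_speed_km_per_ms
    return 2 * one_way_ms * round_trips

for km in (10, 50, 100):
    print(f"{km:>4} km separation -> ~{mirror_write_penalty_ms(km):.2f} ms extra per write")
```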

 

Metroclusters:

 

A metropolitan cluster is a cluster that has alternate nodes located in two different parts of a city or in adjacent cities. Putting nodes farther apart increases the likelihood that alternate nodes will be available for failover in the event of a disaster. A metropolitan cluster requires a third location for arbitrator nodes or a quorum server. The distance separating the nodes in a metropolitan cluster is limited by the data replication and network technology available. Each of the sites in a disaster recovery cluster configuration must have the same number of nodes.

 

In addition, there is no requirement for how far the third location must be from the two main data centers. The third location can be as close as the room next door, with its own power source, or as far away as a site across town. The distance between all three locations dictates the level of disaster tolerance a metropolitan cluster can provide.
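The arbitrator requirement follows from simple majority arithmetic. The sketch below is a simplified illustration, not Serviceguard's actual quorum implementation (which also supports cluster lock disks and a quorum server): with two equal sites, losing either one leaves exactly half the nodes, which is not a strict majority, so a tie-breaker at a third location is needed.

```python
# Simplified quorum arithmetic behind the third-site arbitrator.

def has_quorum(surviving_nodes: int, total_nodes: int) -> bool:
    """Strict majority of the configured cluster members."""
    return surviving_nodes > total_nodes / 2

site_a, site_b, arbitrators = 2, 2, 1

# Site B is lost in a disaster.
print(has_quorum(site_a, site_a + site_b))                              # False: 2 of 4 is a tie
print(has_quorum(site_a + arbitrators, site_a + site_b + arbitrators))  # True: 3 of 5
```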

 

Metropolitan cluster architecture is implemented through the following products:

  • Metrocluster with 3PAR Remote Copy
  • Metrocluster with Continuous Access for XP
  • Metrocluster with Continuous Access EVA
  • Metrocluster with EMC SRDF

 

Continentalclusters:

 

A continental cluster provides an alternative disaster recovery solution in which distinct clusters can be separated by large distances, with wide area networking used between them. The design is implemented with distinct HP Serviceguard clusters that can be located in different geographic areas with the same or different subnet configurations. In this architecture, each cluster maintains its own quorum, so an arbitrator data center is not used for a continental cluster. A continental cluster can use any WAN connection via the TCP/IP protocol; however, due to data replication needs, high speed connections such as T1 or T3/E3 leased lines or switched lines may be required. Continentalclusters automates the recovery process, but recovery is initiated manually, using a push-button style approach. Continental clusters support 3PAR, XP, EVA, and EMC storage systems for data storage and replication across the sites.
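The push-button model can be illustrated with a small, purely conceptual Python sketch; the function names are hypothetical and stand in for the monitoring and recovery steps the product performs. The point is that detection can be automatic while the decision to fail over across sites remains with an operator.

```python
# Conceptual sketch only -- not the Continentalclusters implementation.

def primary_cluster_down() -> bool:
    # Assumption for illustration: some monitoring check of the primary site.
    return True

def run_recovery_at_dr_site() -> None:
    # Placeholder for the automated steps: take over the replicated storage,
    # then start the application packages on the recovery cluster.
    print("Recovery sequence started on DR cluster")

if primary_cluster_down():
    answer = input("Primary cluster appears down. Start recovery at DR site? [y/N] ")
    if answer.strip().lower() == "y":
        run_recovery_at_dr_site()
    else:
        print("Recovery not initiated; operator declined.")
```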

 

Choice of a cluster: There is no direct formula for identifying the clustering solution, or combination of the cluster products described above, that meets a given set of business requirements. However, the following parameters can help identify the right clustering solution (a rough sketch of how they interact follows the list).

(1) Location of the primary and DR sites

(2) Network connectivity and its quality

(3) Availability of resources and their capacity

(4) Recovery Point Objective (RPO)

(5) Recovery Time Objective (RTO)
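As a very rough illustration of how these parameters interact, the sketch below maps an RPO target and site separation to the products discussed in this post. The thresholds are assumptions for illustration only, not HP sizing guidance; a real decision would weigh all five parameters plus the application's I/O profile.

```python
# Illustrative rule of thumb only -- not official HP guidance.
# The 100 km cutoff for practical synchronous replication is an assumption.

def suggest_architecture(rpo_seconds: float, site_distance_km: float) -> str:
    if site_distance_km <= 100 and rpo_seconds == 0:
        # Close sites with zero data loss: synchronous replication is feasible,
        # so a stretched cluster or Metrocluster fits.
        return "Extended Distance Cluster or Metrocluster"
    if rpo_seconds == 0:
        return "Zero RPO over long distances is rarely practical; revisit the requirement"
    # Widely separated sites where some data loss is tolerable: asynchronous
    # replication between distinct clusters, i.e. Continentalclusters.
    return "Continentalclusters"

print(suggest_architecture(rpo_seconds=0, site_distance_km=20))      # stretched/metro
print(suggest_architecture(rpo_seconds=300, site_distance_km=2000))  # continental
```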

 

Additional information about HP Serviceguard solutions can be found at the HP Serviceguard Solution Documentation Index Page.
