Recap and update on NetApp MetroCluster article
By Calvin Zito, aka @HPStorageGuy
A couple of days ago, my colleague Jim Haberkorn wrote an article titled "NetApp MetroCluster: be sure to ask if the car has wheels". The discussion on that article, as well as on Twitter, has been overwhelming - more than I could keep up with. My sense is that the NetApp fans (employees, partners, etc.), whom I'll refer to as NetAppers, either didn't understand Jim's point or they did and just didn't agree.
Let me start by recapping his article: MetroCluster is a product that breaks up the no-single-point-of-failure NetApp system on your primary site and turns it into 2 separate systems. Each system, or node, has a single controller because the maximum size of a NetApp cluster is two nodes. When there is a failover from one node to the other, there are multiple single points of failure (SPOFs), most notably the fact that there's only one controller (SPOF) on the remaining node or system. Jim also found a number of other SPOFs not publicly documented by NetApp but by IBM (as an OEM of the NetApp technology). So Jim's discussion boiled down to this - MetroCluster is marketed as a high availability, mission critical solution but once there's a failover, the remaining node has multiple single points of failure.
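If it helps to see the controller argument spelled out, here is a toy sketch of it. This is only an illustration of the reasoning above, not anything from NetApp's software: the `Node` class and the helper names are made up for this example, and it models only controllers, ignoring the other SPOFs Jim listed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one side of a two-node cluster."""
    name: str
    controllers: int      # working controllers on this node
    online: bool = True


def surviving_controllers(nodes):
    """Count the controllers on nodes that are still online."""
    return sum(n.controllers for n in nodes if n.online)


def has_controller_spof(nodes):
    """True when exactly one controller is left serving data."""
    return surviving_controllers(nodes) == 1


# Assumption taken from the article: two nodes, one controller each.
site_a = Node("site-a", controllers=1)
site_b = Node("site-b", controllers=1)
cluster = [site_a, site_b]

before = has_controller_spof(cluster)   # both sites up

site_a.online = False                   # site A fails, site B takes over
after = has_controller_spof(cluster)

print(f"SPOF before failover: {before}")
print(f"SPOF after failover:  {after}")
```

With both sites up there are two controllers, so losing one is survivable. After the failover only one controller remains, which is the point of Jim's article.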
Initially, many NetAppers decried "FUD" - you know, fear, uncertainty and doubt. There's an implicit idea that FUD is also not true. Maybe that's just me but all the FUD yellow cards I saw getting raised on Twitter had me shaking my head. Here are just a few of the tweets I saw exclaiming FUD:
- From @stevie_chambers (from Cisco): @storagebod @HPStorageGuy a magic quadrant for fud <trademarked by our buds http://is.gd/d8D2h" < lol!
- From @MikeNetApp (NetApp employee): Manufacturing this type of FUD only damages your own credibility.
- From @teylemans (another NetApp employee) : @MSR11 @HPStorageGuy Because both sides of the cluster make a cluster (as they are clustered) there are no spofs. Argument out the window!
- From @vStewed (NetApp employee): @HPStorageGuy @chuckhollis - I love the FUD. We have ~400 customers who deployed VMware clusters on MetroCluster... in Germany alone!
There are a lot of others I could show you that stooped to the level of name calling and plain nastiness, but you can search those out yourself if you like. Another debate centered around me saying that any solution that has any SPOFs after a failover isn’t “mission-critical”. Much of what I heard back was that the definition of mission-critical is in the eye of the beholder. Maybe but I tend to disagree.
Today, to satisfy my own curiosity, I went to the NetApp web site to see for myself what the description of MetroCluster says. Let me fill you in on what I saw there, specifically the “key points” from this image taken directly from their website:
We could have a lot of fun with these claims but let me just hit a few highlights:
- No system that has a SPOF after it's failed over to a single controller system was designed for "zero unplanned downtime". Sorry, there are way too many things that can fail for this to be designed for no unplanned downtime. No unplanned downtime takes a more rigorous design.
- I'm not sure how the failover is transparent because it's a manual failover. So if one site goes down, the other site doesn't take over until you manually "push the button". Minor point but if you’ve left town for the weekend, this won’t be transparent – it will be apparent to all your users that there’s been a failure.
- Maybe you can claim it helps with planned downtime but the list of things you have to do to synchronize the two sides after one has taken over is long and complicated. See the comment that Jim has from Tuesday (June 29) addressed to John in his article - it points to documentation and the risk of split-brain.
- All of this working is also dependent on the network connection between the two sites. If it goes down, you have some big problems. As Chuck Hollis at EMC stated in a comment (on June 30): Imagine -- a network issue *outside* the data center could compromise availability *inside* the data center. This isn't about fine grades of "better", this is more about "fit for intended purpose".
When a competitor makes claims like this, is it wrong to call them out? Many of you are probably saying yes, that it’s not HP's concern. One NetApp reseller said that I was disrespectful to customers by pointing this out. I guess his assumption is that customers know this already. If NetApp were forthright in describing MetroCluster, I might agree. But they aren’t.
I just had an HP channel partner contact me and give some details about a deal where he was competing with an HP offering (two EVAs in separate locations with local and remote replication) against a NetApp sales rep. The NetApp sales rep couldn't compete with the HP pricing and in an attempt to stay in the deal, he proposed MetroCluster. The prospective customer wasn't told by the NetApp sales rep that after a failover event, he would not be in an HA configuration. The customer didn’t know he’d be at risk in the case of a failover. Let me say that again for all those of you decrying FUD - the potential customer wasn't told that after a failover event he would not be in an HA configuration.
I'm sure the NetAppers will attack what I'm saying here; through all of this back and forth on Jim's post and me discussing it on Twitter, no one from NetApp admitted that there is a risk of downtime after a failover. Alex McDonald, a fine FUD-crier, finally came clean when he said "sure, SPOF if failed over. Who said otherwise? Just like 2 disk fails in RAID5". Dang, that could have saved me a lot of time if someone on Twitter had just confirmed this for me up front, but all I got back was accusations of FUD. Frankly, I’m surprised Alex fessed up and gave an honest answer. Good for him.
MetroCluster seems like a fine solution – but the “key points” on NetApp’s web page significantly over-promise what it can deliver.

