Making Sense of WAFL - part 2
By Karl Dohm, HP Storage Architect
Today I'm taking a few minutes to respond to some of the comments regarding my initial post on Making Sense of WAFL.
Apparently in that post I unwittingly opened up a few of NetApp's old wounds, which have been extensively hashed through previously in public forums. Judging by the responses, NetApp has done a nice job of trying to deflect some of these problems through releases of nice-looking, apparently credible documentation.
For those that are biased NetApp's way, or are enamored with the technical ways of WAFL, there may be nothing to say to convince you otherwise. But for those with an open mind, read on.
The problems we are talking about here are at the core of WAFL, and are clearly not easy to fix, or they would already be fixed. NetApp is not unique in having problems, of course; all array vendors have their strong and weak points. But to assert that WAFL has no weaknesses around fragmentation, performance, and capacity utilization defies common sense. The old wounds are there for a reason.
Let's take a look at the Avanade white paper. It glows with enthusiasm about how the FAS3050c performs in MS Exchange based environments. Further detail from the paper's author can be found in an interview here. Peeling back the onion a bit, we see that this paper was created shortly after creation of a business partnership between NetApp and Avanade. Evidence of this partnership can be found here.
The IOmeter baseline performance data cited in the paper is interesting and worth exploring. In the words of the white paper's author, the IOmeter test against the FAS3050c had "two goals: to validate that our environment was set up correctly and to assess overall performance of the FAS3050c".
The report is exceptionally loose about describing the setup. The transfer sizes used for IOmeter are claimed to range from 0.5KB to 64KB, but there is no indication of the weight applied to portions of this range. There is no mention of percent reads vs. writes or percent random vs. sequential. It also doesn't discuss the MPIO policy or the HBA queue depth setting. There is no indication of whether OnTap Exchange extents are enabled. Worst of all, and unique to NetApp, it doesn't define the history of writes and therefore the level of fragmentation on the LUN.
I like IOmeter because it's a relatively simple test to run and is available for anyone to try, since it's in the public domain. Given this open invitation to compare results with Avanade, it made sense to give the described test a try and see what happens.
It turns out that no matter what combination of the unspecified test parameters I tried, I could never get into the ballpark of results claimed in this white paper.
So, to illustrate with an example, I decided to keep things as simple as possible. I ran a typical Exchange 2008 simulation load: 8KB transfer size, 80% random, 60% read, IOmeter queue depth of 128, MPIO round robin, Exchange extents enabled, HBA max queue depth of 254, and a RAID-DP aggregate of 20 15K spindles. After letting the LUN settle through its fragmentation period, the throughput settles at 19 MB/s at an average latency of 52 msec.
The white paper claimed the FAS3050c runs 48 MB/s at 30msec latency, which is a world of difference.
So what gives? One of several things has happened. Perhaps I could not successfully piece together how to run this test from the information given. It would be great to get clarification from NetApp on how to properly run this test and recreate the results. The other explanation is perhaps that the results are not re-creatable without some special internal-use-only tuning parameters. Or perhaps there is no way to recreate these results.
An EVA4400, run with the same workload, delivers approximately 39 MB/s at 25 msec average latency. That's about twice the throughput on a workload that is mostly random, meaning the bottleneck is supposed to be at the spindles. Apparently on the FAS the bottleneck is somewhere else.
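As a rough sanity check on these figures, Little's Law ties the number of outstanding I/Os to throughput and latency (outstanding I/Os = IOPS x latency). The sketch below applies it to the two measured results, assuming that IOmeter keeps all 128 I/Os in flight for the whole run, that the queue depth of 128 is the total across the test, and that "MB" means 1024 KB. The function names are illustrative only and are not part of IOmeter.

```python
# Sanity check of the reported results using Little's Law:
#   outstanding I/Os = IOPS * average latency
# Assumptions (not stated in the post): the 128 outstanding I/Os are
# kept in flight continuously, and 1 MB = 1024 KB.

QUEUE_DEPTH = 128   # IOmeter queue depth from the test description
TRANSFER_KB = 8     # transfer size in KB from the test description


def iops_from_queue(outstanding_ios: int, latency_ms: float) -> float:
    """IOPS implied by a full queue and a given average latency."""
    return outstanding_ios * 1000.0 / latency_ms


def throughput_mb_s(iops: float, transfer_kb: float) -> float:
    """Throughput in MB/s for a given IOPS rate and transfer size."""
    return iops * transfer_kb / 1024.0


# (array name, measured average latency in msec, reported MB/s)
results = [("FAS3050c", 52.0, 19.0), ("EVA4400", 25.0, 39.0)]

for name, latency_ms, reported in results:
    iops = iops_from_queue(QUEUE_DEPTH, latency_ms)
    implied = throughput_mb_s(iops, TRANSFER_KB)
    print(f"{name}: {iops:.0f} IOPS implied, "
          f"{implied:.1f} MB/s implied vs {reported:.0f} MB/s reported")
```

Under those assumptions the implied throughput lands close to the reported 19 MB/s and 39 MB/s, so each pair of latency and throughput numbers is at least internally consistent. The same check cannot be applied to the white paper's 48 MB/s at 30 msec, because the paper gives neither the transfer size mix nor the queue depth.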
Incidentally, this FAS 3050c LUN degraded about 10% in MB/s throughput as the fragmentation settled out. That isn't such a big number, but recall that this test is mostly random I/O. The sequential read portion, if looked at in isolation, degrades much worse. It is why NetApp introduced Exchange extents.
As in my previous post, if you don't believe what I am saying, give it a try. Unlike my colleagues at NetApp, I gave you enough information here to run the test.
Barring a sound explanation from NetApp, it seems to me that there is reason to doubt the credibility of the white papers and test results that NetApp is producing.
(Editor's Note: A broken link was fixed - no content changes were made. 14 April 2011)
Karl,
There are several issues with this post even if I disregard the innuendo about NetApp's integrity, ethics and business practices.
What I find frustrating is that your argument can be structured in the following way:
1. I, Karl, the HP storage architect, know WAFL is flawed architecturally
2. My experimental evidence demonstrates this point
I could, just as easily argue:
1. I, Kostadis the NetApp architect, know that WAFL is not flawed technically
2. But I know your experiment is flawed and shows nothing
But the problem is that you're an HP employee, and it's important to prove your point, which is that NetApp storage is deeply flawed for Exchange workloads in spite of our commercial success. And I am a NetApp employee who is not about to describe in detail proprietary information about how exactly WAFL works so as to be successful while delivering unique data management features. And so we are at an impasse.
And so I can never prove my case to your satisfaction, and you can never prove your case to my satisfaction.
And the fact that we are at an impasse is okay, because I never believed benchmarks are how customers should buy storage systems, and I never believed that performance should be the only criterion for an Exchange solution. I believe things like the ability to create consistent point-in-time copies efficiently with minimal disruption to Exchange, as well as efficient storage-based replication, the ability to use less storage with no compromise on performance or resiliency (RAID-DP), and tools that simplify the Exchange admin's life, like SnapManager for Exchange, to be far more valuable than just raw performance.
So this performance argument is an interesting one, but at the end of the day it is only one piece of the overall Exchange solution puzzle.
But I will make an observation: last I checked, you never worked inside the WAFL code base. Your name was never brought up as a WAFL architect. I do not recall seeing your name on a patent application. You have never actually seen how WAFL is structured. So frankly, how you can presume to make assertions that WAFL is architecturally deeply flawed is a mystery to me.
And I will further observe that I never argued that the Avanade paper was a benchmark. The whitepaper shows how a significant Exchange expert is willing to recommend NetApp storage. That's the significance of that whitepaper.
The benchmark, for what it's worth, was the SPC-1 number. And the benchmark demonstrates our system behavior under random IO workloads that stress the disk subsystem. The result's intended purpose is to deal with commentary such as yours.
And finally, I am somewhat surprised that you compared the performance of a 2-year-old storage system (FAS3050c), with 2-year-old chipsets and a significantly smaller amount of memory, to HP's brand new storage array.
cheers,
kostadis
Hi Karl,
I've responded to some of your comments and questions regarding our NetApp testing. I've posted the response on Avanade's blog site in order to maintain some formatting in tables.
You can find the post here: blog.avanadeadvisor.com/.../12107.aspx
I'm a SQL DBA and one of my companies has drunk the Kool-Aid. Performance is the #1 priority, and since our DBs were put on this NetApp device, we've had nothing but problems. So yes, performance is only a piece, but it is a very huge slice in my world.
Hi James,
Thanks for the comment. See the 4th post www.communities.hp.com/.../making-sense-of-wafl-pa
I agree, performance is only one of the important aspects to consider. I plan to get to some of the other main considerations for enterprise class customers (like yourself) in future posts.
I was unfortunate enough to purchase one of NetApp's basic iSCSI SANs. I wish someone like you had been around to tell me that I could get better performance out of Openfiler, a FOSS distribution. It had for a time biased me against iSCSI in general. I now see that it may have been instead a poor choice in product. I am also experiencing what you have described, in that the more space is occupied / carved into a LUN, the worse it appears to perform.
I can tell you as an HP and NetApp customer that I really don't care about high performance. I do care about performance and getting it consistently, but I'm just not part of that 1% that needs crazy speeds and feeds, nor will I buy storage based on that... especially for my Exchange.
Fast is fast enough and my users don't care if they get data.... this fast or....... this fast.
We also have several SQL installations and have had no problems with performance. Oh, did I forget to mention that this is all running on NetApp storage?
Our preferred server vendor is HP and for storage we go NetApp. A correctly architected environment will not present performance problems.
Every server and storage vendor sucks. It just so happens that HP and NetApp suck the least.
Thanks for your feedback on what you're looking for and that performance isn't important to you. Different customers have different use cases and it's always interesting to hear about what they are. Personally, I care a lot about Exchange performance - for me, Exchange is very important and I don't want my productivity to go out the window because of slow performance, especially when I can get over 200 messages a day.
I'd tend to disagree with you on your comment about "Every server and storage vendor....". Are there pluses and minuses? Absolutely. Are some better than others? Yes again. If your premise that they all sucked were true, a new company in our industry would rise to the top and wipe all of the rest out. Obviously, a lot of storage and server vendors are doing some things right but still have lots to improve upon.
Thanks again...
Sorry for resurrecting the dead, but I couldn't help commenting when I stumbled onto this while researching several problems with our deployment.
Caveat: I work with very demanding databases that power a company's trading system, online retail and ERP apps, so my comments are based on those types of workloads.
Perhaps it's my bad luck, but of the 6 encounters I've had with NetApp (starting with their NAS 3.5 years ago), every single one was negative. In 5 of the 6, the sales engineers made all kinds of interesting claims, and at least 2 made claims that defy the laws of physics. Six of six claimed WAFL has no overhead and is immune to performance degradation due to fragmentation. They've also claimed the 2GB write cache is more than enough for any use and that the TB read cache is a major performance enhancer.
I used to like NetApp products when we put them in for file & print services and some app server temp stores. I don't know if they work well for small/medium database loads but for heavy workloads, so far, nothing has held up to the claims of their sales teams.
HP, EMC and Hitachi are quite a bit pricier (especially EMC) and take a bit more work to manage, but so far they've taken some serious beating from our databases and are still holding up. In the same period, we had to swap out 3 NetApp filers. That was many late nights and full weekends of work. Weak sauce.





