Around the Storage Block Blog
Find out about all things data storage from Around the Storage Block at HP Communities.

Understanding FAS ESRP Results

By Karl Dohm, Storage Architect


Welcome back to the next in a series of posts where we take a closer look at NetApp and its FAS series of storage arrays.   The discussion topic today is Microsoft's Exchange Solution Reviewed Program (ESRP) and its tie to FAS throughput.


The FAS has some controversial history with regard to performance.  From time to time the issue comes up, and in response NetApp has generally denied that the problems exist.  Often we find the opposite stance in posts from NetApp lauding their performance, for example in Kostadis Roussos' post where he refers to WAFL write performance as 'surreal'.  But, as I have said in previous posts, there are some justifiable reasons this controversial subject keeps surfacing.


First of all, let's touch on why an average storage consumer should care about array throughput.  An array with better throughput, i.e. the ability to service more I/Os from a given set of spindles, can require less hardware to do the same job.  The bigger the throughput difference, the more to be saved on purchase price, warranty cost, power consumption, floor space, cooling, etc.  Array throughput statistics can be meaningful when evaluating the value of a storage array.  It seems NetApp also finds this attribute important, given the number of blog posts and papers they have on the topic of performance.
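To make that hardware math concrete, here is a rough sizing sketch in Python; the per-spindle IOPS, per-mailbox load, and efficiency factors are illustrative assumptions of mine, not figures from any vendor.

```python
import math

def spindles_needed(target_iops, per_disk_iops, array_efficiency):
    """Back-of-the-envelope spindle count for a random-I/O workload.

    target_iops      -- host IOPS the array must sustain
    per_disk_iops    -- IOPS one spindle delivers at acceptable latency
    array_efficiency -- fraction of raw disk IOPS the controller turns
                        into host IOPS (RAID write penalty, overhead, ...)
    """
    return math.ceil(target_iops / (per_disk_iops * array_efficiency))

# Hypothetical 10,000-mailbox Exchange load at 0.5 host IOPS per mailbox,
# on drives assumed to deliver roughly 150 IOPS each.
target = 10_000 * 0.5
for eff in (0.5, 0.65, 0.8):
    print(f"array efficiency {eff:.0%}: {spindles_needed(target, 150, eff)} disks")
```

The point of the sketch is only that the efficiency term moves the disk count, and therefore the price, power, and floor space, by a large margin.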


Recently, in the comments section of a blog post on understanding WAFL, NetApp's John Martin and I had a small debate as to whether a synthetic load generator like IOMeter could be used to characterize how an array will perform in production scenarios.  I made the argument that this type of tool can be used to circle the wagons around the I/O characteristics of a real-world application, and that through multiple point tests of the load components of the application one could get a reasonable assessment of how well the box will behave.  John's opinion was more along the lines that synthetic workload tests are not suitable to provide an indication of how well an array would run with a production application ("Synthetic workloads in isolation lead to non typical results").  He referenced Jetstress as a more accurate indicator.
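As a rough illustration of what I mean by point tests, here is a minimal sketch in Python that runs one block-size/read-mix combination against a test file and reports IOPS. The 8 KB block size, 50/50 read mix, and file size are placeholder assumptions, and unlike IOMeter it does not bypass the OS cache; it is only meant to show the decomposition idea, not to be a real load generator.

```python
import os
import random
import time

def point_test(path, file_size, block_size=8192, read_fraction=0.5,
               duration=10.0):
    """Run one synthetic 'point': random I/O at a fixed size and read mix.

    A real characterization would sweep block sizes, read/write ratios,
    and queue depths to bracket the target application's profile.
    """
    with open(path, "wb") as f:          # pre-allocate the test file
        f.truncate(file_size)

    blocks = file_size // block_size
    buf = os.urandom(block_size)
    ops = 0
    end = time.monotonic() + duration
    with open(path, "r+b", buffering=0) as f:
        while time.monotonic() < end:
            f.seek(random.randrange(blocks) * block_size)
            if random.random() < read_fraction:
                f.read(block_size)
            else:
                f.write(buf)
            ops += 1
    return ops / duration

if __name__ == "__main__":
    print(f"{point_test('pointtest.bin', 256 * 1024 * 1024):.0f} IOPS")
```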


I took his cue and had a look at the FAS2050 ESRP results paper, which describes MS Exchange-like throughput of the FAS2050 array.  Even though ESRP isn't intended to be a benchmark, a scan of ESRP results tells me that many vendors seem to use the ESRP forum as a way to post throughput results showing how their array handles MS Exchange load.  It kind of makes sense, since there seems to be no Exchange-related benchmark out there, and ESRP is the closest controlled thing the industry has to work with.


The NetApp ESRP paper provides insight into how NetApp would recommend setting up the 2050 for Exchange loads, and it shows throughput results in a heavily loaded 10,000-mailbox Jetstress test.  This paper sparked our interest because the described results seemed good and did not correlate with results from synthetic load generators that produce a similar pattern to Jetstress.  Maybe John was right.  We decided to peel back the onion a bit and take a look under the covers of this ESRP test to figure out what was going on.


We happened to have access to a FAS2050 and decided to try and recreate the ESRP results as published.  It turns out that the IOPs value that NetApp published was in fact roughly re-creatable given the data in the paper.  On the surface this can be viewed as NetApp having made an honest submission to ESRP, and within the letter of the law one could reasonably argue that they did.  But we also learned that NetApp found a way to make their results come across as favorably as possible, meaning the results have little relevance as to how well the FAS will run MS Exchange.  


After a rather lengthy setup experience, we finally configured the aggregates, volumes, servers, LUNs, MPIO, and HBA attributes as described in the ESRP paper.   We even set the diagnostic switch "wafl_downgrade_target" to a value of 0 in accordance with the recommendations in the paper.  


One might ask, as we did, what does "wafl_downgrade_target" do?  In its TR-3647 paper, NetApp describes the switch as follows: "The "downgrade_target" command changes the priority of a process within Data ONTAP that handles incoming SCSI requests. This process is used by both FC SAN and iSCSI. If your system is not also running NAS workloads, then this priority shift improves response time."


I think this description is telling us that the NAS process consumes bandwidth even when there is no NAS work to do.  Also, given the NetApp messaging around unified storage architecture, a recommendation to use this switch seems like a bit of a contradiction.  Would you consider it normal to be asked to set a switch that generates the following response?  "Warning: These diagnostic commands are for use by Network Appliance personnel only".  Last but not least, this switch resets itself if the array reboots.  I'll leave it to the audience to draw their own conclusions as to whether use of this switch is truly a recommended practice in customer environments.


Once the array was freshly initialized and everything was set up, we ran the test and observed results of roughly 2200 average database disk transfers/second per host.  Within noise levels, this recreated the results as posted in their ESRP paper.


The main problem we have with how NetApp did this testing is that after the initial run, each repetition of the test runs slower than the one before it.  The second run showed results of approximately 1980 transfers/second per server, about an 11% drop.  By the fifth run, throughput had dropped to approximately 1555 transfers/second per server - a 30% drop.  After a couple more runs we were down to 1450, 34% slower than the first run.
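For clarity, the percentages above are all drops relative to the first run; a quick check (using the rounded IOPS values quoted here, so the computed percentages land within a point of the ones cited):

```python
first_run = 2200  # approx. database disk transfers/sec per host, run 1
for label, iops in [("second run", 1980), ("fifth run", 1555), ("a later run", 1450)]:
    drop = (first_run - iops) / first_run
    print(f"{label}: {iops} transfers/s, {drop:.0%} below the first run")
```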


I didn't have the patience to run enough times to figure out where this decay curve flattens out. 


At this point I decided to run a "reallocate measure" against one of the database LUNs, and the FAS reported the value to be 17.  According to the NetApp man page for Reallocate Measure: "The threshold when a LUN, file, or volume is considered unoptimized enough that a reallocation should be performed is given as a number from 3 (moderately optimized) to 10 (very unoptimized)".  Allow me to translate - the database LUNs are very fragmented.  For those who might be confused by the use of the word fragmentation in this context, this is not NTFS fragmentation - it's WAFL fragmentation.


Now things were starting to make sense.  We were seeing the same sort of decay curve as shown in the IOMeter results posted in Making Sense of WAFL - Part 4.  Every time the test is run, the random component of the Jetstress database accesses fragments the LUN further and the throughput numbers get worse.  An array like the EMC CX or HP EVA won't undergo this sort of decay curve, since these arrays do not have the internal fragmentation problem that WAFL gives the FAS.
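To illustrate just the mechanism being described, here is a toy write-anywhere model in Python. It is emphatically not ONTAP's actual allocator (it ignores free-space reuse, RAID geometry, and read-ahead); it simply redirects each random overwrite to the next free physical block and reports how far apart logically adjacent blocks drift, a crude proxy for the extra seeking a later sequential read or checksum pass would incur.

```python
import random

def toy_write_anywhere(lun_blocks=100_000, passes=5, seed=1):
    """Toy model: random overwrites scatter an initially sequential LUN."""
    random.seed(seed)
    phys_of = list(range(lun_blocks))   # logical block -> physical block
    next_free = lun_blocks              # fresh space appended at the 'end'

    def avg_gap():
        # Mean physical distance between logically adjacent blocks:
        # 1.0 means perfectly sequential, larger means more seeking.
        return sum(abs(phys_of[i + 1] - phys_of[i])
                   for i in range(lun_blocks - 1)) / (lun_blocks - 1)

    print(f"fresh LUN: average gap {avg_gap():.1f} blocks")
    for p in range(1, passes + 1):
        for _ in range(lun_blocks // 10):   # one 'run' rewrites ~10% of blocks
            phys_of[random.randrange(lun_blocks)] = next_free
            next_free += 1
        print(f"after pass {p}: average gap {avg_gap():.1f} blocks")

toy_write_anywhere()
```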


That's not all.  After the throughput test, Jetstress executes a checksum test of the databases to be sure the array did not corrupt any data.  After a few runs I noticed an interesting pattern.  On the FAS, the length of time needed for the checksum calculation also degraded as the database LUNs went through their WAFL-fragmentation.  When the LUNs were fresh and unfragmented, the checksum calculation took about 2 hours.  By the fifth run, when the database LUNs had a WAFL-fragmentation measure of 17, the checksum calculation took over 10 hours - a 250% slowdown.  To summarize, we saw a 34% slowdown on database throughput and a 250% slowdown on checksum calculation just by letting the ESRP test run for about 48 hours before taking measurements.


So, drawing this to a close, I think there is a reasonable argument that NetApp should have posted results more like 1450 (or fewer) disk transfers/second/host, as opposed to the 2220 transfers/second/host they did post.  Most would expect that results in a test as visible as ESRP are measured after a reasonable burn-in period.  After all, when someone runs MS Exchange, they usually run it for longer than 2 hours.



Labels: NetApp | storage | WAFL
Comments
Anonymous(anon) | ‎09-26-2009 10:17 PM

Hi Calvin

NetApp's degradation under heavy sustained write loads is legendary within the industry, but somehow they manage to market around it.

Check out this REALLY old post I did on the subject way back when.  I'm sure they've attempted to improve the data I showed here, but there's no way they could have mitigated this effect with subsequent releases.

chucksblog.typepad.com/.../benchmarketing_.html

Most customers don't take the time to (a) find a real-world workload, or (b) let it run long enough for WAFL to fragment itself into a coma.

This is one of the many areas we can agree on!

-- Chuck

Anonymous(anon) | ‎09-27-2009 07:02 AM

Ahh yes, benchmarks, my favourite subjects. I like the word legendary, because like most legends they take a small amount of truth, and a good dose of hyperbole to create an interesting story that has little if any relationship to reality. I mean no disrespect to Mr Hollis, but there's just a little too much spin and not nearly enough desire for any real understanding in his posts for my personal taste.

Karl, I'd love to go point by point on where I believe you've made mistakes in your benchmarking methodology, but it's Sunday morning, my son is being extraordinarily cute, and I've got a busy work schedule over the next week.

Just one quick question though: have you upgraded to ONTAP 7.3 yet? There were lots of performance enhancements in that release that make some big differences in highly stressed environments.

Regards

John

Anonymous(anon) | ‎09-27-2009 10:09 AM

Hi Karl,

I have to say I read your post concerning your issues with Jetstress, and would like to point out a couple of things you can do to make your tests more consistent.

First, I noticed that you receive different achieved IOPS levels when you run various two-hour tests. This indicates that you've left autotune enabled.  You should run autotune once to generate the appropriate load parameters, then suppress the tuning option and manually enter those parameters on successive tests.  See technet.microsoft.com/.../bb643106.aspx .  This will ensure that each test is using the same number of threads and actually generating the same amount of IO.  Then, to check for performance degradation, look at the physical disk latency counters in the perfmon log generated by Jetstress on each run.  The way you are running the test now, the load is apparently different for every run, hence the different achieved IO numbers.

Second, I assume you are using an HP host to run Jetstress on.  Many servers shipped by HP and other vendors over the last couple of years have an issue with timer drift.  Microsoft's Mike Lagase details the issue here:  blogs.technet.com/.../certain-amd-processors-might-cause-inaccurate-counter-data.aspx .  The two screenshots in Mike's post were created on an HP ProLiant server in August of 2008.  Note the slowly rising response times over a day or so.  In the second screenshot, you'll note that this impacts all drives including the C: drive.  If you reboot, the "latency" disappears but begins slowly creeping up again over time.  The core issue here is a processor defect on certain AMD CPUs.  The solution is to put the /usepmtimer switch in the boot.ini file.  I've seen this so many times, I do it automatically.  It's the quickest, easiest, and most reliable resolution to the problem.  Timer drift compounds the problem with Jetstress autotune, as you can imagine.

Hope this helps.

John

Anonymous(anon) | ‎09-28-2009 05:40 AM

blogs.netapp.com/.../random-rocks-and-benchmarks.html takes you to task on your "testing bias".

Anonymous(anon) | ‎09-28-2009 11:15 PM

Hi John,

We followed the FAS ESRP paper's setup instructions, meaning we did not have autotune enabled.  

Timer drift isn't the cause of the FAS's throughput drop-off.  The observed curve follows the classic ("legendary") exponential decay which always occurs when running random writes to the FAS.  It would be beyond coincidental for timer drift to cause that same observation every time the FAS sees random writes.  Also, when running against other arrays, the observed results are consistent - same IOPS on the first run as the last.  Timer drift can't impact one array and not the other.

Our recreation of NetApp's test ran with Data ONTAP 7.2.6.1, not 7.3, because the now.netapp.com site recommends 7.2.6.1 for those who want a proven stable environment that most of the customer base is running.  By inference, 7.3 is somewhat experimental.  Since most production MS Exchange customers would probably want to be running the proven stable release, that seemed to us like the most sensible choice for ESRP.

Anonymous(anon) | ‎09-29-2009 02:34 AM

Hi Karl,

Very odd indeed.  If you did not use autotune, how did the IOPS differ?  If you run a given number of threads in Jetstress, the IOPS will remain the same with very little variance.  If there is a latency-over-time issue present, what you will see is the latency numbers rising on each successive run until the test eventually fails.  You didn't mention any of that; you did mention that the IOPS number was dropping.  I'd recommend that you call Microsoft PSS or leverage HP's Premier contract and burn a few hours with a Microsoft Exchange specialist to get to the bottom of this.

www.microsoft.com/.../support.aspx

Anonymous(anon) | ‎09-29-2009 06:22 AM

What do you mean "somewhat experimental"?

Data ONTAP 7.3.x has been GA for a long time already.

From the Data ONTAP release model on NOW:

When a release reaches GA classification, NetApp is recommending use of the release in a production environment.

I'd suggest running your test with this release - including its best-practice settings - as John recommended.

Anonymous(anon) | ‎09-29-2009 06:30 AM

John,

To your question "How did the IOPs differ?":

The reason the IOPs degrade from run to run is due to the effect of WAFL-fragmentation.  Both reads and writes take more time to satisfy as WAFL increasingly scatters data on the spindles.    

The 2050 passed the ESRP test in all runs tried.  This means average latencies were under 20 msec.  As IOPS eroded on successive runs, average read latencies climbed from about 12.8 to 19.4 msec.  That's still passing, but getting close to not passing.  We don't know if the throughput decay caused by additional runs would have eventually led to read latencies reaching 20 msec, at which point the test would have failed.

All the components needed to recreate the FAS2050 ESRP test are in the public domain.  We welcome independent confirmation of our observations.

Anonymous(anon) | ‎09-30-2009 09:01 AM

Hi Karl,

I have to say that I did check out Alex McDonald's post over on the NetApp blogs ( blogs.netapp.com/.../random-rocks-and-benchmarks.html ) and ran that little exercise.  His numbers do add up; yours don't.  You're saying you repeatedly ran Jetstress with the same parameters and observed all three of the following:

1. A 30% decrease in IOPS over time.

2. A 50% increase in latency over time.

3. A passing result.

Maybe somebody from Microsoft can chime in, but I don't think that's possible on any storage device.  You're beginning to sound an awful lot like this guy:  www.youtube.com/watch  

John

Anonymous(anon) | ‎10-01-2009 10:24 PM

"The reason the IOPs degrade from run to run is due to the effect of WAFL-fragmentation. Both reads and writes take more time to satisfy as WAFL increasingly scatters data on the spindles."

Now you're just guessing. Writes in particular are written to and acknowledged from NVRAM, and purged as a stripe (a sequential stripe write, no less) to disk; latency will be in the order of a millisecond or so. Doesn't matter if they're originally sequential writes or random writes; WAFL stripe writes to new contiguous areas on the disk, eliminating the difference.

For a random read, as I and others have repeatedly pointed out, no matter how you lay the data out on disk (sequentially, randomly, upside down, inside out or back to front), the latency on average will be the same. That's the point of random read IO; there's no way you can anticipate where the next block will come from, and there's no magic way of laying out the data to eliminate the randomness.

Seriously, there's something wrong with your testing methodology.

Anonymous(anon) | ‎10-02-2009 01:05 PM

John, it would be beneficial to elaborate on the details of what part, specifically, you think doesn’t add up.   So far I think you’ve gotten credible answers to every potential hole you came up with.


WAFL undergoes internal fragmentation which negatively impacts throughput.  Since the “WA” of WAFL stands for “Write Anywhere”, it shouldn’t be surprising to anyone that WAFL is scattering data on spindles.  Common sense tells us that scattered 4KB blocks result in more seeks and rotational latencies on reads, especially large block or sequential reads.   This isn’t rocket science.  


I'm a bit confused by the bipolar nature of NetApp's stance on WAFL - on one hand they seem quite proud of WAFL's ability to write anywhere, yet when confronted with the downside of that approach, all of a sudden WAFL isn't writing anywhere anymore.


Patrick Cimprich, a Chief Architect at Avande (a NetApp business partner), once said the following in a reply to one of my posts:


“Looking at your original post you comment on performance degradation of NetApp systems. In simple terms - yes - this is accurate. NetApp volumes can appear exceedingly fast - to the point of defying physics sometimes - when they are brand-spanking new. Its also true that the performance of those volumes does indeed degrade over time. However, that performance does not trend down indefinitely. It will flatten out and establish a very consistent level of performance.”


That’s pretty much what I’m saying too.  Perhaps the exceedingly fast part is true for some brief window of time until the fragmentation takes hold.  I think the storage community expects that NetApp post the “consistent level of performance” results  in its ESRP writeup, not the “exceedingly fast” values that occur when the array is “brand-spanking new”.


Even though Patrick is a big supporter of the FAS, I give him credit for admitting that NetApp has a fragmentation problem.   He took the honorable approach of acknowledging the deficiency but then moving on and providing an argument as to why it doesn’t matter.   It’s not that I agree with his end conclusion, but at least I can accept and work with his approach.  


Karl

Anonymous(anon) | ‎10-02-2009 10:52 PM

(Repost with the spell checked version ... d'oh !)


Karl,


I've finally found some time to try yet again to help you come up with a fair way of evaluating and reporting the performance of our arrays.


To keep this relevant I'll address your issues in reverse order


KD - I think the storage community expects that NetApp post the “consistent level of performance” results  in its ESRP write-up, not the “exceedingly fast” values that occur when the array is “brand-spanking new”.


JM - We do; every benchmark result we post happens after a long burn-in time to ensure the results are completely consistent. If you take a look at our extended run for the SPC-1 figures, this has been shown graphically. MSRP also involves a 12-hour stress test, and again all of this is done at "steady state".


As far as I can tell, your apparent inability to achieve decent steady state performance is more a reflection of your lack of experience with our technology, or of your using outdated technology, or of cherry-picking your results. Maybe it's a combination of all of these.


KD - WAFL undergoes internal fragmentation which negatively impacts throughput


JM - Compared to what, exactly? If your answer is to a NetApp array with pristine aggregates then, depending on the workload, you might be correct. If, on the other hand, you are comparing the steady state performance of a "fragmented" WAFL aggregate to an equivalent array from HP or, say, EMC with the same number of spindles, a comparable workload, and a comparable controller, then you'd be incorrect.


KD -  Since the “WA” of WAFL stands for “Write Anywhere”, it shouldn’t be surprising to anyone that WAFL is scattering data on spindles


JM - Firstly, you should recall that the acronym WAFL was originally a pun. Our founders wanted to create a storage "appliance" that was as easy to use as a typical consumer appliance like a toaster; our original arrays were nicknamed "toasters", and a WAFL (waffle) is something you put into a toaster. As a pun it's pretty clever, but it's hardly a highly accurate description of how the technology works.


JM - Secondly, just because we can write the data to any location on disk doesn't mean that we write to just "any" location at random, as implied by your blog; we can, and do, choose very carefully where that data is laid out. Part of that careful layout ensures that we do the minimum number of seeks when writing data, and explains why, in part, we are able to get such good write results from dual-parity RAID configurations. It's true that over time, as the aggregate fills up and ages, we get fewer good choices as to where we do our writes, but having said that, even when systems are 80-90% full and have been in production for many years, our write efficiency often exceeds that of RAID-10 (which is at best only 50% efficient).


KD - Common sense tells us that scattered 4KB blocks result in more seeks and rotational latencies on reads


JM - Common sense tells us that "scattering" 4K blocks for a workload across as many spindles as possible means that we can perform many operations in parallel, which decreases latency for random reads and increases overall throughput for random read operations. For the record, your use of the word "scattering", which implies that the placement is random, is both inaccurate and misleading.


KD - especially large block or sequential reads.


JM - Well, here's something we agree on (within limits): WAFL is optimized for random writes and random reads, the workloads typified by databases, virtualized server and desktop environments, mail servers, SharePoint, home directories, and ERP applications. Interestingly enough, the Jetstress workload you were running, AFAIK, didn't have any large-block sequential reads as part of its workload. Having said that, there are a lot of other things we do to optimize read-ahead for large sequential workloads, but that's a topic for a longer post; possibly Kostadis has already detailed a lot of it there. If you're really interested in understanding how our technology works, you would do well to read and truly seek to understand what he's written. His blog can be found here: blogs.netapp.com/extensible_netapp


KD - John, it would be beneficial to elaborate on the details of what part, specifically, you think doesn’t add up.   So far I think you’ve gotten credible answers to every potential hole you came up with.


JM - I'm a different John, nonetheless there are some things that don’t quite seem right to me either, so if you're genuinely interested in finding out what the maximum steady state performance is for your FAS array, you'll need to provide me with a few things.


1. Ideally you should turn AutoSupport on so diagnostic information can be automatically uploaded to our central repository; this makes it much easier to diagnose performance or any other issues. This is the first step in resolving any support issue, including performance-related cases.


2. Once you've done so, please let me know the serial number of your machine so I can easily identify it in the AutoSupport database. Even if you (for whatever odd reason) decide not to turn on AutoSupport, let me know the serial number in any case, as that will make it easier for me to ensure that you have access to all necessary patches, updates, etc. If you can't or won't supply the serial number, please let me know why so I can help get that issue resolved.


3. If you're going to compare your results to ours, you should set up equivalent configurations; this means you really will need to update your software version to 7.3, which is the same version of OnTap used in our MSRP and other results. In the configuration you're using (a 2050 running Exchange via Fibre Channel) there really is no good reason not to be running the latest operating system release. It's the first recommendation I would make to any customer who wanted to get the best performance and reliability from this configuration. Again, if you can't, or won't, upgrade to this release, please let me know why so I can help get that issue resolved.


4. You say in your initial post that "At this point I decided to run a "reallocate measure" against one of the database LUNs, and the FAS reported the value to be 17." This number seems a little on the high side (almost impossibly high); can you confirm your result and the method you used to obtain it? (Feel free to cut and paste the section out of the system audit logs; it should still be there.)


Now I don't actually expect that you'll give me any of this information, as, unlike a real customer with a real problem, you have little or no incentive to really get the most out of your array, in fact your incentive is exactly the opposite. Nonetheless I'll do what I can to help; all I ask is that you give me the information and assistance I need to help you work out your problems.


There's other stuff you've written that needs to be addressed, but it's late, and She Who Must Be Obeyed has commanded me to the boudoir.


Regards


John Martin


Consulting Systems Engineer - ANZ

Anonymous(anon) | ‎10-03-2009 05:58 AM

Hi Karl,

No, you haven't addressed the issues at all.  Instead you redirect to some weird witch hunt about a particular storage platform.  My issues specifically are:

1.  Let's take Jetstress.  It's the tool that's used to validate that storage will perform at a certain latency at a specific IOPS level.  In your case, you have IOPS numbers that vary 34% and latency numbers that vary 50%, yet you're telling me all tests passed.  If this is the case, there is something very wrong with the way you have Jetstress configured.  You are in fact stating that Jetstress is worthless as an IO validation tool. That's not been the experience of the entire Exchange community for over a decade.  I propose that there's something wrong with your Jetstress configuration.

2.  You were originally only quoting IOPS numbers.  When I told you that you needed to look at latency, you presented two numbers claiming a 34% increase in latency, I guess to match the 34% drop in IO.  Yes, 12.8 is about 34% less than 19.2.  The issue here is that 19.2 is 50% more, not 34% more, than 12.8 (a quick check of the arithmetic follows this list).  12.8 is your first result and 19.2 is your last.  I don't know; maybe you had a calculator malfunction while performing this "test", but the result you present doesn't look good with the obvious math error and all.

3.  You purport to test a thing.  The thing is a specific vendor's storage device in this case.  Of course it's no coincidence that the vendor is your competitor, but that's beside the point.  How is one to believe that you are performing accurate tests on the storage device in question when points 1 and 2 indicate a test methodology that is deeply flawed?  Fix your test; then maybe you could have a valid discussion about what it is you are testing.
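Here is the quick arithmetic check I mentioned in point 2; same two latency numbers, and the only thing that changes is which one you divide by:

```python
low, high = 12.8, 19.2   # msec: first-run vs. last-run average read latency
print(f"{high} is {(high - low) / low:.0%} more than {low}")    # relative to the start
print(f"{low} is {(high - low) / high:.0%} less than {high}")   # relative to the end
```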

John

Anonymous(anon) | ‎10-03-2009 08:53 AM

A couple of minor corrections to my last post.

1. I meant to say ESRP not MSRP

2. I forgot that you ran the checksum phase during the Jetstress tests, which is indeed large-block sequential.

My offer to help you still stands, but in the interest of two-way communication, can you reduce the time it takes to approve comment responses?

| ‎10-03-2009 10:02 AM

Hi John,

As the editor in chief of the blog, I'm the only person that can approve comments.  I get to them as fast as I can and I'm sorry that "fast as I can" sometimes takes a bit of time.  Keep the comments coming and I'll approve them as fast as I can.  I'll talk to the blog support team and see if perhaps people that are signed in have their comments "auto-approved".

Thanks - Calvin

Anonymous(anon) | ‎10-06-2009 06:52 AM

Lets step back.  

The test was chosen, written up, submitted to ESRP, and published by NetApp.  The ESRP paper states that the firmware level used was 7.2, not 7.3, and does not indicate a burn-in period.  The test setup was as described in detail by NetApp.

I'm not planning to rerun the test with improved firmware, altered Jetstress parameters, more diagnostic switches, or whatever other massaging I'm being asked to do.  I'm sure NetApp can squeeze the most out of this array without my help.  Calls for me to change the test, or get help, or see the error of my ways, are fairly certain to be diversionary tactics.

I think the key takeaway is that this outcome calls into question the integrity of any FAS ESRP submission on the grounds of similar irregularities - most notably massively declining throughput as the test runs longer, and the use of diagnostic switches to enhance results.  Based on what was uncovered by looking briefly under the covers of this one FAS2050 submission, why believe otherwise?

NetApp team, why don't you formally resubmit the FAS2050 ESRP to MS with a simple clarification declaring the burn-in period you did run?  That alone would be a big step forward in defending your position.

Anonymous(anon) | ‎10-06-2009 03:49 PM

Calvin,

Thanks.  I'm kind of used to the "post first, ask questions later" approach on most of the NetApp blogs. I'm sorry if I sounded a little snippy. I had a perfectly polite post to one of your blogs on SSDs where I was agreeing with HP that seemed to get mysteriously edited out, probably because you've got a "Take out NetApp" campaign going on now, so I'm a little wary of HP's practices in this regard. Having said that, as much as we all enjoy this kind of public debate, I expect that you also have a "day job" that consists of trying to help customers, and can only spare so much time for blogging before you have to do something more productive.

To be honest, I'm genuinely curious as to why Karl's results are so bad, I run up similar kinds of tests in my lab infrastructure and I get very different results, same goes for customer sites.

I doubt that Karl is deliberately nobbling the results, although the approach is getting a little tired. I've seen it before, and it proves nothing that we haven't addressed numerous times. If you benchmark an array with large amounts of memory (XP12000, DMX, FAS with PAM II), the performance starts slow and improves as the cache warms. What does this prove? Nothing, other than that benchmark results are only meaningful if you take them after you've achieved steady state.

Running the "watch as the performance degrades over time test"  by graphing the results from a virgin system and then stopping before it hits steady state with a  - what do you think happens from here ? -, and - NetApp are fudging their results by testing on virgin systems ! - (I'm paraphrasing, but I don’t think I'm misrepresenting his position) are all things EMC did three years ago and frankly they did it much better. Now they did do some pretty dodgy things (like using  number of  small aggregates and putting one LUN on each) for some of their tests. Karl on the other hand is at least purporting to follow our best practice which is why I'm curious about his sub-par steady state performance.

I wonder if there's something wrong with the unit, or if he's set up the LUNs using SnapDrive and then stuffed up the alignment by subsequent use of diskpar/diskpart. All of this stuff would be much easier to figure out if we got an AutoSupport. It's really not hard to configure, and based on some of the information Karl puts up on his blog, he, or someone he knows, appears to have access to the NetApp customer portal, so he can use the visualisation capabilities to perform a lot of his own troubleshooting.

On a final note, the "setflag wafl_downgrade_target 0" would only ever be set on an array on which you never intend to run a NAS workload (like, say, a dedicated piece of storage for Exchange). As a result we recommend that you only turn this on under advice from NetApp technical support, because changing setflags is a little like editing registry entries on Windows - you only do this in exceptional circumstances. If you wanted to make this permanent across controller reboots, you would also make the change in a startup control file.  Doing this for our customers is unusual, but it's not unheard of either.

As more customers see the value in NetApp FAS for pure SAN workloads, we've changed that setting in OnTAP 7.3 from a setflag to an option, which means you can turn it on and off without the warnings, and it's automatically persistent.

Karl is right though: our positioning on unified storage says that single-purpose devices are probably not the best way to go (except perhaps in single-purpose benchmarks), as they tend to be wasteful of resources.

If I were like most IT shops I've talked to that had a 10,000-user Exchange workload, then I'd also have a 10,000-user CIFS workload, probably a SQL Server and Oracle workload, and a VMware-on-NFS workload. To drive efficiencies into my datacenter I'd probably buy a larger FAS model, share my resources, and leave that option unchecked.

On the other hand, if you just want to run SAN and you like what we do with dual-disk data protection, and thin provisioning, and great asynchronous mirroring, and killer snapshots, and built-in off-host backup and usable backups, but never want to run a NAS workload on our kit, then feel free - change the setting and enjoy the most functional SAN product on the market :-)

Regards

John Martin

Consulting Systems Engineer - ANZ

| ‎10-07-2009 12:30 AM

Hey John,

I can tell you I've never edited any comment on this blog - ever.  I have rejected a small handful of spam comments (people who obviously care nothing about storage but leave a comment pointing back to some suspicious URL).  I'm not sure what happened to your SSD comment but if the comment wasn't posted, I promise you, it wasn't intentional.

Thanks - Calvin

Anonymous(anon) | ‎10-07-2009 01:38 AM

Hi Karl,

The fact is the ESRP results specifically state there were two FAS2050 controllers and two hosts.  Your “reproduction” is only using one FAS2050 as far as I can tell.  (Primary Storage Hardware Table on Page 10:  media.netapp.com/.../fas2050-10000-user-fibre-solution.pdf)  When using two hosts, Microsoft specifically points out the issue with Jetstress auto-tune in their documentation on TechNet.  You ignore this as well.  Then, in your response-time results, there's a blatant math error.

Karl, you have a theory which you are trying to prove. Much like the phrenologists of the 19th century, in your zeal to “prove” your theory you attempt to force the data to fit.  The fact is, your testing methodology is invalid.  Your results are easily explained in the Jetstress documentation on Technet.   Your test design uses half the storage controllers that were used in the ESRP.  Your post leaves a lot of unanswered questions about your configuration and testing methodology, and I understand that in your position you have no motivation or desire to answer them.

From your post:

1.       I don’t see where you could reproduce the configuration with a single FAS2050 controller as you stated.

2.      The variance in Jetstress results WITHOUT A FAILURE only makes sense if you left Jetstress auto-tune enabled.

3.      19.2 is 50% more than 12.8, not 34% more.

All you’ve accomplished is making a fairly condemning statement about the credibility of HP.

John

Anonymous(anon) | ‎10-07-2009 08:09 AM

John M,

Thanks for the relatively constructive response.  

No doubt NetApp has addressed the fragmentation problem many times in the past.  I'll also agree that it's getting old.  But there is a difference between denying the problems exist and truly addressing them.  This issue has staying power because it has meaningful repercussions - the increased size of a FAS array needed to do the same job as a competitor, the implications for NetApp's credibility, and the way it taints every other vendor's desire to post results like ESRP.  It wouldn't be quite fair for one vendor to post results that are burned in while another posts results that are not burned in, would it?  Especially when the vendor not burning in has a 35%+ benefit.  Perhaps arrays like the EVA would be better off with a fragmentation problem so we could compete on a level playing field.

There is much more to talk about, so as my day job permits we will move on.   There are the rather amazing slowdowns caused by deduplication (which come on top of the slowdowns caused by fragmentation).    Someday we should talk of the FAS’s capacity efficiency relative to competitors.   I also had to go through a FAS re-initialize recently, and that management experience was interesting enough to be worth describing.  No doubt I'll be accused of dredging up the past again.

I have no incentive to change anything about the FAS2050 ESRP test setup.  My goal is to run it exactly as NetApp did, because that makes the results all the more compelling.   Any test I came up with wouldn’t hold a candle against one that NetApp came up with.

cleanur | ‎10-07-2009 05:38 PM

So was only a single FAS2050 controller used, or was a clustered pair used as stated in the ESRP report?

Anonymous(anon) | ‎10-07-2009 11:40 PM

"There are the rather amazing slowdowns caused by deduplication..."

Tell us about dedupe on the EVA, or LeftHand, or PolyServe, or any other storage platform HP might sell. What %age of space does dedupe on those systems save?

Amaze me.

Anonymous(anon) | ‎10-08-2009 12:38 AM

It was a clustered pair, exactly as NetApp describes it in the paper.  The figures discussed above are per side of the cluster.  NetApp reported about 2220 IOPS per side.  We saw it degrade to 1450 IOPS per side after about 48 hours of running the described Jetstress load.  Jetstress was also set up exactly as described in the NetApp ESRP paper.  It was still degrading, so we don't know where the curve flattens out, but the actual steady state will be worse.  NetApp reports results per side of the cluster in its paper, so that's the same tack taken here.

2220-1450 is 770, and 770/2220 is 35%.  Perhaps John [] is right though.  I've been overly courteous to NetApp in reporting this as the amount of degradation from the max figure of 2220.  Really it would be more useful to report this as the amount of boost being added to the steady state results (i.e. the amount the results are inflated).  In that case it's 770/1450, or a 53% boost.  NetApp inflated its IOPS results in this ESRP test by at least 53%.

Either way, it's big.  Big is why this issue won't go away for NetApp.

| ‎10-08-2009 02:53 AM

Hey Alex - I really did enjoy meeting you at VMworld and having a beer with you at the storage Tweet-up.  It was fun to hang out and get to know you a bit more.


What I don't understand is your insistence on taking these discussions to sarcasm (maybe that's the Scottish in you coming out).  The fact is that deduplication does consume resources - there's no avoiding that.  HP has deduplication in our disk-based backup products (D2D Backup System and Virtual Library System).  Also, we do have deduplication with our Extreme Data Storage System (ExDS9100) through our partnership with Ocarina, so please don't try to play the “where in the world is deduplication at HP” game.


Where we disagree is on the need for tier 1/tier 2 block-based deduplication.  You can count on your fingers the number of customers who have asked us for dedup on the XP, EVA and HP LeftHand arrays.  You guys are starting to sound like Chuck pushing SSDs 18 months ago.  I'm sure you can trot out a few customers who will say dedup on their tier 1/2 storage is dandy, but our experience shows it's not a pain point customers want to solve by giving up performance.


I can’t say your response surprises me – the MO of NetApp responses has been to dodge answering the questions that have been asked or changing the subject to deflect the real issue.  Karl's testing has shown a dramatic difference from the NetApp-submitted ESRP testing results, and the only viable conclusion is that fragmentation is the cause.  Karl's gone to extraordinary efforts to show that what he did lines up with what was documented by your team.


And I have to say, John's comment from last Friday saying that WAFL really doesn't stand for “Write Anywhere” is a major joke.  Someone needs to go change the wiki page about WAFL - and the fact that Data ONTAP has included a defrag tool since 7G.


So come on - stop with the rabbit trails and red herrings and address why there is such a big difference.


Calvin

Anonymous(anon) | ‎10-08-2009 03:05 AM

That's odd Karl

In your original post you said "We happened to have access to a FAS2050 and decided to try and recreate"  referring to a singular array, and "Once the array was freshly initialized and everything was set up,"  again referring to a singular array.  Now you say it was a pair.  I hope this serves to illustrate one of the many problems with your methodology and reporting.  Without full disclosure, it's practically impossible to tell what you did.  

John

Anonymous(anon) | ‎10-08-2009 04:26 AM

So are there any published results of an EVA4400 with a comparable configuration and stable firmware (if such a thing exists)?

BTW, your talk (at the beginning of the original blog) about getting more I/Os from spindles and being more efficient is rather odd, since NetApp supports thin provisioning and deduplication, and will beat the EVA any day in efficiency.

Anonymous(anon) | ‎10-08-2009 05:09 AM

John ,  I should have said FAS2050c.  Most wouldn’t consider running a storage array with a single controller in this price range, so I thought the “c” was rather redundant.  

As far as full disclosure goes, to the best of our ability, this test is being run exactly as described in your ESRP submission paper.  There is nothing left to disclose, as the disclosure would essentially be the same words and pictures as in your ESRP paper.  Even the results are the same - until the run time got extended to more than 2 hours.  Since we recreated the IOPS for the first two-hour run, I think you can be pretty certain we had things set up as NetApp did.

Anonymous(anon) | ‎10-08-2009 08:00 AM

@neils

It looks like it'll only beat the EVA on efficiency at the cost of performance, and even then only marginally, and this is a performance test for Exchange.  If you want comparisons, take your pick below, but remember you can't directly compare ESRP results from differing configurations or products; it's not a benchmark, more of a shop window.  You could just go back to Karl's posts on making sense of WAFL, which appear to have identified the same issue using a different testing regime - again from a documented test published by a NetApp partner.

technet.microsoft.com/.../bb412164.aspx

http://tinyurl.com/ye86ml8

I think the point is Karl's testing has followed the NetApp ESRP doc to the letter, and it appears he's not able to repeat the results for anything but the briefest test run.  Despite the advice to use the latest ONTAP level, the ESRP doc states the same level Karl's using, and it appears he's also using a dual-node cluster configured in an identical manner.  The Jetstress issues appear to be only diversionary, since these same server-end issues have had no effect on the EVA results.

So if this isn't repeatable here, is it repeatable out there in the real world for real customers?

| ‎10-08-2009 08:33 AM

Cleanur - your comment brilliantly summarizes the situation.  I'm sure we'll get an answer from NetApp saying that there's something wrong with Karl's testing, that it's not valid because he's biased, or some other hand-waving dismissal of what he has demonstrated, but you are dead on!


I'd love to hear about a customer repeating the test - I know Karl has spent an enormous amount of time doing this so I'm not sure if a customer would try this on their own but I welcome anyone who does to share their results.


Thanks again!

Anonymous(anon) | ‎10-08-2009 07:43 PM

Hi Calvin.

Yes, it was good to meet; it's always good to realise that what you see through the medium of the internet really is through a glass darkly. Or, as in our case, through several dark beers.

It's not sarcasm. I'm using exactly the same words as Karl used. He's amazed. I'm amazed he's got the chutzpah to be amazed. I'm amazed you think that "our experience shows it's not a pain point customers want to solve by giving up performance". Your customers aren't asking because they don't know the advantages of dedupe, but they do know HP don't do it. That's an HP generated pain.

Here's why NetApp customers use dedupe and value it highly.

1. It saves space. It's scheduled, run off-peak, and deduplicates in the background. When the box is busy, it's not deduping. No impact to performance, and savings can be considerable; 30% on home directories, 50%+ (we've seen up to 90% in some cases) on VMware.

2. It improves performance. If you and I and Karl and all the other readers of your blog all ask for the same block, it's read once, cached, and doled out from memory -- fast. Think boot storms on VDI as an example, and doing all that IO on a Monday morning at 8.00am with latencies of around 1 ms.

It's a serious point. Amaze me. Show me some HP kit that supports dedupe and improves performance.

Anonymous(anon) | ‎10-08-2009 09:17 PM

You say you followed the configuration to the letter and got different results; therefore, we are liars.

I say: strange, that doesn't match my experience; our results were achieved at steady state; there are anomalous results in your findings (e.g. almost impossibly high numbers reported by the reallocate scan); please send me more details via our AutoSupport mechanism so I can verify your configuration.

I repeated my request.

Still no AutoSupport.

... why ?

Anonymous(anon) | ‎10-09-2009 03:39 AM

Agreed with your perspective on having met.


As to your wish that we amaze you - as much as it pains me to say this, amazing you isn't our goal here.  That said, in my previous comment I already mentioned the dedup capabilities that we have today - we don't currently have dedup in our SAN based arrays.  I can't say more than that unless you work for HP.


That said, it's something we will certainly talk more about in the future - but let's get back to the ESRP performance results.  That's the topic at hand.  Karl has done a stellar job at reproducing the test environment and the results don't match up.  We've tried to get to the bottom of similar issues here in the past and all we get is hand-waving, changing the subject, or other red herrings.  This is the second time that we've run a test run by NetApp (or a vendor you paid to run a test for you) and we see degrading results. 

Anonymous(anon) | ‎10-09-2009 04:15 PM

I'd say a less than stellar job if the results don't match up. John Martin seems very willing to help you guys out, so where's that autosupport he keeps asking for? Did you send him that yet?

Now who's hand-waving?

Anonymous(anon) | ‎10-09-2009 06:34 PM

@Calvin

Dedupe is on the HP roadmap? For which platform? The EVA doesn't even have thin provisioning (yet), so it can't be that... Thanks for the heads-up.

You might mean degraded results, not degrading results.

You said earlier: "I can’t say your response surprises me – the MO of NetApp responses has been to dodge answering the questions that have been asked or changing the subject to deflect the real issue." That's your modus operandi; we would like to answer the questions, but you pointedly refuse us the very material that would let us answer this. Send us the AutoSupport as John Martin asks. Do it privately; you have my email address. We won't publish anything; we'll work together and resolve the issue. And if you're right -- you get to keep the trophy.

But perhaps "degrading" was a Freudian slip. You're casting aspersions about our products and our honesty, and you really don't want to get to the bottom of this at all, because it suits your message. I expect better from a professional competitor with a hard-won reputation to uphold.

Anonymous(anon) | ‎10-10-2009 07:17 AM

With respect to AutoSupport, the AutoSupport data for the test run by NetApp is also not in the public domain.  It isn't productive to take a tangent that seeds another round of diversionary posts from NetApp.  Configuration aspects of the system are as described in NetApp's ESRP paper.  Everything else is set to factory default settings.

Let's choose to stay on track instead.  Is NetApp willing to answer these questions?

Given ESRP is supposed to show best practices, why were internal-use-only diagnostic switches used to enhance results?

Why was the test only run for 2 hours?

Why is burn in time not mentioned in the report?

What was the burn in time?  

Why doesn’t NetApp formally update the ESRP submission so that it documents burn in time?  

Why does NetApp not offer to conduct this test in a public forum for everyone to observe?  

If there is no WAFL-fragmentation, why do Jetstress IOPs slow down by 35%+?  

If there is no WAFL-fragmentation, why does checksum calculation slow down by 250%+?

These are rhetorical questions of course.  

Anonymous(anon) | ‎10-10-2009 08:13 PM

You say you've run the tests according to the ESRP submission we made and came up with different results. I've offered to help you find out why, but the only way I can do this is with additional data. The easiest way of doing this is to send us an AutoSupport, and you've refused to send one, nor have you given a good reason why not. I'll leave it to the reader to draw their own conclusions.

As to your questions -

KD Given ESRP is supposed to show best practices, why were internal-use-only diagnostic switches used to enhance results?

Please look at the technical report TR-3647, which was published well before the benchmark in question and which I referenced in our last conversation on this topic. In short, if you're only using us for SAN workloads, then don't hold back any resources for CIFS, NFS, etc. Seems fair to me, and it constitutes best practice for the Exchange workload in question, including the setflag documented in the ESRP submission.

KD Why was the test only run for 2 hours?

Did you miss the section on the 24-hour stress test?

KD Why is burn in time not mentioned in the report? (And other questions related to burn in time)

Burn-in / ramp-up time is part of pretty much every benchmark; sometimes it's included as part of the submission, sometimes it's not, depending on the benchmark requirements. From my recollection, ramp-up times etc. are all documented in the SPC-1 results which we've published many times recently. ESRP doesn't include a section on this.

In any case, it's only steady state which is important or relevant to a benchmark. Ramp-up/warm-up times might be interesting in a forum like this, but are ultimately irrelevant in sizing or comparing systems for representative workloads, which is what benchmarks are intended for.

Why does NetApp not offer to conduct this test in a public forum for everyone to observe?

Because this is not what the ESRP is about; however, we do run a similar kind of workload in SPC-1, which is independently audited and has a formal process which allows others, including HP, to challenge us. Check out www.storageperformance.org for more details.

If there is no WAFL-fragmentation .... etc

Because there IS a difference in performance between a virgin system and one which has reached steady state.  We've never denied this, nor do we make a big deal of it, as we always size our systems for, and perform benchmarks at, steady state.

Last and Final offer - are you going to give me the additional data I need to help you find your performance problem or are you going to continue making excuses ?

Anonymous(anon) | ‎10-13-2009 03:48 AM

The fact is we came up with the same results.  For a 2 hour run the results were nearly identical, about 2200 IOPS per server.  The difference is that we repeated the test multiple times after the initial run and watched the IOPS progressively decrease from run to run.

There is no mention in the paper of having conducted a burn-in prior to any run.  Given the major difference a burn-in makes for the FAS, this omission isn't likely to be a simple oversight.  The results for the 24 hour run in the NetApp paper are predictably slower compared to the 2 hour run - proof that the IOPS on the FAS are anything but steady state across runs.

Based on the track record, it’s pretty clear that this offer to help us is just another in a long line of diversionary tactics.  Here is a summary of some of the diversions tried so far

• Insistence that the results didn’t recreate, when they did

• The WA of WAFL does not stand for “Write Anywhere”, when it does

• WAFL doesn’t really scatter sequential LUN blocks, which it does

• The database LUNs could not possibly have gotten to a reallocate measure of 17, which they did

• There is no such thing as WAFL-fragmentation, which there is

• Fragmentation has the same impact to all storage vendors, which isn’t accurate

• We did bad math, which we didn’t

• We ran our FAS2050c with only one filer, which we didn’t

• The results are caused by a Proliant time drift bug, which doesn’t make sense

• We should look at SPC, which actually might not be a bad idea .  Based on what we found with ESRP, I have to wonder what would be uncovered there.

I think I’m justified in declining NetApp’s “assistance” in putting a magnifying glass on our system.  

Isn’t there a rather simple way that NetApp can clear things up?   Let’s see how open NetApp is to formally resubmitting ESRP for the FAS2050 showing the burn-in time.    If all the time spent by NetApp responding to this blog were instead used in that regard, this resubmission would already be done and NetApp would have a credible response to this blog.  

Until NetApp takes the step of formally resubmitting ESRP for the FAS2050 with either clarifications on burn-in time or a different IOPS figure, I propose we all accept the rather obvious conclusions.  For me at least, it's time to get back to my day job.

Anonymous(anon) | ‎10-13-2009 07:26 AM

Karl

if that's your final take, I'll make this my last response as you clearly have no intention of working collaboratively or providing enough information for me to let you and the rest of the community know where you went wrong.

1. Insistence that the results didn’t recreate, when they did

No, you didn't. If the results were the same, then there would be no challenge or debate, nor would you have published the blog. To be specific, our results stayed consistent over the 24-hour stress test; yours degraded over time. You clearly failed to read our submission carefully. This is a requirement of an ESRP submission in order to verify that a vendor is able to sustain their claimed results. You clearly ignored this result and continue to do so, as evidenced by your question above, "Why was the test only run for 2 hours?". In my mind this raises questions over your due diligence for the rest of the setup.

The WA of WAFL does not stand for “Write Anywhere”, when it does

OK, read my reply on this again; never did I or anyone from NetApp say that the WA does not stand for "Write Anywhere". Making this assertion is not only inaccurate, it makes us out to be liars, which frankly is insulting, and you discredit yourself by doing so. What I did say is that WAFL is more of a pun than an accurate description of the technology. A pun, you know, a clever play on words: Write Anywhere File Layout = WAFL = yummy thing you put into a toaster, which is an appliance. OK, it's about as funny as one of my Dad's jokes, but for IT humor it's about on par with GNU's Not Unix. I wouldn't take either as a basis for a deep understanding of the technology.

• WAFL doesn’t really scatter sequential LUN blocks, which it does

My issue is with the word "scatter", which implies a random distribution of the blocks, which again is not true.

• The database LUNs could not possibly have gotten to a reallocate measure of 17, which they did

The possible ranges for measure_layout are as follows:

Reallocate measure_layout         1-10

WAFL scan measure_layout        1-66

In your post, near the beginning, you say “reallocate measure”.  Toward the end, you say “wafl measure layout”.  It’s unclear which command you ran. I asked you for a snip from your log files so I could clarify this, which you failed to do. Your implication is that a result of 17 is very fragmented.  If you used reallocate measure_layout, that result is not possible.  If you used wafl scan measure_layout, 17 is not particularly horrible – it equates to a reallocate measure_layout of 2.5. Having said that, while 2.5 isn't that bad, it's still higher than the 1.8 result we got after running a long series of Jetstress tests on one of our lab 2050s. Again, I suspect misalignment of the LUNs, but there's no point in guessing given your adversarial stance on providing additional information about your environment.
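If anyone wants to sanity-check that conversion, here's a rough sketch. It assumes a simple proportional mapping between the 1-66 and 1-10 scales, which is an approximation of mine for illustration rather than a documented formula:

# Rough sanity check (illustrative only): map a wafl scan measure_layout
# value (1-66 scale) onto the reallocate measure_layout scale (1-10).
# The linear mapping is an assumption, not a documented conversion.
WAFL_SCAN_MAX = 66
REALLOCATE_MAX = 10

def to_reallocate_scale(wafl_scan_value):
    return wafl_scan_value * REALLOCATE_MAX / WAFL_SCAN_MAX

print(round(to_reallocate_scale(17), 1))   # 2.6 - in the same ballpark as the 2.5 quoted above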

• There is no such thing as WAFL-fragmentation

and

• Fragmentation has the same impact to all storage vendors,

Where did we say those exact things? In previous discussions (not this one) I've taken issue with the word fragmentation as being misleading, especially when comparing it to NTFS compression, but I'm not sure where you're getting the second part. Looking through the responses in your blog I can't see this anywhere. If you're going to quote someone as saying something or supporting a particular position, you really need to cite them so observers can see exactly what they said and in what context. It would appear that your aim is to paint us as liars and cheats. If I were you, I'd be mindful of the saying "when you point the finger, there are three fingers pointing back at you"...

• We did bad math, which we didn’t

I'd say sloppy math; you did make a basic grade-school error when calculating relative percentages, as pointed out by John F: "19.2 is 50% more than 12.8, not 34% more". It might not make a material difference, and could be construed as nit-picking, but yet again, your lack of attention to detail isn't exactly confidence-inspiring.
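For the record, the arithmetic is straightforward; my guess (and it is only a guess) is that the 34% figure came from dividing the difference by the larger number instead of the base:

# Relative increase should be measured against the base (smaller) figure.
base, larger = 12.8, 19.2
print((larger - base) / base * 100)     # 50.0  -> 19.2 is 50% more than 12.8
# Dividing by the larger figure instead gives ~33.3%, which is probably
# (my assumption) where the 34% figure came from.
print((larger - base) / larger * 100)   # 33.33...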

• We ran our FAS2050c with only one filer, which we didn’t

You were the one who said you were using a FAS2050, not a FAS2050c. The words "As far as I can tell" when I challenged your configuration reflect your lack of disclosure, and it was a reasonable hypothesis given the information you'd provided.

• The results are caused by a Proliant time drift bug, which doesn’t make sense

Your results certainly would be affected by a timer-drift bug, which you discount out of hand. Again, without more detailed information or an inspection of your environment, we can only take educated guesses as to why your results differ from ours.

KD - I think I’m justified in declining NetApp’s “assistance” in putting a magnifying glass on our system

Really, no surprises here at all; you go ahead and feel "justified" if that's what gets you through the day. Like I said previously, "I don’t actually expect that you'll give me any of this information, as, unlike a real customer with a real problem, you have little or no incentive to really get the most out of your array, in fact your incentive is exactly the opposite". This, in my opinion, doesn't lend credibility to your results.

As far as resubmitting our ESRP submission goes, I don't think your "exposé" has sufficient merit or credibility to justify going through that exercise again. As for publishing the "burn in" results, I can't see much point, as they're irrelevant to customer deployments, and, like you said at the bottom of the post which started this debate, "Most would expect that results in a test as visible as ESRP are measured after a reasonable burn in period" – which is exactly what we did.

Finally, if you do decide to do an SPC submission, don't forget that, unlike this last little exercise of yours, it includes full disclosure, an independent auditor, and a process to challenge the results. While you're at it, why not include an equivalent HP array with an equivalent number of spindles so we can compare results? I'm sure it would be enlightening.

Regards

John Martin

Consulting Systems Engineer - ANZ


Anonymous(anon) | ‎10-13-2009 09:24 AM

I think this will help assess the credibility of John M.


Below is an example output of reallocate measure from the FAS2050.  



I randomly chose one of the 30 database LUNs from the ESRP test to run this against.  The range we have seen is 15-17.  The only load this LUN has seen since array initialization is Jetstress – exactly as described in NetApp’s ESRP paper – except I ran it until things were more burned in.


The expected range that NetApp publishes for this reallocate measure is 1 to 10, as described earlier.   The LUN is highly fragmented, and it is caused exclusively by the database load pattern of Jetstress.  The bad news for NetApp is that this means MS Exchange will take everyone’s FAS database LUNs to this level of ultra-fragmentation  (range of 15-17 on a 1-10 scale) or worse.  


John M told us in his last post that this result is “not possible”.    Either he doesn’t know his array very well, or his passion in fanatically defending the FAS has gotten the better of him.  I’d prefer to believe he doesn’t know his array very well – at least he wouldn’t knowingly be pulling our leg that way.

Anonymous(anon) | ‎10-13-2009 10:03 AM

Thanks for finally giving me some of the information I've been asking for.

If you're seeing this, then there appears to be something wrong with your array – possibly a firmware bug, possibly a setflag that's been changed that shouldn't have been. This is why I asked for the AutoSupport information, as it would allow me to verify the exact version of the software you're using, and all of the various settings.

Once again, if you're interested in portraying a truthful position, send me the AutoSupport data for the array configuration and I'll see what's gone wrong.

Regards

John Martin

John_Fullbright | ‎10-13-2009 10:11 AM

Yes, I have snagit too.  What I don't understand from this long diatribe is:

1.  Why did you pick a non-benchmark that can't be challenged?

2.  Why do you ignore the Microsoft documentation on how to run jetstress?

3.  Why are there obvious math errors in your post?

4.  Why do you change the description of the test mid sequence?

5.  Why don't I see NetApp customers posting confirming your conjecture?

This looks really one-sided here.  Pick a benchmark, like SPC, and run the test.  Post the results.  Give NetApp the opportunity to challenge that result.  The SPC benchmark includes full disclosure, is independently audited, and has a formal challenge process.  Then maybe your readers could get to the bottom of this.

John F.

Anonymous(anon) | ‎10-14-2009 12:40 AM

Ok, let me try to close this out.  

The door is open for NetApp to prove us wrong and defend its ESRP submission – not by bashing every conceivable aspect of the simple test we ran, but rather by properly defending the formal test it ran.  Why not update the ESRP submission to document burn-in time?  How many hours of Jetstress were run prior to the 2 hour test?  How many prior to the 24 hour test?  It couldn’t be simpler for NetApp to prove how wrong we are.

Until then I think it’s safe to assume that we have successfully rooted out a rat.

Anonymous(anon) | ‎10-14-2009 01:16 PM

Karl,

We stand by our original submission; the guy who did the submission didn’t even push the equipment that hard, as he wanted to make sure there was plenty of headroom left. Having said that, your findings are at complete variance with the results we published for the steady-state performance in the 24 hour stress test. This gives us three possibilities:

1. Your results are inaccurate or misrepresented

2. Our results are unsustainable (Which I believe is the thrust of your argument)

3. There were differences in the way the tests were performed (Which is my counter-argument)

I believe that you did honestly attempt to reproduce the environment that we used in the ESRP submission, but either because of a lack of understanding or a mistake on your part, or a lack of sufficient information in the ESRP submission on our part, your attempt failed.

The net result is that two highly similar, yet different configurations came up with very different results. I don’t know why those results were so different, and I'm genuinely curious how you managed to get the figures you did, hence my repeated requests for your AutoSupport data. If this happened in a customer environment, we would dedicate the resources to get it fixed, and the customer would see the kind of sustained performance we quote in the ESRP submission.

It's understandable why you didn’t send me the AutoSupport data or work with me to improve the performance, as you had nothing to gain by doing so. To be fair, it was unreasonable of me to expect otherwise.

As to your final challenge: while I could try finding the time and equipment to recreate the environment and do the test, I'm not sure that anybody cares enough about the result of an almost two-year-old ESRP submission for me or anyone else at NetApp to do so. The industry has moved on since then – viz. the FAS2040, Exchange 2010, FCoE, and the trend towards virtualising Exchange servers.  In any case there would inevitably be another “tit for tat” session over the testing methodology, what was and wasn’t disclosed, or some other irrelevancies.  This situation where we attack each other’s credibility is not worthy of ourselves or the companies we represent, nor does it really interest end users; it’s just old and stale.

May I suggest another test – something that would give customers real benefit and insight into HP’s and NetApp’s respective strengths and weaknesses, and help them make valid design decisions? For example, Exchange 2010: one configuration with a single-node FAS2050 using SATA, and another with an HP server running DAS. I'd be happy to show "burn in time" or whatever else you feel is necessary (despite the fact that I still think it's completely irrelevant) if you promise full disclosure of your entire setup, including AutoSupport data.  I can’t promise anything at the moment because there are some internal processes and approvals I’d need to get first, but in principle, what do you think of the idea?

Regards

John Martin

Consulting Systems Engineer

NetApp - ANZ

Anonymous(anon) | ‎10-17-2009 03:20 AM

I'm wondering if all the NetApp guys take a class on how to conceal their performance degradation problem. I challenge NetApp to give us a real-world read workload test that can run repeatedly for days without exhibiting performance degradation. I realize that this isn’t possible, but if they continue to call everyone’s tests invalid, then have them give us the test script.

We could also have end-users test their read performance over time based on their own application workload, and have them post their results. It would be hard to argue with customers’ data. The only problem here is that most of the systems that have been in the field for any period of time have already significantly degraded, although it would be good for them to check this in any case, just for the shock factor.

Of course Alex takes his usual approach of trying to switch the subject, which is a pure admission of guilt in my mind. Yes, NetApp dedupe is cool, but only if you don’t care about performance, ballooning snapshots, or the lack of space savings for block-based (FCP/iSCSI) LUN environments.

NetApp’s in-band de-dupe technology is very simple and easy to implement, and the technology is available in the open-source community – so the fact that the other storage companies haven’t implemented it suggests there are good reasons not to.

Why not just use capacity-free cloning to prevent duplicate data in the first place? Or run de-dupe as part of your secondary-storage or backup process so that you don’t kill application performance?

Sorry for kicking the ant pile on this long blog that has almost gone stale, but I couldn’t resist getting my 2c in.

Anonymous(anon) | ‎02-14-2010 09:38 AM

I still don't get why NO customers are complaining... weird, no?

| ‎02-14-2010 12:56 PM

Dimitris - so you're the CCO - Chief Complaint Officer?  Wow, your team must not be forwarding you those complaints.  I wasn't going to call this out, but since you said "NO customers are complaining", there are more than a few complaints on my blog - just poke around and you'll find them.  The most recent one is at the bottom of this post: www.communities.hp.com/.../netapp-usable-capacity-going-going-gone.aspx.


Complaints aside, how many customers are going to measure performance during the first few hours of loading their data and beginning to run an application, and then test it again after the burn-in?  Well, VMware did it for you in this paper: http://bit.ly/axD9CN (see page 8).  As both the VMware paper and the work we did to reproduce the ESRP results show, once FAS starts to fill up and fragment (which only takes a few hours under a heavy load), the FAS settles into what is probably a steady state that is far lower than at start-up.  Heck, here's a NetApp paper that shows it too: http://media.netapp.com/documents/tr-3521.pdf (page 6).


As soon as you have a reasoned answer as to why the performance of the FAS6020 throttled back from a high of 9000 IOPS to 3000 IOPS in the VMware paper, I'd be happy to continue the discussion.  Until then, I'm not going to respond to your comments on either Twitter or the blog - the typical NetApp tactic is to deflect the issue at hand instead of addressing it.
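For anyone keeping score, here's what that decline looks like in percentage terms - a quick sketch using the 9000 and 3000 IOPS figures cited above:

# Percentage decline from the start-up throughput to the settled throughput,
# using the 9000 and 3000 IOPS figures quoted from the VMware paper.
startup_iops, settled_iops = 9000, 3000
decline = (startup_iops - settled_iops) / startup_iops * 100
print(round(decline, 1))   # 66.7 - roughly two thirds of the initial throughput is gone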

Anonymous(anon) | ‎02-15-2010 11:59 AM

Have you ever heard of the term "shill"? I saw that other page, with some customers saying they have no problems and some others confirming problems.

Statistically, the sample is weak at best for either case.

Same as with Twitter, several customers replied with "we have no issues after <blah> months/years/whatever".

You make it seem as if there is this deluge of dissatisfied customers that knock on your door crying for help.

I prefer to consider the experiences of NetApp customers with hundreds of thousands of email seats, sometimes over a million, that somehow seem to not have all those issues... even after running the product for years on end. Or that we win against EMC and HP and HDS during Exchange bakeoffs that last for well over 2 hours (several weeks actually).

And - are you implying somehow that NO EVAs have ANY performance issues EVER?

Regarding the declining performance - it's been described here: http://bit.ly/cnO2 in great detail.

The fact that you so vehemently keep posting this is a bit weird; I'm not sure what you're trying to prove.

NetApp engineers size systems for the steady state anyway, so again I don't get the issue. If you ask me to build you a box that sustains 50K Exchange users with under 20ms response, I'll build you one that keeps doing so consistently after 5 years - is that not good engineering?

And, finally, your benchmark: I'm not gonna say you did it wrong or anything; all I will say is this:

If a customer tells me they want to run this, NetApp will be there to help them do it; they don't need the ESRP document to configure it.

Since the box will probably be used for other stuff too (VMware, etc.), we will tune holistically and employ methods that may be found in 10 different documents (all freely available). Not difficult, but you obviously need to be trained in the use of the box, which new customers (and HP engineers) typically are not.

D

| ‎02-15-2010 01:46 PM

Wow - where do I begin.  I'm going to keep this short and sweet:

1 - You'll have to show me where I "made it seem there's a deluge of dissatisfied customers".  Funny, I've looked all over and can't find where I said that.

2 - I also didn't say that no EVA customer has ever had a performance problem.  It was in fact you that said "I don't get why NO customers are complaining" when talking about NetApp.  Your words, not mine.

I'm not sure where you took debating lessons, but if you're going to attack an opponent's position, you should probably make sure they have actually taken that position.  It's a waste of time to have to correct your misstatements.
