Around the Storage Block Blog
Find out about all things data storage from Around the Storage Block at HP Communities.

NetApp buys Engenio - what unified couldn't do diversified will!

There's too much to include in a short summary of this post on the news that NetApp will buy Engenio.  To read the article, click on the article title or the "read more..." link below.  If you're reading this from an RSS reader, please read the full post on my blog.

Labels: NetApp| WAFL

The top Around the Storage Block blog posts

I have put together several "best of" lists for Around the Storage Block. I was motivated to do this by our migration to the new blog platform and some initial challenges we had with URLs pointing to the old posts on the previous platform.

Understanding FAS ESRP Results

By Karl Dohm, Storage Architect


Welcome back to the next in a series of posts where we take a closer look at NetApp and its FAS series of storage arrays.   The discussion topic today is Microsoft's Exchange Solution Reviewed Program (ESRP) and its tie to FAS throughput.


The FAS has some controversial history with regard to performance.  From time to time the issue comes up, and in response NetApp has generally denied that the problems exist.  Often we find the opposite stance in posts from NetApp lauding their performance, for example in Kostadis Roussos' post where he refers to WAFL write performance as 'surreal'.  But, as I have said in previous posts, there are some justifiable reasons this controversial subject keeps surfacing.


First of all, let's touch on why an average storage consumer should care about array throughput.  An array with better throughput, i.e. the ability to service more I/Os from a given set of spindles, can require less hardware to do the same job.  The bigger the throughput difference, the more you save on purchase price, warranty cost, power consumption, floor space, cooling, etc.  Array throughput statistics can therefore be meaningful when evaluating the value of a storage array.  It seems NetApp also finds this attribute important, given the number of blog posts and papers they have on the topic of performance.
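If you want to see how quickly a throughput gap turns into a hardware and cost gap, here is a minimal sketch of the arithmetic.  To be clear, the workload size, per-spindle rates, and drive cost below are hypothetical numbers chosen purely for illustration, not measured or published figures for any array.

import math

# Hypothetical inputs, chosen only to illustrate the spindle-count arithmetic.
WORKLOAD_IOPS = 10_000      # assumed application demand
COST_PER_SPINDLE = 1_000    # assumed $ per drive, ignoring shelves, power, space

def spindles_needed(effective_iops_per_spindle: float) -> int:
    # Spindles required to service the workload at a given per-spindle rate.
    return math.ceil(WORKLOAD_IOPS / effective_iops_per_spindle)

array_a = spindles_needed(250)   # assumed: array A gets 250 IOPs from each spindle
array_b = spindles_needed(180)   # assumed: array B gets 180 IOPs from each spindle

print(f"Array A: {array_a} spindles, roughly ${array_a * COST_PER_SPINDLE:,}")
print(f"Array B: {array_b} spindles, roughly ${array_b * COST_PER_SPINDLE:,}")
print(f"Extra spindles needed by the slower array: {array_b - array_a}")

With those assumed numbers the slower array needs 16 more spindles to do the same job, before you even count the extra shelves, rack space, power, and cooling.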


Recently, in the comments section of a blog post on understanding WAFL, NetApp's John Martin and I had a small debate as to whether a synthetic load generator like IOmeter can be used to characterize how an array will perform in production scenarios.  I made the argument that this type of tool can be used to circle the wagons around the I/O characteristics of a real-world application, and that through multiple point tests of the application's load components one could get a reasonable assessment of how well the box will behave.  John's opinion was more along the lines that synthetic workload tests are not suitable for indicating how well an array will run a production application ("Synthetic workloads in isolation lead to non typical results").  He referenced Jetstress as a more accurate indicator.


I took his cue and had a look at the FAS2050 ESRP results paper, which describes MS Exchange-like throughput for the FAS2050 array.  Even though ESRP isn't intended to be a benchmark, a scan of ESRP results tells me that many vendors use ESRP as a forum to post throughput results showing how their array handles MS Exchange load.  It makes sense, since there seems to be no Exchange-specific benchmark out there, and ESRP is the closest controlled thing the industry has to work with.


The NetApp ESRP paper provides insight into how NetApp would recommend setting up the 2050 for Exchange loads, and it shows throughput results from a heavily loaded 10,000 mailbox Jetstress test.  This paper sparked our interest because the described results seemed good and did not correlate with results from synthetic load generators that produce a load pattern similar to Jetstress.  Maybe John was right.  We decided to peel back the onion a bit on this ESRP test and figure out what was going on.


We happened to have access to a FAS2050 and decided to try and recreate the ESRP results as published.  It turns out that the IOPs value that NetApp published was in fact roughly re-creatable given the data in the paper.  On the surface this can be viewed as NetApp having made an honest submission to ESRP, and within the letter of the law one could reasonably argue that they did.  But we also learned that NetApp found a way to make their results come across as favorably as possible, meaning the results have little relevance as to how well the FAS will run MS Exchange.  


After a rather lengthy setup experience, we finally configured the aggregates, volumes, servers, LUNs, MPIO, and HBA attributes as described in the ESRP paper.   We even set the diagnostic switch "wafl_downgrade_target" to a value of 0 in accordance with the recommendations in the paper.  


One might ask, as we did, what does "wafl_downgrade_target" do?  In its TR-3647 paper, NetApp describes the switch as follows: "The downgrade_target command changes the priority of a process within Data ONTAP that handles incoming SCSI requests. This process is used by both FC SAN and iSCSI. If your system is not also running NAS workloads, then this priority shift improves response time."


I think this description is telling us that the NAS process consumes bandwidth even when there is no NAS work to do.  Also, given the NetApp messaging around a unified storage architecture, a recommendation to use this switch seems like a bit of a contradiction.  Would you consider it normal to be asked to set a switch that generates the following response?  "Warning: These diagnostic commands are for use by Network Appliance personnel only".  Last but not least, this switch resets itself if the array reboots.  I'll leave it to the audience to draw their own conclusions as to whether use of this switch is truly a recommended practice in customer environments.


Once the array was freshly initialized and everything was set up, we ran the test and observed results of roughly 2200 average database disk transfers/second per host.  Within noise levels, this recreated the results as posted in their ESRP paper.


The main problem we have with how NetApp did this testing is that every time the test is run after the initial run, it runs slower than the time before.  The 2nd run showed results of approximately 1980 transfers/second per server, about an 11% drop.  By the fifth run throughput had dropped to approximately 1555 transfers/second per server - a 30% drop.  After a couple more runs we were down to 1450, 34% slower than the first run.
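For anyone checking the math, here is a quick recomputation of those drops.  The 2220 transfers/second baseline is the first-run figure cited at the end of this post; nothing here is new data, it simply shows where the 11%, 30%, and roughly 34% figures come from.

# Recomputing the quoted degradation relative to the first run.
first_run = 2220   # transfers/second per server on the freshly initialized array

observed = {
    "2nd run": 1980,
    "5th run": 1555,
    "after a couple more runs": 1450,
}

for label, value in observed.items():
    drop = (first_run - value) / first_run * 100
    print(f"{label}: {value} transfers/s per server, {drop:.1f}% below the first run")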


I didn't have the patience to run enough times to figure out where this decay curve flattens out. 


At this point I decided to run a "reallocate measure" against one of the database LUNs, and the FAS reported the value to be 17.  According to the NetApp man page for reallocate measure: "The threshold when a LUN, file, or volume is considered unoptimized enough that a reallocation should be performed is given as a number from 3 (moderately optimized) to 10 (very unoptimized)".  Allow me to translate - the database LUNs are very fragmented.  For those who might be confused by the use of the word fragmentation in this context, this is not NTFS fragmentation - it's WAFL fragmentation.
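To make that scale easier to parse, here is a tiny helper that interprets a measured value against the 3-to-10 range from the man page excerpt above.  It is purely illustrative - my own sketch, not a NetApp tool - but it makes plain why a reading of 17 is a problem.

def interpret_reallocate_measure(value: int) -> str:
    # Thresholds taken from the man page excerpt quoted above.
    if value < 3:
        return "optimized - reallocation not indicated"
    if value <= 10:
        return "unoptimized enough that a reallocation should be performed"
    return "off the documented scale - severely fragmented"

print(interpret_reallocate_measure(17))   # off the documented scale - severely fragmented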


Now things were starting to make sense.  We were seeing the same sort of decay curve as shown in the IOmeter results posted in Making Sense of WAFL - Part 4.  Every time the test is run, the random component of the Jetstress database accesses fragments the LUN further and the throughput numbers get worse.  An array like the EMC CX or HP EVA won't undergo this sort of decay curve, since those arrays do not have the internal WAFL-fragmentation problem that the FAS does.


That's not all.  After the throughput test, Jetstress executes a checksum test of the databases to be sure the array did not corrupt any data.  After a few runs I noticed an interesting pattern.  On the FAS, the length of time needed for the checksum calculation also degraded as the database LUNs went through their WAFL fragmentation.  When the LUNs were fresh and defragmented, the checksum calculation took about 2 hours.  By the fifth run, when the database LUNs had a WAFL-fragmentation measure of 17, the checksum calculation took over 10 hours - a 250% slowdown.  To summarize, we saw a 34% slowdown on database throughput and a 250% slowdown on checksum calculation just by letting the ESRP test run for about 48 hours before taking measurements.


So, drawing this to a close, I think there is a reasonable argument that NetApp's results should be more like 1450 (or fewer) disk transfers/second/host, as opposed to the 2220 transfers/second/host they did post.  Most would expect that results in a test as visible as ESRP are measured after a reasonable burn-in period.  After all, when someone runs MS Exchange, they usually run it for longer than 2 hours.



Labels: NetApp| storage| WAFL

Spock says: It is illogical to take offense

By Jim Haberkorn


I had hoped that my last post regarding NetApp performance claims would end gracefully, as a courageous NetApp employee has apparently now agreed to work with us to find out what we may be doing wrong, if anything, to be getting such poor performance out of our NetApp filer. FYI: that discussion has now moved to engineer Karl Dohm's blog post, where a civil discussion on the subject is now taking place.


But alas, a graceful ending was not to be. I've been informed that a certain NetApp employee has now moved to Twitter to assert that I have called NetApp a liar in my  blog post.


So, in the interest of setting the record straight on this important point, let me make it clear: I have never referred to NetApp or its bloggers as liars, though I have said, and still believe, that some of their claims and arguments are illogical, both in regards to claims they make about themselves and claims they make about the competition.


If you check my previous blog post, you will see that the word 'liar' was used only once and that was by a NetApp blogger, in a moment of excessive sensitivity. But now another NetApp employee has picked it up and twittered about it. Ah! A new NetApp blogging tactic: One NetApp blogger exaggerates a competitor claim, then the other one attacks the competitor for it. Hmm....I must add that one to my list. 


Now, here are just three examples of NetApp illogic that surfaced in the previous post:



  1. Using blog references to convince me that WAFL is not a file system (Kostadis, Geert, are you reading this?) when every NetApp white paper on the NetApp website, including one just published in July 2009, still refers to WAFL as a file system - http://media.netapp.com/documents/wp-7079.pdf.  Logically, why would you insist your competitors accept your point when you haven't even convinced your own company?

  2. Telling me it is 'dangerous' for a competitor to even attempt to accurately performance test another vendor's array, when NetApp has actually gone to the extent of publishing two SPC benchmarks on EMC arrays. Okay, maybe 'illogical' is not the right word here - perhaps 'contradictory' would have been more precise. But then again, wouldn't you think it illogical to state an obvious contradiction in a public blog?  I mean, the idea of a debate is to win the argument, not hand your competition a stick to beat you with. Note to NetApp bloggers: I am not threatening to beat NetApp employees with a stick.

  3. Claiming in their 21-page Wyman/Mercer cost-of-ownership white paper that after a thorough and meticulous analysis of EVA, CLARiiON, and DMX usable capacity, it was found that all those arrays used exactly the same amount of usable capacity for a 4TB database, down to the tenth of a terabyte (and by the way, the number NetApp came up with in its painstakingly precise calculation was 30.7TB for each, as opposed to their own 15.0TB for a FAS system). If 'illogical' is not the right word here, which word would you prefer? Would you find 'ridiculous' less offensive?


But my point is: When your claims are illogical, it's illogical to take offense. Rather, reworking your arguments and getting back into the game is the best option. Also, I think everyone realizes that being illogical and lying are two entirely different things.


As far as blogging is concerned, I consider myself one of the least thin-skinned people you'll ever blog with. Any tendency towards hyper-sensitivity was beaten right out of me during six years in the Marines. When someone now tells me that 'my claims are illogical', I don't get personally worked up about it. In fact, I find myself marveling at their gracious language and self-restraint. Heck, I didn't even get angry when a NetApp blogger published one of my HP Confidential slides and called it 'nonsense' and 'dipstickery' (see this post).


So, here is my final piece of advice to my honorable NetApp colleagues: Lighten up, guys! Nobody in the blogging world minds a well phrased repartee now and again, but all this teeth-grinding is so Cold War. Within the industry, you're the only bloggers I know that carry on the way you do. Your company's doing well. Relax. Engage in the blogs if you feel so moved, but try to have a good time while you're doing it.  


Best regards,


Jim



Labels: NetApp| storage| WAFL

Making Sense of WAFL Part 4

By Karl Dohm, HP Storage Architect

 

To recap, in this series of posts we are exploring some of the limitations of WAFL, and specifically how those limitations manifest themselves to an average user of the FAS.  For my previous posts, see Part 1, Part 2, and Part 3.

 

I had made the assertion in the original post that some of the competitive disadvantages of the FAS involve throughput, capacity utilization, and ease of management.  I had just started on the throughput part of the discussion when Kostadis Roussos of NetApp posted that I was off the mark and cited the Avanade paper as one piece of evidence to prove it.  So in the interest of fairness, I decided to try to understand what the Avanade paper was trying to say.

 

One of the initial tests described in the paper was rather simple to set up, and it was described by the experts at Avanade as one to "assess the overall performance of the FAS3050".  I'm always looking for something relatively simple to set up that assesses the overall performance of an array, so this captured my interest.  Since I didn't come up with the test, there shouldn't be any confusion that I somehow biased the selection to make NetApp look artificially unfavorable.  Hopefully we can agree that this test, while not perfect, has value and was arrived at in a fair manner.

 

My various recent blog posts have focused on trying to recreate the test result in the Avanade paper outside of the Avanade environment.  The main purpose of this iteration, at least for me, is to arrive at a simple and fair basis for comparison.

 

We still have some differences in how we run the tests, but for the most part these shouldn't matter much.  Patrick's first and second posts on the Avanade advisor blog describe some test details that augment the original Avanade paper.  Here the test is run over a 4Gb FC SAN connected to a 32-bit Windows 2003 server on a physical machine.  Patrick has evolved the test a bit, as he is running on a 64-bit Windows 2008 Hyper-V virtual machine using the iSCSI stack.  It's not practical to switch over to the environment he used, so I have to assume the differences are not significant to the outcome.  I think there is some value, though, in minimizing variables and staying with a simpler stack between IOmeter and the LUN.

 

Everything else I can think of is being matched up, including a spindle count of 20 in the aggregate (root volume in a different aggregate), a RAID group size of 20, rotational latency, the RAID type of the LUN, the IOmeter profile, the IOmeter file size, and so on.  See the end of the post for more detailed configuration data.

 

Since the FAS3050 model used in the original test is no longer a current product, Patrick suggested using a FAS3070 or a FAS2050.   It turns out that's good advice because, even with the new info, I could not get the 3050 results to come close to the Avanade results.  I expected the same outcome on the FAS2050, but was in for a surprise.

 

It turns out that I get better throughput on the FAS2050 than Patrick did.  For those of you thinking I'm spinning data, please open your minds and read this carefully. 

 

For the LUN with no fragmentation, in our environment the FAS2050 runs at about 4860 IOPs and 53 MB/s, which are average values taken over the first 20 minutes after a full reallocation.  Patrick's results maxed out at about 3460 IOPs and 39 MB/s - which he said was also against the LUN with no fragmentation.  My result was nearly 40% faster.  So if you took that data by itself, this would be a great endorsement of the FAS by a competitor.
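As a sanity check that we are comparing like with like, here is the back-of-the-envelope math on those two results.  Both runs imply an average transfer size of roughly 11 KB, which is what you would expect from the same IOmeter access specification, and the roughly 40% gap falls straight out of the IOPs ratio.

def avg_io_kb(iops: float, mb_per_s: float) -> float:
    # Average transfer size in KB implied by an IOPs / MB/s pair.
    return mb_per_s * 1024 / iops

ours = (4860, 53.0)       # our FAS2050 result, unfragmented LUN
patricks = (3460, 39.0)   # Patrick's reported result, unfragmented LUN

print(f"our implied avg I/O size:       {avg_io_kb(*ours):.1f} KB")
print(f"Patrick's implied avg I/O size: {avg_io_kb(*patricks):.1f} KB")
print(f"our IOPs advantage:             {(ours[0] / patricks[0] - 1) * 100:.0f}%")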

 

But there is more to the story of course.  This is a random workload, and as the LUN fragments the throughput degrades.  I ran 14x 20 minute segments of this load (about 4.5 hours) after an initial reallocate and the results for the FAS2050 are shown below.

 

20-min segment    IOPs    MB/s    Avg resp time (ms)
      1           4864    52.7    30.8
      2           4494    48.6    33.4
      3           4296    46.5    34.9
      4           4141    44.8    36.2
      5           4014    43.4    37.3
      6           3896    42.2    38.5
      7           3812    41.2    39.3
      8           3714    40.2    40.4
      9           3676    39.8    40.8
     10           3616    39.1    41.5
     11           3576    38.7    41.9
     12           3533    38.2    42.4
     13           3499    37.9    42.9
     14           3449    37.3    43.5
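The same data normalized to the first segment makes the decay easier to see; this is just a re-expression of the table above, not any new measurement.

# FAS2050 IOPs per 20-minute segment, from the table above.
segments_iops = [4864, 4494, 4296, 4141, 4014, 3896, 3812,
                 3714, 3676, 3616, 3576, 3533, 3499, 3449]

baseline = segments_iops[0]
for i, iops in enumerate(segments_iops, start=1):
    print(f"segment {i:2d}: {iops} IOPs ({iops / baseline * 100:.0f}% of segment 1)")
# By segment 14 the LUN is running at about 71% of its freshly reallocated throughput.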

 

The results Patrick reported were closer to where the FAS ended up after fragmentation took its toll.  So I'm guessing he may not have reallocated his volume between IOmeter runs, meaning the LUN was in fact fragmented by the time he got to the higher thread counts.  Anyway, his data matches nearly perfectly with my post-fragmentation data in the 14th segment: 3460 IOPs at Avanade vs 3449 here.

 

Note that I ran 150 threads in a single IOmeter worker, which was what it took to get an approximately 30 msec average response time at the start of the test.  The 30 msec average figure is a fairly standard one to use in the industry.
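For those wondering where the 150-thread figure comes from, a quick Little's Law check ties it together: outstanding I/Os are roughly IOPs times response time, so hitting about 30 msec at roughly 4860 IOPs takes about 150 outstanding requests.

# Little's Law sanity check: concurrency = throughput x response time.
iops = 4864            # first-segment FAS2050 result from the table above
resp_time_s = 0.0308   # first-segment average response time (30.8 msec)

outstanding = iops * resp_time_s
print(f"implied outstanding I/Os: {outstanding:.0f}")   # about 150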

 

Let's compare these results to the EVA4400.  My understanding is that these two arrays (FAS2050 and EVA4400) are often sold against each other in competitive situations, so it makes sense to see which array has an edge.

 

IO Test

 

The EVA4400 runs this same test with a throughput of nearly 6000 IOPs and 64 MB/s, which doesn't degrade over the course of the 14 x 20-minute segments.  The EVA LUN doesn't fragment, so the throughput remains roughly the same in each 20-minute segment.

 

Summarizing, the FAS LUN throughput is 81% of the EVA LUN's when it is not fragmented and 58% when it is fragmented.  Since the FAS LUN can't avoid fragmenting with this load, the 58% figure is the more relevant one for a typical situation.  Note that the FAS LUN has not yet settled to a steady state after these 14 segments, so its numbers will be somewhat lower if the test runs longer.
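For completeness, here is where those percentages come from: the FAS numbers are segments 1 and 14 from the table above, and the EVA figure is the "nearly 6000 IOPs" quoted earlier, so the ratios land within a point of the rounded 81% and 58%.

eva_iops = 6000          # "nearly 6000 IOPs" from the EVA4400 run
fas_fresh = 4864         # FAS2050 segment 1, freshly reallocated
fas_fragmented = 3449    # FAS2050 segment 14

print(f"unfragmented: {fas_fresh / eva_iops * 100:.1f}% of EVA throughput")
print(f"fragmented:   {fas_fragmented / eva_iops * 100:.1f}% of EVA throughput")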

 

Ok, that's a lot to digest, and given the outcome there will certainly be criticisms.  But it's a relatively simple test that can be recreated by anyone out there; it's a test that I didn't choose; the test is said by a Windows integration expert to be a measure of overall array performance; and it's pretty clear that there is an EVA advantage.

 

By the way, in case it's thought that the EVA just got lucky with this choice of test, I'm open to trying any other IOmeter access specification that anyone wants to run, as long as the test is at least somewhat relevant to customer workloads and stays reasonably simple to set up and understand.

 

Finally, it deserves mentioning that this thread ties into the associated topic of cost and capacity utilization.  This is mostly a topic for another day, but the EVA can be deployed without any additional spindles over the 20 used in this test.  The FAS, if running in a cluster (which provides redundancy comparable to the EVA), would need at least one additional global spare spindle on the 'a' side of the cluster, three additional spindles to hold the root volume of the 'b' side, and one global spare for the 'b' side.  That's an overhead of five additional spindles to support an array with 20 data spindles.

 

You may be able to sense where I'm going with this.  Five spindles add roughly $5000 to the purchase price for the drives alone, and with those five you would also need additional drive shelves, cabling, rack space, power, and cooling.  I'm not sure how many dollars all of that adds up to, but some would call it a significant additional cost burden for an array that provides 58% of the throughput.
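Putting rough numbers on that overhead - note that the $1000-per-drive figure below is simply implied by "roughly an additional $5000" for five drives, and shelves, cabling, rack space, power, and cooling are not included:

data_spindles = 20
extra_spindles = 1 + 3 + 1   # a-side spare + b-side root volume + b-side spare
cost_per_drive = 1000        # implied by ~$5000 for five drives, drives only

print(f"spindle overhead: {extra_spindles / data_spindles * 100:.0f}%")            # 25%
print(f"extra drive cost: ${extra_spindles * cost_per_drive:,} before shelves")    # $5,000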

 

Additional configuration details for runs done here

 

FAS2050

 


  • 20 spindle aggregate containing 1x 1TB volume
  • 1TB volume containing 1x 1TB LUN
  • 2 front end FC ports per FAS filer
  • 150 threads in IOmeter (~30 msec response time at start of test)
  • MPIO policy round robin
  • Data ONTAP 7.2.4L1

 

EVA4400

 


  • 20 spindle disk group containing 1x 500GB LUN
  • 170 threads in IOmeter (~30 msec response time)
  • 2 front end FC ports per controller
  • 09004000 firmware

 

Common

 


  • 15K 144GB drives
  • IOmeter writes to a 100GB file placed in the lowest LBAs of the LUN
  • IOmeter access specification as in Avanade post
  • Emulex dual ported HBA
  • Emulex Queue depth = 254, Queue target=0
  • Windows 2003 32 bit server
  • MPIO and vendor specific DSM - SQST for EVA and RR for FAS
  • 4Gb FC SAN
  • Proliant DL380-G4

(Editor's note: Fixed broken links resulting from moving to a new blog platform - no content of the post was changed.  14 April 2011)

Labels: EVA| NetApp| storage| WAFL