Deduplication, online storage, and cannibals

Posted on 08-12-2008 01:06 AM

By Warren Smith,  StorageWorks Competitive Intelligence Manager


"Every fight is a food fight when you're a cannibal."


That's an amusing line from Demetri Martin, an American comedian. But it could also be NetApp's motto. Last week, NetApp announced that customers could use its V-Series storage to dedupe primary storage from other manufacturers, including EMC, HDS, and yours truly, HP StorageWorks. Customers should ask themselves, "Is this a dinner invitation from a cannibal?"


NetApp made this claim unilaterally, without explaining how the deduplication facility would be realized on other vendors' storage, what configuration constraints apply, or any other substantive details. The cynic in me wants to think that NetApp made this announcement because its advanced single instance storage (ASIS) deduplication performs sub-par on its own storage, and it honestly wanted to advise customers to try ASIS on someone else's storage, on the chance that it might perform better there. But seriously, the cannibal instinct of some storage vendors is to propagate their technology problems across every possible system and eat into sound, working technology.


Can I share why this offer from NetApp is not a good idea? It is not because it comes from NetApp. It is not a good idea because deduplicating primary storage injects an invasive, performance-impacting process into the heart of your business operations. The ASIS deduplication process necessarily takes compute cycles away from the high-priority application processing that drives many businesses. NetApp appropriately warns customers about the performance impact of ASIS.
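For readers who want to see where those compute cycles go, here is a minimal sketch of generic fixed-block deduplication in Python. It is not NetApp's ASIS implementation; the 4 KiB block size, the SHA-256 fingerprint, the byte-for-byte verification step, and the function name dedupe_blocks are all assumptions chosen for illustration. The point is simply that every block in the volume has to be read, fingerprinted, and looked up, and that work competes with application I/O for the same controller.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedupe_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Tuple[List[int], List[bytes]]:
    """Split data into fixed-size blocks and keep each unique block once.

    Returns (block_map, unique_blocks): block_map[i] is the index into
    unique_blocks that holds the content of logical block i.
    """
    index: Dict[bytes, List[int]] = {}  # fingerprint -> candidate unique-block indices
    unique_blocks: List[bytes] = []
    block_map: List[int] = []

    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Every block must be read and hashed: this is the CPU cost.
        fp = hashlib.sha256(block).digest()

        match = None
        for candidate in index.get(fp, []):
            # Verify byte-for-byte before sharing, to guard against hash collisions.
            if unique_blocks[candidate] == block:
                match = candidate
                break

        if match is None:
            # New content: store it and record its fingerprint.
            match = len(unique_blocks)
            unique_blocks.append(block)
            index.setdefault(fp, []).append(match)

        block_map.append(match)

    return block_map, unique_blocks
```

For example, four blocks where the first three are identical produce a block map of [0, 0, 0, 1] and only two stored blocks. After deduplication, a logically sequential file may point at shared blocks that were written at different times and places, which is one plausible reason large sequential reads can slow down on a deduplicated volume.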


The following quotes are taken directly from NetApp's Deduplication Best Practices section in the text referenced below:




  •  "If there is very little new data, run deduplication infrequently, because it doesn't make sense to unnecessarily consume CPU resources."


  • "Use the auto mode so that deduplication only runs when significant additional data has been written to each particular flexible volume"


  • "Stagger deduplication schedules for the flexible volumes so it runs on alternative days."

Source: Technical Report: NetApp Deduplication for FAS Deployment and Implementation Guide, TR-3505. Network Appliance, Inc. 16 April 2008.
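The scheduling advice in those bullets (skip volumes with little new data, stagger runs across volumes on alternate days) can be sketched as a simple selection policy. This is a hypothetical Python illustration, not NetApp's auto mode: the 20% new-data threshold, the two-day stagger by volume position, and the Volume type and volumes_to_dedupe function are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    name: str
    size_gb: float
    new_data_gb: float  # data written since the last deduplication run


def volumes_to_dedupe(volumes: List[Volume], day: int, threshold: float = 0.20) -> List[str]:
    """Pick which volumes to deduplicate on a given day.

    - Stagger: volume i is eligible only on days where day % 2 == i % 2.
    - Skip volumes whose new data is below the (assumed) threshold fraction,
      so CPU is not spent when there is little to gain.
    """
    selected = []
    for i, vol in enumerate(volumes):
        if day % 2 != i % 2:
            continue  # not this volume's day in the stagger
        if vol.size_gb > 0 and vol.new_data_gb / vol.size_gb >= threshold:
            selected.append(vol.name)
    return selected
```

With three hypothetical volumes, Volume("vol_home", 500, 150), Volume("vol_vm", 1000, 50) and Volume("vol_eng", 800, 300), day 0 selects vol_home and vol_eng, while day 1 selects nothing because vol_vm has only 5% new data.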


NetApp also wants customers to know that the ASIS deduplication functionality has limitations. In an article on the ChannelWeb website, Chris Cummings, senior director of data protection solutions for NetApp, is quoted:


"In order to activate the dedupe license, which is free, customers do need to spend about 10 minutes filling out a form that states that they know there is a chance of performance degradation when implementing the technology, depending on data type and other factors", Cummings said.


Source: http://www.crn.com/storage/209901632


Despite these vendor advisories to be careful with ASIS, some customers may think, "That's OK, I can just run my dedupes at midnight." But we have seen test data showing that the performance impact of ASIS can also be felt later, when the deduplicated volume is used in normal application operations. And for the many storage customers desperately seeking to reduce duplicate data, there is good news.


The intelligent place to perform data reduction is in secondary and lower-tier storage and during backup operations, away from primary storage. HP's Deduplication Strategy provides just that: intelligent data reduction in the backup regime. Our Deduplication Strategy, which offers "Accelerated" Deduplication for enterprise customers and "Dynamic" Deduplication for smaller businesses, is also complemented by space-efficiency designs in the HP StorageWorks EVA, including Dynamic Capacity Management, implemented via EVA Software.


So it is clear that the marketplace has a keen interest in data reduction methods and technologies that can reduce the volume of data under management in customer storage systems. It is also clear that some vendors have sought advantage from this marketplace phenomenon and will continue to capitalize on untested hype and promises about their products. And remember, never accept a dinner invitation from a cannibal.


Comments
by Anonymous(anon) on 08-12-2008 03:47 PM

For all online backup, file sharing and storage related info, I recommend this website:

http://www.BackupReview.info

by Anonymous(anon) on 08-26-2008 09:26 AM

Hi Warren, here is a little more background on NetApp deduplication: We've delivered dedupe on over 13,000 NetApp FAS systems in a little over a year. It has surprised us all to see just how fast this feature has taken off. Maybe even more surprising is how fast our customers have gravitated toward primary storage apps. VMware VMs and file services (aka /home dirs) are leading this charge. We estimate over 50% of our users are currently deduping primary storage.

You are correct when you say that we caution users to proceed slowly with high performance apps.  Deduplication, like any other system process, does consume resources.  But what our customers repeatedly tell us is they see no difference in system performance before, during, and after deduplication.

With the recent release of deduplication on V-Series, we look forward to hearing more dedupe success stories from our HP users.

Larry Freeman

by Anonymous(anon) on 03-12-2009 07:34 PM

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Alessandra

http://www.craigslisttool.info

by Anonymous(anon) on 03-13-2009 07:48 AM

Hi Alessandra,


I'm glad you are finding value in the blog.  Do stop back.  I also went to your blogsite you listed - what's the connection you have to storage?  

by Anonymous(anon) on 10-06-2009 06:45 AM

I can't believe anyone finds value in a blog about a product, written by an author who is biased by a competing product. I am hoping most people are able to see through this propaganda.

Posted on 10-07-2009 12:38 AM

Hi Issac - since you left only your first name, I have no idea where you are coming from. For all I know, you could work for a competitor. That said, you are entitled to your opinion, and I respect it, but I disagree. I'll readily admit that our blog is biased - so are the many blogs our competitors write. What I think our readers find valuable are the open debates we have here, as well as our discussions of how we view storage.


Everyone is biased - that doesn't make what they say propaganda - it means the information is biased.  Biased also doesn't mean that the information is inherently wrong or even misleading.  If you really knew HP and our standards, you would not leave comments calling the blog propaganda.  


Thanks for stopping by... Calvin

by Anonymous(anon) on 10-17-2009 07:57 AM

I cannot agree with you more!

We had a very painful experience with NetApp ASIS; the admins did not read the fine print and just enabled dedupe "to save space". After a month, NFS latency climbed to 50k ms, rendering the filer completely useless. After numerous perfstat collections, NetApp tech support still had no clue. It was one of our senior storage admins who spotted the problem.

We saw a huge performance penalty on the engineering data. It took 14 days to finish one run of ASIS, and while ASIS was running, it consumed more than 30% of CPU cycles. Because of the nature of the data (high turnover rate), running dedupe on it really is a terrible idea. The second lesson we learned was that even on a volume with moderate modification, there is still a price to pay for large sequential reads. Just as the best practices guide (TR-3505) suggests, if you are sensitive to read/write performance, you should be cautious with dedupe. There is no free lunch.

by Anonymous(anon) on 03-31-2010 02:59 PM

Online backup is the safe way to go these days. Your files will be stored securely on a remote server. Many online file backup companies now even provide more than 10GB of free backup space. Most of them can be found on this site: www.free-file-backup.com.

Posted on 04-01-2010 02:25 PM

@Forcha - saying that online (remote) backup is the safe way to go these days is a bit like saying riding a bike is the way to go.  It would really depend on how far you are going.  And so it is with backup as well.  I'd say that for 95% of our customers, using a backup service to a remote site is not the way to go.  They have too much data to make that cost effective or to keep a realistic RTO (recovery time objective) if they did have a failure.  Heck, for my own home network of two laptops, two desktops, and a Mini PC, I won't consider online backup - I'm using an HP StorageWorks X510 Data Vault.  An online backup service can make sense for some customers but it's the minority of our customers.  Thanks - Calvin

About the Author
  • 25+ years of experience with HP Storage. The go-to guy for news and views on all things storage.
  • This profile is used for team blog articles. See the article's byline for the specific author.