Around the Storage Block Blog
Find out about all things data storage from Around the Storage Block at HP Communities.

HP Announces Deduplication - Part 1

-By Jim Hankins


Earlier today, HP announced new deduplication capabilities for customers who are considering deploying disk-based backup or virtual tape as part of their data protection processes. Deduplication is one of the most talked-about new technologies in the storage industry today, as customers continue to look for innovative ways to protect the ever-growing amounts of data in their IT environments.


However, in talking with customers, we found that the needs for disk-based backup and deduplication differ greatly depending on whether the customer wants to use the technology in a larger-scale "data center" installation or in a smaller-scale "office" installation. Because of these very different needs, HP is offering its customers two different deduplication technologies.


First, HP is making accelerated deduplication available as a licensed option for our HP StorageWorks Virtual Library Systems (VLS). Our VLS products with accelerated deduplication technology are uniquely scalable for large data centers where both high performance and high capacity are required.


Second, HP is introducing two brand-new products, the HP StorageWorks D2D2500 Backup System and the HP StorageWorks D2D4000 Backup System, both with dynamic deduplication technology. Dynamic deduplication was developed by HP specifically for smaller environments where low cost and ease of use are key customer needs. Dynamic deduplication is a built-in feature on the D2D2500 and D2D4000, rather than a licensed option.


For more information about the above products please see our announcement page at: www.hp.com/go/deduplication


One of the most frequent questions we heard from customers before our announcement was, "So what kind of deduplication ratios can I expect to get with HP's deduplication technologies?" Our internal testing has shown that it's possible to reach at least a 50:1 deduplication ratio, but the ratio you achieve in your environment will depend on a number of variables. Other vendors may quote ratios that are much larger or smaller; those numbers depend on the same variables.


One of those factors is the type of data that the deduplication process is applied to. Some data types are better candidates for deduplication than others. For example, data from a PACS (Picture Archiving and Communication System), used for X-rays and other medical imaging, will have very little duplicate data, so the ratio would usually be quite low. By contrast, a database in which many records have empty fields or the same data in the same fields would typically be a good candidate and could produce very high deduplication ratios.


Other factors to consider are your backup policy and the daily change rate of your data. Are you doing daily full and weekly full backups? Or are you doing daily incremental and weekly full backups? Is the daily change rate of your data 1%, 2% or even more? It is important to remember that the less your data changes between backups, the more benefit you'll see, because over time the deduplication engine will see more and more of the same (duplicate) data during the backups.
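To make the effect of change rate concrete, here is a minimal Python sketch. It is not how HP's dynamic or accelerated deduplication works internally; it is a deliberately simplified model that assumes daily full backups of a constant-size data set, where every backup after the first adds only the changed fraction as new unique data. The function name and the sample change rates are illustrative assumptions.

```python
def aggregate_dedup_ratio(num_backups: int, daily_change_rate: float) -> float:
    """Logical data backed up divided by physical data stored.

    Simplified model (an assumption, not a vendor algorithm): daily full
    backups of a constant-size data set, where each backup after the first
    writes only the changed fraction as new unique data.
    """
    if num_backups < 1:
        raise ValueError("num_backups must be at least 1")
    if not 0.0 <= daily_change_rate <= 1.0:
        raise ValueError("daily_change_rate must be between 0 and 1")
    # Work in units of "one full backup".
    logical = float(num_backups)
    physical = 1.0 + (num_backups - 1) * daily_change_rate
    return logical / physical


if __name__ == "__main__":
    # Illustrative change rates only; measure your own environment.
    for rate in (0.01, 0.02, 0.05):
        ratios = [round(aggregate_dedup_ratio(n, rate), 1) for n in (1, 7, 30, 90)]
        print(f"daily change rate {rate:.0%}: ratios after 1/7/30/90 backups = {ratios}")
```

In this model the ratio climbs as more backups are retained and can never exceed 1 divided by the change rate, which is why a lower change rate pays off more over time.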


Lastly, how you measure deduplication has a large effect on the ratio you report. Are you measuring the deduplication ratio of just your last backup against the previous backup? Are you measuring the ratio over the aggregate of all backups stored? Or is the measurement somewhere in between?
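The difference between these measurements can be sketched in a few lines of Python. The backup history below is made up purely for illustration (it is not test data from any HP product), and the function names are hypothetical.

```python
def last_backup_ratio(history):
    """Ratio for the most recent backup only: logical size / new data written."""
    logical, written = history[-1]
    if written == 0:
        raise ValueError("last backup wrote no new data; ratio is undefined")
    return logical / written


def aggregate_ratio(history):
    """Ratio over every backup stored: total logical size / total data written."""
    total_logical = sum(logical for logical, _ in history)
    total_written = sum(written for _, written in history)
    if total_written == 0:
        raise ValueError("no data written; ratio is undefined")
    return total_logical / total_written


if __name__ == "__main__":
    # Hypothetical history: (logical GB backed up, new GB actually written to disk).
    # The first full backup has nothing to deduplicate against.
    history = [(1000, 1000), (1000, 25), (1000, 25), (1000, 25)]
    print(f"last backup only: {last_backup_ratio(history):.1f}:1")
    print(f"all backups:      {aggregate_ratio(history):.1f}:1")
```

With these made-up numbers the same system reports 40:1 when only the last backup is measured and roughly 3.7:1 when all four backups are measured, so it is worth asking which method is behind any quoted ratio.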


Another word of caution here: some might think that deduplication means you can buy a smaller disk-based backup system, but be aware that it may take many backups over a long span of time to yield substantial deduplication ratios. Initially, the amount of storage you buy with your disk-based backup or virtual tape product needs to be sized correctly to reflect your existing backup tape rotation strategy and the expected data change rate within your environment.
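As a rough illustration of how long "many backups" can be, the sketch below estimates how many daily full backups must be retained before the aggregate ratio reaches a target. It relies on the same simplified assumption as before (a constant-size data set where each backup after the first adds only the changed fraction as new data), so treat the output as a back-of-the-envelope figure, not a sizing tool. The function name is hypothetical.

```python
import math
from typing import Optional


def backups_to_reach(target_ratio: float, daily_change_rate: float) -> Optional[int]:
    """Smallest number of daily full backups whose aggregate ratio meets the target.

    Simplified model (an assumption): ratio(n) = n / (1 + (n - 1) * c),
    where c is the daily change rate. Returns None when the target can
    never be reached, because the ratio approaches but never hits 1 / c.
    """
    if not 0.0 <= daily_change_rate <= 1.0:
        raise ValueError("daily_change_rate must be between 0 and 1")
    if target_ratio <= 1.0:
        return 1  # a single backup already gives 1:1
    if target_ratio * daily_change_rate >= 1.0:
        return None
    # Solve n / (1 + (n - 1) * c) >= target for n.
    n = target_ratio * (1.0 - daily_change_rate) / (1.0 - target_ratio * daily_change_rate)
    return math.ceil(n)


if __name__ == "__main__":
    for target in (10, 20, 200):
        print(f"target {target}:1 at 1% daily change -> {backups_to_reach(target, 0.01)} backups")
```

Until that many backups have accumulated, the system holds far more physical data than the eventual ratio suggests, which is why initial capacity should follow your rotation strategy and not the advertised ratio.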


HP believes that the various deduplication technologies in the industry are going to deliver roughly the same ratios, so it's much more important to consider other features of competing technologies, such as scalability, cost and ease of use.


In part 2, I will take a look at these other features more closely.

About the Author(s)
  • 25+ years of experience around HP Storage. The go-to guy for news and views on all things storage.
  • This profile is for team blog articles posted. See the Byline of the article to see who specifically wrote the article.
The opinions expressed above are the personal opinions of the authors, not of HP. By using this site, you accept the Terms of Use and Rules of Participation.