Technical Support Services Blog
Discover the latest trends in technology, and the technical issues customers are overcoming with the aid of HP Technology Services.

DPTIPS: StoreOnce Back to Basics – Why More Than One VTL?

For those who are involved with StoreOnce appliance integrations utilizing Virtual Tape Library (VTL) emulation, it is very important to remember the fundamentals documented in the best practices guide. The focus of today’s tip is the rationale behind creating more than one VTL per service set.

 

Three things drive the desirability of multiple VTLs. First, you’re already paying an ingest performance penalty for the inline dedupe. Second, each VTL is its own dedupe domain. Third, you’ll need several parallel streams to realize maximum ingest.

 

Inbound streams are split into chunks of about 4 KB in size. Each chunk is passed through a high-speed hashing algorithm which generates a hexadecimal key statistically unique to that chunk. Each key is compared to a database of keys representing every currently stored chunk of data. If the inbound chunk is a duplicate, the 4 KB of data is thrown away, and a small pointer is put in its place referencing the identical, already-stored chunk. If the inbound chunk is unique, the 4 KB is passed through to the backend store, and its key is added to the database. That’s a lot of processing to achieve at wire speed even with the generous resources found in StoreOnce appliances. What can we do to make up the cost? Keep that question in mind as you consider that ...
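The chunk, hash, and lookup flow described above can be sketched in a few lines of Python. This is an illustration only, not how StoreOnce is implemented: the fixed 4 KB chunk size, the use of SHA-256, and the names `DedupeStore`, `ingest`, and `restore` are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed 4 KB chunks; the article only says "about 4 KB"


class DedupeStore:
    """Toy dedupe domain: one dict acts as both key database and backend store."""

    def __init__(self):
        self.chunks = {}  # hex key -> stored chunk bytes

    def ingest(self, stream: bytes) -> list:
        """Split a stream into chunks and return the keys (pointers) that describe it."""
        pointers = []
        for offset in range(0, len(stream), CHUNK_SIZE):
            chunk = stream[offset:offset + CHUNK_SIZE]
            # assumption: SHA-256 stands in for the appliance's real hashing algorithm
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk  # unique chunk: store it and record its key
            pointers.append(key)  # duplicate or not, the stream keeps only a pointer
        return pointers

    def restore(self, pointers) -> bytes:
        """Rebuild the original stream by following the pointers."""
        return b"".join(self.chunks[key] for key in pointers)


store = DedupeStore()
backup = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # three identical chunks plus one different
pointers = store.ingest(backup)
stored_chunks = len(store.chunks)
restored = store.restore(pointers)
```

Every inbound chunk costs one hash and one lookup whether or not it turns out to be a duplicate, which is the ingest penalty the paragraph above refers to.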

 

Each VTL maintains its own hash key database, so we treat each VTL as its own “dedupe domain”. With very few exceptions, if you create one large VTL and feed it every sort of data, your dedupe database will grow quite large. As a result, the time required to determine whether a new chunk of data is a duplicate will degrade to an unacceptable level. But what if you “stack the deck” and artificially improve the odds of finding duplicates? And what if, in so doing, you also reduce the size of the dedupe domain? Now combine those data points with the concept that ...

 

What the ISV backup app sees as a tape drive we of course know is a Linux process pretending to be a tape drive. As such, each tape emulation process in and of itself is only capable of a finite number of I/O operations per second (IOPS). The good news is that StoreOnce appliances have the resources to support many, many of these tape emulation processes in parallel. The takeaway here is that numerous parallel streams are required to get anywhere near an appliance’s rated ingest. This does come with a caveat, however. You really don’t want more than a dozen tape processes hammering away at any one dedupe database.

 

Now let’s connect the dots.

 

A new StoreOnce customer has Exchange backups, SQL backups, Fileserver backups, and OS backups. We’re going to create four VTLs:

  1. EXCH_VTL,
  2. SQL_VTL,
  3. FILE_VTL, and
  4. OS_VTL

Each will have no more than twelve (12) virtual tape drives. We’re also going to make sure our backups contain plenty of objects, so that enough streams can run in parallel. Finally, we’re going to make sure our backup destinations are chosen to segregate data by type per VTL. Putting all of our information together, what have we achieved?

  • Our data is split across multiple VTLs, so we have smaller dedupe domains and faster hash comparisons.
  • Chunks of data in each dedupe domain are far more likely to be duplicates because they are of the same type. More duplicates mean faster ingest and less physical space consumed.
  • We have limited our VTLs to a maximum of 12 simultaneous inbound streams, so the risk of dedupe database thrashing should be minimized.
  • We have enabled a maximum of 48 parallel inbound streams. Hopefully we have enough objects in enough concurrent backup sessions to at least get north of 20 streams at one time.

We are backing up the same data as before, but in a way that maximizes StoreOnce ingest performance and dedupe efficiency.
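The layout above can be written down as a small routing table to check the stream arithmetic. This is a planning sketch only; the `VTL` class, the `layout` mapping, and the `destination` helper are hypothetical names invented for the example and do not correspond to any StoreOnce or Data Protector interface.

```python
from dataclasses import dataclass

MAX_DRIVES_PER_VTL = 12  # the per-VTL ceiling suggested in this article


@dataclass
class VTL:
    name: str
    drives: int = MAX_DRIVES_PER_VTL

    def __post_init__(self):
        # Refuse layouts that put too many tape processes on one dedupe database
        if self.drives > MAX_DRIVES_PER_VTL:
            raise ValueError(f"{self.name}: more than {MAX_DRIVES_PER_VTL} drives")


# Hypothetical routing table: backup data type -> destination VTL
layout = {
    "exchange": VTL("EXCH_VTL"),
    "sql": VTL("SQL_VTL"),
    "fileserver": VTL("FILE_VTL"),
    "os": VTL("OS_VTL"),
}


def destination(data_type: str) -> VTL:
    """Pick the VTL for a backup so each dedupe domain only sees one data type."""
    return layout[data_type]


max_parallel_streams = sum(vtl.drives for vtl in layout.values())
```

With four VTLs of twelve drives each, `max_parallel_streams` comes to 48, matching the figure in the list above.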

 

There are of course other major factors affecting the success of StoreOnce integration with Data Protector. Those were covered in the original article and will likely be topics for expanded future discussions. Also, there are a few corner cases where better dedupe is achieved with only one general-use VTL, but those are typically very modestly sized implementations and not frequently encountered. Still, you should always weigh the added complexity against empirical results.

 

Y'all come back now, ya hear? :)

Labels: Data Protector
Comments
Alex_B | 11-06-2013 08:10 AM

Hello, 

Thanks for a great article.

I have a few questions related to StoreOnce and client-side deduplication. If I choose client-side deduplication, where is the index file held? Is it stored locally on the client (which doesn't make sense, because if you lose the whole client you lose the index file as well), or is it on the StoreOnce appliance, with the comparison done over the wire? Second, if deduplication is set up on the client side, will it happen twice, since the StoreOnce appliance will also try to deduplicate all the data it receives?

 

Thanks 

Mr_T | 11-07-2013 04:13 AM

Hi Alex,

 

I appreciate your kind comment and welcome your question.  Client-side dedupe with StoreOnce is achieved by having a Media Agent (MA) on the client.  The dedupe database (hash key store) remains with the VTL on the G3 appliance.  The MA on the client chunks the data and generates a hash for each chunk.  These hashes are forwarded in batches to the appliance which responds with a list of unique chunks that need to be sent over.  No index of any sort remains on the client, and no further dedupe takes place on the appliance.  (Unless you want to count hardware compression.  Each node has a comp/decomp card that squeezes the unique chunks going to the backend store.)

 

Best regards,

Jim (Mr_T)
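
The client-side exchange described in the reply above can be sketched roughly as follows. It is a simplified model, not the actual Media Agent protocol: the `Appliance` class, its `missing` and `receive` methods, and `client_backup` are hypothetical names, SHA-256 and fixed 4 KB chunks are assumptions, and all hashes are sent as a single batch.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks for simplicity


class Appliance:
    """Holds the hash key store; nothing is kept on the client."""

    def __init__(self):
        self.store = {}  # key -> chunk

    def missing(self, keys):
        """Answer a batch of hashes with the ones not yet stored."""
        return {k for k in keys if k not in self.store}

    def receive(self, chunks_by_key):
        self.store.update(chunks_by_key)


def client_backup(data: bytes, appliance: Appliance) -> int:
    """Chunk and hash on the client, then send only the chunks the appliance asks for.

    Returns the number of payload bytes that crossed the wire.
    """
    chunks = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunks[hashlib.sha256(chunk).hexdigest()] = chunk
    wanted = appliance.missing(chunks.keys())
    appliance.receive({k: chunks[k] for k in wanted})
    return sum(len(chunks[k]) for k in wanted)


appliance = Appliance()
data = b"x" * CHUNK_SIZE * 2 + b"y" * CHUNK_SIZE
first_run = client_backup(data, appliance)   # only unique chunks are sent
second_run = client_backup(data, appliance)  # nothing new, so no payload is sent
```

Because the key store lives only on the appliance, losing the client loses no dedupe state, and the data is deduplicated once rather than twice.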

About the Author
Mr_T? Yes, but in name only. No mohawk or gold chains. In real life I'm Jim Turner, a Master Technologist with HP's Integration and Techn...