Technical Support Services Blog
Discover the latest trends in technology and the technical issues customers are overcoming with the aid of HP Technology Services.

DPTIPS: StoreOnce Back to Basics – Why More Than One VTL?

For those involved in StoreOnce appliance integrations that use Virtual Tape Library (VTL) emulation, it is very important to remember the fundamentals documented in the best practices guide. The focus of today’s tip is the rationale behind creating more than one VTL per service set.


Three things drive the desirability of multiple VTLs. First, you’re already paying an ingest performance penalty for the inline dedupe. Second, each VTL is its own dedupe domain. Third, you’ll need several parallel streams to realize maximum ingest.


Inbound streams are split into chunks roughly 4 KB in size. Each chunk is passed through a high-speed hashing algorithm that generates a hexadecimal key statistically unique to that chunk. Each key is compared against a database of keys representing every currently stored chunk of data. If the inbound chunk is a duplicate, the 4 KB of data is thrown away and a small pointer referencing the identical, already-stored chunk is put in its place. If the inbound chunk is unique, the 4 KB is passed through to the backend store and its key is added to the database. That’s a lot of processing to achieve at wire speed, even with the generous resources found in StoreOnce appliances. What can we do to make up the cost? Keep that question in mind as you consider the next point ...
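Conceptually, that inline flow looks something like the rough Python sketch below. It is only an illustration of the chunk-hash-lookup cycle described above; the fixed 4 KB split, the SHA-1 hash, and the in-memory dictionary standing in for the key database are assumptions made for clarity, not how the appliance actually implements any of it.

```python
import hashlib

CHUNK_SIZE = 4 * 1024   # ~4 KB chunks, per the description above
hash_db = {}            # hash key -> location of the already-stored chunk (illustrative)
backend_store = []      # stand-in for the appliance's backend disk store

def ingest(stream: bytes):
    """Toy model of inline dedupe: chunk, hash, look up, then store or reference."""
    refs = []                                   # what gets recorded for this backup
    for offset in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[offset:offset + CHUNK_SIZE]
        key = hashlib.sha1(chunk).hexdigest()   # hash choice is an assumption for illustration
        if key in hash_db:
            refs.append(hash_db[key])           # duplicate: discard the data, keep a pointer
        else:
            backend_store.append(chunk)         # unique: write the chunk to the backend store
            hash_db[key] = len(backend_store) - 1
            refs.append(hash_db[key])
    return refs
```

Every inbound chunk pays for one hash computation and one lookup against that key database, which is exactly where the ingest cost comes from.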


Each VTL maintains its own hash key database, so we treat each VTL as its own “dedupe domain”. With few exceptions, if you create one large VTL and feed it every sort of data, your dedupe database will grow quite large. As a result, the time required to determine whether any new chunk of data is a duplicate will degrade to an unacceptable level. BUT, what if you “stack the deck” and artificially improve the odds of finding duplicates? And what if, in so doing, you also reduce the size of the dedupe domain? Now combine those data points with one more concept ...
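To make the “dedupe domain” idea concrete, here is a hedged extension of that sketch: give each VTL its own key database, and every inbound chunk only has to be checked against that one VTL’s history. The class and names below are illustrative only.

```python
# One dedupe domain per VTL: each VTL owns its own hash database and backend store.
class VTL:
    def __init__(self, name: str):
        self.name = name
        self.hash_db = {}         # keys for this VTL's data only -- a smaller, faster lookup
        self.backend_store = []

# Segregating data by type keeps each database small and full of similar data.
vtls = {name: VTL(name) for name in ("EXCH_VTL", "SQL_VTL", "FILE_VTL", "OS_VTL")}

# The ingest() sketch above would then consult vtls["EXCH_VTL"].hash_db (for example)
# rather than one appliance-wide database.
```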


What the ISV backup application sees as a tape drive is, of course, really a Linux process pretending to be one. As such, each tape emulation process is, in and of itself, only capable of a finite number of I/O operations per second (IOPS). The good news is that StoreOnce appliances have the resources to support many, many of these tape emulation processes in parallel. The takeaway here is that numerous parallel streams are required to get anywhere near an appliance’s rated ingest. This does come with a caveat, however: you really don’t want more than a dozen tape processes hammering away at any one dedupe database.
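As a back-of-the-envelope illustration of why stream count matters, consider the sketch below. The per-stream and appliance numbers are made-up placeholders, not published StoreOnce figures; the point is only that aggregate ingest grows with parallel streams until you hit the appliance’s ceiling.

```python
# Illustrative placeholders only -- not published StoreOnce specifications.
PER_STREAM_MBPS = 100           # what one emulated tape drive process might sustain
APPLIANCE_CEILING_MBPS = 4000   # hypothetical rated ingest for the appliance

def aggregate_ingest(streams: int) -> int:
    """Aggregate ingest scales with parallel streams until the appliance ceiling."""
    return min(streams * PER_STREAM_MBPS, APPLIANCE_CEILING_MBPS)

for n in (1, 4, 12, 24, 48):
    print(f"{n:2d} streams -> ~{aggregate_ingest(n)} MB/s")
```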


Now let’s connect the dots.


A new StoreOnce customer has Exchange backups, SQL backups, Fileserver backups, and OS backups. We’re going to create four VTLs:

  1. EXCH_VTL,
  2. SQL_VTL,
  3. FILE_VTL, and
  4. OS_VTL

Each will have no more than twelve (12) virtual tape drives. We’re also going to make sure our backups contain plenty of objects. Finally, we’re going to choose backup destinations so that data is segregated by type, one type per VTL. Putting all of our information together, what have we achieved?

  • Our data is split across multiple VTLs, so we have smaller dedupe domains and faster hash comparisons.
  • Chunks of data in each dedupe domain are far more likely to be duplicate because they are of the same type. More dupes = faster ingest = less physical space consumption.
  • We have limited our VTLs to a maximum of 12 simultaneous inbound streams, so the risk of dedupe database thrashing should be minimized.
  • We have enabled a maximum of 48 parallel inbound streams. Hopefully we have enough objects in enough concurrent backup sessions to get north of 20 streams at one time.

We are backing up the same data as before, but in a way that maximizes StoreOnce performance in terms of both ingest and dedupe efficiency.
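If it helps to see the whole design in one place, here it is expressed as plain data. This is just a sketch for discussion, not Data Protector or StoreOnce configuration syntax.

```python
# The example layout: four VTLs, one data type each, capped at 12 virtual drives
# (i.e., 12 simultaneous inbound streams) apiece.
vtl_layout = {
    "EXCH_VTL": {"data_type": "Exchange",   "virtual_drives": 12},
    "SQL_VTL":  {"data_type": "SQL",        "virtual_drives": 12},
    "FILE_VTL": {"data_type": "Fileserver", "virtual_drives": 12},
    "OS_VTL":   {"data_type": "OS",         "virtual_drives": 12},
}

max_parallel_streams = sum(v["virtual_drives"] for v in vtl_layout.values())
print(max_parallel_streams)  # 48 -- the ceiling on concurrent inbound streams
```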


There are, of course, other major factors affecting the success of a StoreOnce integration with Data Protector. Those were covered in the original article and will likely be topics for expanded future discussions. Also, there are a few corner cases where better dedupe is achieved with only one general-use VTL, but those are typically very modestly sized implementations and not frequently encountered. Still, you should always weigh the level of complexity against empirical results.


Y'all come back now, ya hear?

Labels: Data Protector
Comments
Alex_B | 11-06-2013 08:10 AM

Hello, 

Thanks for a great article.

I have a few questions related to StoreOnce and client-side deduplication. If I choose to do client-side deduplication, where is the index file going to be held? Is it located locally on the system (which doesn't make any sense, because if you lose the whole client you lose the index file as well), or is it on the StoreOnce appliance with the comparison done over the wire? And a second thing: if deduplication is set up on the client side, then won't deduplication happen twice, since the StoreOnce will try to deduplicate all the data it is getting?


Thanks 

Mr_T | 11-07-2013 04:13 AM

Hi Alex,


I appreciate your kind comment and welcome your question.  Client-side dedupe with StoreOnce is achieved by having a Media Agent (MA) on the client.  The dedupe database (hash key store) remains with the VTL on the G3 appliance.  The MA on the client chunks the data and generates a hash for each chunk.  These hashes are forwarded in batches to the appliance which responds with a list of unique chunks that need to be sent over.  No index of any sort remains on the client, and no further dedupe takes place on the appliance.  (Unless you want to count hardware compression.  Each node has a comp/decomp card that squeezes the unique chunks going to the backend store.)
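In rough Python, that exchange looks something like the sketch below. The chunk size, hash choice, and function names are illustrative assumptions on my part, not an actual Data Protector or StoreOnce API.

```python
import hashlib

CHUNK_SIZE = 4 * 1024  # chunk size and hash choice are illustrative assumptions

def client_side_pass(data: bytes, send_hashes_to_appliance):
    """Toy model of client-side dedupe: the MA on the client chunks and hashes,
    only the hashes cross the wire first, and only the chunks the appliance has
    never seen are transmitted afterward.  No index stays on the client."""
    chunks = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunks[hashlib.sha1(chunk).hexdigest()] = chunk

    # Hashes go out in a batch; the appliance answers with the keys it does not hold.
    unknown_keys = send_hashes_to_appliance(list(chunks.keys()))

    # Only the chunks behind those unknown keys are sent over the wire.
    return {key: chunks[key] for key in unknown_keys}
```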


Best regards,

Jim (Mr_T)
