Technical Support Services Blog
Discover the latest trends in technology, and the technical issues customers are overcoming with the aid of HP Technology Services.

DPTIPS: Large Objects - Divide and Conquer with Data Protector

By default, Data Protector assigns one Disk Agent (VBDA) to each object.  An object is considered to be a Windows volume (drive letter) or Unix mountpoint (filesystem).  But what if your object is really large?  Isn't there some way to get Data Protector to assign multiple agents with instructions to "divide and conquer" such a large object?  With a little cleverness from you, the backup administrator, yes there is.

 

We'll work with a fairly contrived example, but the technique is sound.  Consider the following directory tree as seen in the process of creating a new Data Protector backup specification.

 

DnC_Dir_Tree.jpg

 

For our example, let's say we've determined the following:

  • The E: drive is about 800 GB in total.
  • Three-fourths of that space is in E:\Users.
  • There is a fairly even distribution between Curly, Larry, and Moe.
  • Shemp has only a small bit of data by comparison.

Accordingly, we would probably want four Disk Agents working on unique "chunks" of the E: drive as if they were separate objects.  But how?  Easy.  We just make those chunks separate objects!  The key is using a lesser-known right-click option when configuring your backup source.

 

First, the plan.  Based upon our knowledge, we want the following separate chunks as objects.

  1. Just E:\Users\Curly
  2. Just E:\Users\Larry
  3. Just E:\Users\Moe
  4. All of E: except for Curly, Moe, and Larry in E:\Users
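If you like to sanity-check a plan like this with numbers, here is a minimal Python sketch. It is not anything Data Protector runs; the per-directory sizes are illustrative assumptions that simply match the example (800 GB total, three-fourths under E:\Users, spread about evenly across Curly, Larry, and Moe), and `plan_chunks` is a hypothetical helper name.

```python
# Rough sizes in GB. Illustrative assumptions only: 800 GB total, 600 GB of it
# under E:\Users, with Shemp holding a small share.
TOTAL_GB = 800
user_dirs_gb = {"Curly": 195, "Larry": 195, "Moe": 195, "Shemp": 15}


def plan_chunks(total_gb, dirs_gb, agents):
    """Give the largest directories their own object; the rest is the catch-all."""
    # One agent is reserved for the catch-all, so pick the (agents - 1) largest.
    own = sorted(dirs_gb, key=dirs_gb.get, reverse=True)[: agents - 1]
    catch_all_gb = total_gb - sum(dirs_gb[d] for d in own)
    plan = {f"E:\\Users\\{d}": dirs_gb[d] for d in sorted(own)}
    plan["E: minus the directories above"] = catch_all_gb
    return plan


plan = plan_chunks(TOTAL_GB, user_dirs_gb, agents=4)
for chunk, size in plan.items():
    print(f"{chunk:<32} ~{size} GB")
```

With these assumed sizes, each of the four chunks lands at roughly 200 GB, which is the even split we are after.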

Seeing the initial stages of this technique should raise a concern regarding the addition of new directories on the client after the backup is created and in use.  I promise to address that contingency further along, so just stick with me for now.

 

With the existing appearance (hint hint!) of the client "fanta", we're going to select our first "chunk".

 

DnC_Dir_Curly_Include.jpg

 

Note that the checkmark for "Users" is gray indicating a partial selection beneath.  This will be important later.  The task at hand now is to get a second appearance of the client so we can select our next chunk.  Here's the less-than-obvious part.  I'm almost embarrassed to admit that even I didn't find it for years.  Little did I realize it was just a right-click away.

 

DnC_Dir_Right_Click_Copy.jpg

 

Do that three times, and guess what you have?  Four appearances of the very same client!

 

DnC_Dir_Four_Copies.jpg

 

I think you may see where we're going from here, but the plot has a twist at the end.  So don't rush off just yet!  We've accounted for Curly's directory.  Next, we want to use the second appearance of fanta to select Larry's folder.

 

DnC_Dir_Larry_Include.jpg

 

Third on our list is Moe's directory.  Accordingly, we'll use the third client appearance for our last "include" selection (another clue!).

 

DnC_Dir_Moe_Include.jpg

 

So far, so good.  We've accounted for three of our four chunks.  But now we face the critical challenge of selecting everything else without creating a static condition where future directory additions to E:\ are skipped unless you modify the backup specification.  Here's where the order of operations is critical.

 

To create the desired syntax in the datalist, first select the whole E: drive in the last appearance of your client, then carefully unselect the folders Curly, Larry, and Moe that were included in the three previous appearances of the client.

 

DnC_Dir_Exclude.jpg

 

This sets up the syntax that is essential for acting as a "catch-all" to grab what you want that's visible right now plus anything new that might be added in the future.  Note that the check mark for the E: drive is now turquoise indicating a full select with some excludes versus our previous gray check marks which signify only partial, specific selections.  Our backup specification (aka: datalist) is saved as Fanta_Divide_Conquer.  Having a look now at the WINFS objects therein, I think the point I'm trying to make will be eminently clear.

 

DnC_Datalist_Before.jpg

 

The first three objects each back up only the specified directory tree.  The last object -- our "catch-all" -- backs up everything on the E: drive except the first three folders.  And that, my friends, is the clincher.  Since it's an "everything but ..." construct, this last WINFS object will automatically pick up any new directories that are added anywhere on the E: drive outside those three folders, without the need to modify the backup specification!  (New directories beneath Curly, Larry, or Moe are already covered by their own objects.)
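To see why the "everything but ..." construct matters, here is a small Python model of the four objects. It is only a sketch of the include/exclude idea, not Data Protector's actual matching logic; the dictionary layout, the function names, and the new directories `E:\Users\Joe` and `E:\NewProject` are made up for illustration.

```python
from pathlib import PureWindowsPath


def under(path, root):
    """True if path is root itself or lies somewhere beneath it."""
    p, r = PureWindowsPath(path), PureWindowsPath(root)
    return p == r or r in p.parents


def covers(obj, path):
    """An object backs up a path that is under an included tree and not excluded."""
    included = any(under(path, tree) for tree in obj["include"])
    excluded = any(under(path, tree) for tree in obj["exclude"])
    return included and not excluded


big_three = ["E:\\Users\\Curly", "E:\\Users\\Larry", "E:\\Users\\Moe"]

# Three "include only this tree" objects, plus one "whole drive minus excludes".
objects = [{"name": tree, "include": [tree], "exclude": []} for tree in big_three]
objects.append({"name": "catch-all", "include": ["E:\\"], "exclude": big_three})


def owners(path):
    """Names of every object that would back up the given path."""
    return [obj["name"] for obj in objects if covers(obj, path)]


print(owners("E:\\Users\\Curly\\report.doc"))  # existing data in an included tree
print(owners("E:\\Users\\Shemp\\notes.txt"))   # existing data left to the catch-all
print(owners("E:\\Users\\Joe\\new.txt"))       # hypothetical directory added later
print(owners("E:\\NewProject\\plan.xls"))      # hypothetical directory added later
```

Every path ends up with exactly one owner. The two directories that did not exist when the specification was written fall to the catch-all, because it is defined by what it excludes rather than by what it includes.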

 

The significance of WINFS object identification bears some explanation at this point.  Let's have a quick look at the three parameters that follow the WINFS designator.

 

WINFS "Description" Client_FQDN:"Filesystem"

The description field is free-form text that will prove exceptionally useful for split objects when it comes time for a restore.  By default, DP uses the volume name for the description.  Second is the client's fully-qualified domain name (FQDN) as it is known to the Cell Manager.  Last is DP's designation for which volume is being backed up.  These three pieces of information are what DP uses to build and/or verify a restore chain for each object.  Each object must have a unique combination of Description, Client_FQDN, and Filesystem.
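The uniqueness rule is easy to check mechanically. The following Python sketch treats each object as a (Description, Client_FQDN, Filesystem) triple and reports any triple that appears more than once. The host name `fanta.example.com`, the descriptions, and the filesystem string `E:` are placeholders I've assumed for illustration, not values copied from a real datalist.

```python
from collections import Counter

HOST = "fanta.example.com"  # placeholder FQDN


def duplicate_identities(objs):
    """Return every (description, fqdn, filesystem) triple used more than once."""
    counts = Counter(objs)
    return [identity for identity, n in counts.items() if n > 1]


# Four objects on the same client and volume, told apart only by description.
good = [
    ("Curly", HOST, "E:"),
    ("Larry", HOST, "E:"),
    ("Moe", HOST, "E:"),
    ("Everything else", HOST, "E:"),
]

# Same layout, but two objects accidentally share a description.
bad = good[:3] + [("Moe", HOST, "E:")]

print(duplicate_identities(good))
print(duplicate_identities(bad))
```

Since the client and the volume are identical for every split object, the description is the only field left to make each triple unique.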

 

In a restore context, you will see a split object multiple times.  In our example, expect to see four "E:" drives available for restore on Fanta.  So how do you know which rabbit hole leads to the file you wish to restore?  Ah, that's where a good description for each object becomes quite valuable!  Have a look at the changes I've made to our object descriptions.

 

DnC_Datalist_After.jpg

 

Each appearance of the E: drive in a restore context will be accompanied by its object description.  So if your mission is to restore a file in Curly's user directory, then the E: drive with the description "E:\Users|Curly" is the one you want to select and expand.

 

You may have already done the math in your head on other effects of changing object descriptions.  One worth mentioning is inadvertently breaking an existing restore chain.  Let's say you run weekly full backups and daily incrementals.  Somewhere in the middle of the week, you decide to tweak a couple of object descriptions.  The next morning, you see that some objects in your nightly incremental were forced from incremental to full -- specifically the two objects for which you modified the description.  When the backup started in incremental mode, DP tried (and failed) to find a previous full that was still protected for the two objects in question.  The Client_FQDN matched, and the Filesystem matched, but the Description did not.  Thus no valid restore chain.  This is the expected behavior, but what's the takeaway?  For new backups, no big deal.  For existing backups, wait to tweak your object descriptions until such time as you can afford the run-time of a full backup versus an incremental.
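The broken-chain scenario can be modeled in a few lines of Python. This is a simplification of the behavior described above, not DP's internal logic; `effective_mode`, the host name, and the descriptions are assumptions made for the example.

```python
HOST = "fanta.example.com"  # placeholder FQDN


def effective_mode(requested, identity, protected_fulls):
    """An incremental needs a protected full with the very same identity triple."""
    if requested == "incremental" and identity not in protected_fulls:
        return "full"  # no valid restore chain, so the object is forced to full
    return requested


# The weekend full ran with these (description, fqdn, filesystem) triples.
protected_fulls = {
    ("Curly", HOST, "E:"),
    ("Larry", HOST, "E:"),
}

unchanged = ("Larry", HOST, "E:")
renamed = ("Curly home directory", HOST, "E:")  # description tweaked mid-week

print(effective_mode("incremental", unchanged, protected_fulls))
print(effective_mode("incremental", renamed, protected_fulls))
```

The unchanged object stays incremental, while the renamed one comes back as a full because nothing in the set of protected fulls matches its new triple.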

 

Four caveats deserve attention.  First, make sure that you have your tape device concurrency set sufficiently high to run all of your new objects in parallel with none left pending.  Second, understand that there is a point of diminishing returns if you try to slice a drive up into too many separate chunks.  All of those I/O operations are still acting upon the same LUN and SCSI request queue.  Third, this "divide and conquer" technique will quickly highlight the network as a backup bottleneck if you're not taking advantage of multi-path devices and Data Protector's LAN-Free Backup functionality.  Finally, do not enable VSS functionality in the advanced WinFS options.  The VSS API call works at the volume level, which means that you'll have multiple snapshot definitions running against the same volume simultaneously.  That will lead to a train wreck of error messages in your backup.
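The first two caveats lend themselves to some back-of-the-envelope arithmetic. The Python sketch below is a deliberately crude model: it assumes evenly sized chunks, a fixed per-agent read rate, and a hard ceiling on what the volume can deliver. The 30 MB/s and 100 MB/s figures are invented for illustration, and both function names are hypothetical.

```python
def pending_objects(num_objects, devices, concurrency_per_device):
    """Objects left waiting when there are more objects than Disk Agent slots."""
    return max(0, num_objects - devices * concurrency_per_device)


def estimated_hours(total_gb, agents, per_agent_mb_s, volume_max_mb_s):
    """Wall-clock estimate with the aggregate rate capped by the volume itself."""
    rate_mb_s = min(agents * per_agent_mb_s, volume_max_mb_s)
    return total_gb * 1024 / rate_mb_s / 3600


# Caveat one: four objects against one device with a concurrency of 2.
print("pending:", pending_objects(4, devices=1, concurrency_per_device=2))

# Caveat two: more agents stop helping once the volume is saturated.
for agents in (1, 2, 4, 8):
    hours = estimated_hours(800, agents, per_agent_mb_s=30, volume_max_mb_s=100)
    print(f"{agents} agent(s): about {hours:.1f} hours")
```

In this model, going from four agents to eight buys nothing because the volume's ceiling was already reached at four. Real systems degrade less cleanly, so treat the output as a way to reason about the shape of the curve, not as a prediction.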

 

I've provided crushing detail in some areas yet made assumptions about basic knowledge in others.  If you have any questions or observations about the information I've presented here, please leave a comment.  On nearly a weekly basis, my job involves a soup-to-nuts evaluation of poorly performing backups at one site or another.  During the remediation that follows discovery, I teach my divide and conquer technique, among many other gems, in the comfort of your own IT shop.  If this is something that would be of value to you, please contact your local HP account team and ask them to engage Integration and Technical Services on your behalf.  We'd love to help!

Comments
Ken Olivier(anon) | ‎05-11-2011 02:45 PM

I’ve worked with Data Protector for over 15 years and this is the first I heard (or remember) about the ability to clone an object in the selection tree. It sure beats the manual crafting by adding objects in the summary screen I have been doing. Good work!

Joachim Timm(anon) | ‎05-31-2011 05:01 PM

Very good!

Tips & Tricks - I like it !

 

Joachim

Jenni | ‎06-10-2011 07:39 PM

I use this a lot for customers with large file servers where the full backup window is pretty much the whole weekend and regularly reduce it  from 50+ hours to approx 8 hours - but it should be used with caution unless the data sits on virtual disks (e.g. MSA, EVA, XP, P4000) as otherwise I understand it causes "disk-thrashing" which seriously reduces the lifespan of the disks?

 

Tip:- If you want to break up the file data into roughly equal sized objects try using spacemonger.exe on the disk you want to back up - it gives a really clear picture of which folders are large enough to require their own object in the backup spec.

Mr_T | ‎06-15-2011 01:03 PM

Hi Jenni,

 

Glad to hear that you've had success with this technique.  I've experienced similar reductions in backup time.  I do still caution customers though that mileage varies.  As you know, there are a number of variables involved.

 

It is quite true that you can reach a point of diminishing returns even with array-based LUNs, as I alluded to in the next-to-last paragraph.  All of those I/O requests are going to be competing for system and spindle resources.  Essentially, you will get a performance gain from breaking a volume into pieces up to the point at which you have exhausted the system's ability to deliver data from that volume.  As to disk lifespan, that relates more to the rated duty cycle of the disk compared to how you are using it.  If it's rated at a 100% duty cycle, then it doesn't matter whether it's an accounting application or a backup that's pounding it 24x7 -- you're within the intended parameters of the device.  However, if it's, for example, a 1 TB FATA drive, I reckon it's possible you may be risking an abbreviated disk life.  I believe devices of that class are ideal for keeping long-term data out on spinning disk but not necessarily meant for intensive and continuous I/O.

 

Thanks for the tip on spacemonger.exe.  I'll have to check that one out.  I've historically used WinDirStat myself, although that shouldn't be taken as any sort of endorsement from HP.

 

And thanks as always for reading DPTIPS!

 

Respectfully,

Mr_T

bobc(anon) | ‎06-16-2011 02:31 PM

I have been teaching a slightly different technique for years.  When creating a backup, don't pick any objects in the source window.  Instead, add the objects in the Backup Object Summary window.  (Note: each object must have a unique descriptor, since an object name is "hostname: mountpoint 'descriptor'".)

 

It takes a little longer to do it this way, but if you want to send different parts to different tape drives, un-check Load Balancing when creating the backup and assign the drive in Backup Object Summary.

 

Otherwise, a nice idea.  I'll have to try it at my next opportunity

About the Author
  • I have 27 years of system, storage, and networking experience including detailed work with Data Protector (formerly Omniback II) for the past 14 years. My expertise includes StoreOnce deduplication technology, D2D appliances, performance tuning, complex remediation, and online backup integration with applications like Oracle and infrastructure like VMware. Traveling across the United States and Canada as a Sr. Technical Consultant, I deliver specialized consulting for a broad variety of HP customers.