DPTIPS: Large Objects - Divide and Conquer with Data Protector
By default, Data Protector assigns one Disk Agent (VBDA) to each object. An object is considered to be a Windows volume (drive letter) or Unix mountpoint (filesystem). But what if your object is really large? Isn't there some way to get Data Protector to assign multiple agents with instructions to "divide and conquer" such a large object? With a little cleverness from you, the backup administrator, yes there is.
We'll work with a fairly contrived example, but the technique is sound. Consider the following directory tree as seen in the process of creating a new Data Protector backup specification.
For our example, let's say we've determined the following:
- The E: drive is about 800 GB in total.
- Three-fourths of that space is in E:\Users.
- There is a fairly even distribution between Curly, Larry, and Moe.
- Shemp has only a small bit of data by comparison.
Accordingly, we would probably want four Disk Agents working on unique "chunks" of the E: drive as if they were separate objects. But how? Easy. We just make those chunks separate objects! The key is using a lesser-known right-click option when configuring your backup source.
First, the plan. Based upon our knowledge, we want the following separate chunks as objects.
- Just E:\Users\Curly
- Just E:\Users\Larry
- Just E:\Users\Moe
- All of E: except for Curly, Moe, and Larry in E:\Users
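As a rough sanity check on that plan, here is the arithmetic in a few lines of Python. The figures come straight from our example. Treating Shemp's share as negligible and splitting E:\Users evenly three ways are simplifying assumptions for illustration only.

```python
# Rough sizing for the four planned objects (all figures in GB).
# Assumptions: Shemp's data is small enough to ignore, and the three
# large user directories split E:\Users evenly.
TOTAL_GB = 800
users_gb = TOTAL_GB * 3 / 4          # three-fourths of E: lives in E:\Users

planned_objects = {
    r"E:\Users\Curly": users_gb / 3,
    r"E:\Users\Larry": users_gb / 3,
    r"E:\Users\Moe": users_gb / 3,
    "E: (everything else)": TOTAL_GB - users_gb,
}

for name, size_gb in planned_objects.items():
    print(f"{name:<24} ~{size_gb:.0f} GB")
```

Under those assumptions each of the four chunks lands at roughly 200 GB, which is why four Disk Agents is a sensible starting point here.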
Seeing the initial stages of this technique should raise a concern regarding the addition of new directories on the client after the backup is created and in use. I promise to address that contingency further along, so just stick with me for now.
With the existing appearance (hint hint!) of the client "fanta", we're going to select our first "chunk".
Note that the checkmark for "Users" is gray indicating a partial selection beneath. This will be important later. The task at hand now is to get a second appearance of the client so we can select our next chunk. Here's the less-than-obvious part. I'm almost embarrassed to admit that even I didn't find it for years. Little did I realize it was just a right-click away.
Do that three times, and guess what you have? Four appearances of the very same client!
I think you may see where we're going from here, but the plot has a twist at the end. So don't rush off just yet! We've accounted for Curly's directory. Next, we want to use the second appearance of fanta to select Larry's folder.
Third on our list is Moe's directory. Accordingly, we'll use the third client appearance for our last "include" selection (another clue!).
So far, so good. We've accounted for three of our four chunks. But now we face the critical challenge of selecting everything else without creating a static condition in which future directory additions to E:\ are skipped unless you modify the backup specification. Here's where the order of operations is critical.
To create the desired syntax in the datalist, first select the whole E: drive in the last appearance of your client, then carefully unselect the folders Curly, Larry, and Moe that were included in the three previous appearances of the client.
This sets up the syntax that is essential for acting as a "catch-all" to grab what you want that's visible right now plus anything new that might be added in the future. Note that the check mark for the E: drive is now turquoise indicating a full select with some excludes versus our previous gray check marks which signify only partial, specific selections. Our backup specification (aka: datalist) is saved as Fanta_Divide_Conquer. Having a look now at the WINFS objects therein, I think the point I'm trying to make will be eminently clear.
The first three objects back up only the specified directory tree for each. The last object -- our "catch-all" -- backs up everything on the E: drive except the first three folders. And that, my friends, is the clincher. Since it's an "everything but ..." construct, this last WINFS object will automatically pick up any new directories that are added at any level on the E: drive without the need to modify the backup specification!
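To make the include-versus-exclude logic concrete, here is a toy model in Python. It is only a sketch of the selection logic described above, not Data Protector's actual matching code, and the names `INCLUDES`, `is_under`, and `owning_object` are hypothetical.

```python
from pathlib import PureWindowsPath

# The three "include" objects; the fourth object is everything except these.
INCLUDES = [r"E:\Users\Curly", r"E:\Users\Larry", r"E:\Users\Moe"]

def is_under(path, root):
    """True if path is root itself or lies somewhere beneath it."""
    p, r = PureWindowsPath(path), PureWindowsPath(root)
    return p == r or r in p.parents

def owning_object(path):
    """Return the label of the object that would back up this path."""
    for root in INCLUDES:
        if is_under(path, root):
            return root            # one of the three "include" objects
    return "E: catch-all"          # everything else, present or future

print(owning_object(r"E:\Users\Curly\Documents\budget.xls"))
print(owning_object(r"E:\Users\Shemp\notes.txt"))
print(owning_object(r"E:\NewProject\plan.doc"))   # directory added later
```

The point of the model is the last line: a directory that did not exist when the specification was written still falls through to the catch-all object, because that object is defined by what it excludes rather than by what it includes.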
The significance of WINFS object identification bears some explanation at this point. Let's have a quick look at the three parameters that follow the WINFS designator.
WINFS "Description" Client_FQDN:"Filesystem"
The description field is free-form text that will prove exceptionally useful for split objects when it comes time for a restore. By default, DP uses the volume name for the description. Second is the client's fully-qualified domain name (FQDN) as it is known to the Cell Manager. Last is DP's designation for the volume being backed up. These three pieces of information are what DP uses to build and/or verify a restore chain for each object. Each object must have a unique combination of Description, Client_FQDN, and Filesystem.
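The uniqueness rule is easy to check mechanically. The sketch below treats each object as a (Description, Client_FQDN, Filesystem) triple and reports any triple that shows up more than once. The FQDN, the filesystem string, and the descriptions are placeholder values chosen for illustration, and `duplicate_identities` is a hypothetical helper, not a Data Protector tool.

```python
from collections import Counter

def duplicate_identities(objects):
    """Return every (description, client, filesystem) triple used more than once."""
    counts = Counter(objects)
    return [identity for identity, n in counts.items() if n > 1]

# Hypothetical values: the FQDN and the filesystem string are placeholders.
CLIENT = "fanta.example.com"
FILESYSTEM = "E:"

split_objects = [
    (r"E:\Users|Curly", CLIENT, FILESYSTEM),
    (r"E:\Users|Larry", CLIENT, FILESYSTEM),
    (r"E:\Users|Moe", CLIENT, FILESYSTEM),
    ("E: everything else", CLIENT, FILESYSTEM),
]

print(duplicate_identities(split_objects))   # prints [] because no triple repeats
```

Since all four objects share the same client and volume, the description is the only field left to tell them apart.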
In a restore context, you will see a split object multiple times. In our example, expect to see four "E:" drives available for restore on Fanta. So how do you know which rabbit hole leads to the file you wish to restore? Ah, that's where a good description for each object becomes quite valuable! Have a look at the changes I've made to our object descriptions.
Each appearance of the E: drive in a restore context will be accompanied by its object description. So if your mission is to restore a file in Curly's user directory, then the E: drive with the description "E:\Users|Curly" is the one you want to select and expand.
You may have already done the math in your head on other effects of changing object descriptions. One worth mentioning is inadvertently breaking an existing restore chain. Let's say you run weekly full backups and daily incrementals. Somewhere in the middle of the week, you decide to tweak a couple of object descriptions. The next morning, you see that some objects in your nightly incremental were forced from incremental to full -- specifically the two objects for which you modified the description. When the backup started in incremental mode, DP tried (and failed) to find a previous full that was still protected for the two objects in question. The Client_FQDN matched, and the Filesystem matched, but the Description did not. Thus no valid restore chain. This is the expected behavior, but what's the takeaway? For new backups, no big deal. For existing backups, wait to tweak your object descriptions until such time as you can afford the run-time of a full backup versus an incremental.
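Here is a simplified model of that behavior, assuming an incremental is only honored when a protected full exists with exactly the same three-part identity. The function name and all values are hypothetical; the real decision is made inside Data Protector and involves more than this.

```python
def effective_backup_type(requested, obj, protected_fulls):
    """Return the backup type that actually runs for one object.

    obj is a (description, client, filesystem) triple; protected_fulls is
    the set of triples that still have a protected full backup.
    """
    if requested == "incremental" and obj not in protected_fulls:
        return "full"          # no matching full, so no valid restore chain
    return requested

CLIENT, FILESYSTEM = "fanta.example.com", "E:"   # placeholder values
protected_fulls = {("Curly", CLIENT, FILESYSTEM), ("Moe", CLIENT, FILESYSTEM)}

unchanged = ("Moe", CLIENT, FILESYSTEM)
renamed = (r"E:\Users|Curly", CLIENT, FILESYSTEM)   # description edited midweek

print(effective_backup_type("incremental", unchanged, protected_fulls))  # incremental
print(effective_backup_type("incremental", renamed, protected_fulls))    # full
```

Two of the three fields still match for the renamed object, but the lookup is all-or-nothing, so the object is promoted to a full.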
Four caveats deserve attention. First, make sure that you have your tape device concurrency set sufficiently high to run all of your new objects in parallel with none left pending. Second, understand that there is a point of diminishing returns if you try to slice a drive up into too many separate chunks. All of those I/O operations are still acting upon the same LUN and SCSI request queue. Third, this "divide and conquer" technique will quickly highlight the network as a backup bottleneck if you're not taking advantage of multi-path devices and Data Protector's LAN-Free Backup functionality. Finally, do not enable VSS functionality in the advanced WinFS options. The VSS API call works at the volume level, which means that you'll have multiple snapshot definitions running against the same volume simultaneously. That will lead to a train wreck of error messages in your backup.
I've provided crushing detail in some areas yet made assumptions about basic knowledge in others. If you have any questions or observations about the information I've presented here, please leave a comment. On nearly a weekly basis, my job involves a soup-to-nuts evaluation of poorly performing backups at one site or another. During the remediation that follows discovery, I teach my divide and conquer technique among many other gems in the comfort of your own IT shop. If this is something that would be of value to you, please contact your local HP account team and ask them to engage Integration and Technical Services on your behalf. We'd love to help!
I’ve worked with Data Protector for over 15 years and this is the first I heard (or remember) about the ability to clone an object in the selection tree. It sure beats the manual crafting by adding objects in the summary screen I have been doing. Good work!
Very good!
Tips & Tricks - I like it!
Joachim
I use this a lot for customers with large file servers where the full backup window is pretty much the whole weekend and regularly reduce it from 50+ hours to approx 8 hours - but it should be used with caution unless the data sits on virtual disks (e.g. MSA, EVA, XP, P4000) as otherwise I understand it causes "disk-thrashing" which seriously reduces the lifespan of the disks?
Tip:- If you want to break up the file data into roughly equal sized objects try using spacemonger.exe on the disk you want to back up - it gives a really clear picture of which folders are large enough to require their own object in the backup spec.
Hi Jenni,
Glad to hear that you've had success with this technique. I've experienced similar reductions in backup time. I do still caution customers though that mileage varies. As you know, there are a number of variables involved.
It is quite true that you can reach a point of diminishing returns even with array-based LUNs, as I alluded to in the next-to-last paragraph. All of those I/O requests are going to be competing for system and spindle resources. Essentially, you will get a performance gain from breaking a volume into pieces up to the point at which you have exhausted the system's ability to deliver data from that volume. As to disk lifespan, that relates more to the rated duty cycle of the disk compared to how you are using it. If it's rated at a 100% duty cycle, then it doesn't matter whether it's an accounting application or a backup that's pounding it 24x7 -- you're within the intended parameters of the device. However, if it's, for example, a 1 TB FATA drive, I reckon it's possible you may be risking an abbreviated disk life. I believe devices of that class are ideal for keeping long-term data out on spinning disk but not necessarily meant for intensive and continuous I/O.
Thanks for the tip on spacemonger.exe. I'll have to check that one out. I've historically used WinDirStat myself, although that shouldn't be taken as any sort of endorsement from HP.
And thanks as always for reading DPTIPS!
Respectfully,
Mr_T
I have been teaching a slightly different technique for years. When creating a backup, don't pick any objects in the source window. Instead, add the objects in the Backup Object Summary window. (Note: each object must have a unique descriptor, since an object name is "hostname: mountpoint 'descriptor'".)
It takes a little longer to do it this way, but if you want to send different parts to different tape drives, un-check Load Balancing when creating the backup and assign the drive in Backup Object Summary.
Otherwise, a nice idea. I'll have to try it at my next opportunity.





