By default, Data Protector assigns one Disk Agent (VBDA) to each object. An object is considered to be a Windows volume (drive letter) or Unix mountpoint (filesystem). But what if your object is really large? Isn't there some way to get Data Protector to assign multiple agents with instructions to "divide and conquer" such a large object? With a little cleverness from you, the backup administrator, yes there is.
We'll work with a fairly contrived example, but the technique is sound. Consider the following directory tree as seen in the process of creating a new Data Protector backup specification.
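In plain text, the part of the tree that matters for our purposes looks roughly like this (with the rest of the volume collapsed into one placeholder):

```
E:\
├── Users\
│   ├── Curly\
│   ├── Larry\
│   ├── Moe\
│   └── Shemp\
└── (everything else on E:)
```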
For our example, let's say we've determined the following:
- The E: drive is about 800 GB in total.
- Three-fourths of that space is in E:\Users.
- There is a fairly even distribution between Curly, Larry, and Moe.
- Shemp has only a small bit of data by comparison.
Accordingly, we would probably want four Disk Agents working on unique "chunks" of the E: drive as if they were separate objects. But how? Easy. We just make those chunks separate objects! The key is using a lesser-known right-click option when configuring your backup source.
First, the plan. Based upon our knowledge, we want the following separate chunks as objects.
- Just E:\Users\Curly
- Just E:\Users\Larry
- Just E:\Users\Moe
- All of E: except for Curly, Larry, and Moe in E:\Users
The initial stages of this technique should raise a concern: what happens when new directories are added on the client after the backup specification is created and in use? I promise to address that contingency further along, so just stick with me for now.
With the existing appearance (hint hint!) of the client "fanta", we're going to select our first "chunk".
Note that the checkmark for "Users" is gray, indicating a partial selection beneath. This will be important later. The task at hand now is to get a second appearance of the client so we can select our next chunk. Here's the less-than-obvious part. I'm almost embarrassed to admit that even I didn't find it for years. Little did I realize it was just a right-click away.
Do that three times, and guess what you have? Four appearances of the very same client!
I think you may see where we're going from here, but the plot has a twist at the end. So don't rush off just yet! We've accounted for Curly's directory. Next, we want to use the second appearance of fanta to select Larry's folder.
Third on our list is Moe's directory. Accordingly, we'll use the third client appearance for our last "include" selection (another clue!).
So far, so good. We've accounted for three of our four chunks. But now we face the key challenge: selecting everything else without creating a static condition in which future directory additions to E:\ are skipped unless you modify the backup specification. Here's where the order of operations is critical.
To create the desired syntax in the datalist, first select the whole E: drive in the last appearance of your client, then carefully unselect the folders Curly, Larry, and Moe that were included in the three previous appearances of the client.
This sets up the syntax that is essential for acting as a "catch-all": it grabs everything you want that's visible right now, plus anything new that might be added in the future. Note that the check mark for the E: drive is now turquoise, indicating a full selection with excludes, versus our previous gray check marks, which signify only partial, specific selections. Our backup specification (a.k.a. datalist) is saved as Fanta_Divide_Conquer. Having a look now at the WINFS objects therein, I think the point I'm trying to make will be eminently clear.
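Your on-disk datalist syntax will vary a bit by Data Protector version, but schematically -- using the Description / Client_FQDN / Filesystem field order explained just below, a stand-in FQDN of fanta.company.com, and the default volume-name descriptions we'll customize shortly -- the four objects look something like this:

```
# Three "include" objects, one directory tree each
WINFS "E:" fanta.company.com:"/E"  -trees   "/Users/Curly"
WINFS "E:" fanta.company.com:"/E"  -trees   "/Users/Larry"
WINFS "E:" fanta.company.com:"/E"  -trees   "/Users/Moe"

# One "catch-all" object: the whole volume minus the three trees above
WINFS "E:" fanta.company.com:"/E"  -exclude "/Users/Curly"
                                   -exclude "/Users/Larry"
                                   -exclude "/Users/Moe"
```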
The first three objects back up only the specified directory tree for each. The last object -- our "catch-all" -- backs up everything on the E: drive except the first three folders. And that, my friends, is the clincher. Since it's an "everything but ..." construct, this last WINFS object will automatically pick up any new directories that are added at any level on the E: drive without the need to modify the backup specification!
The significance of WINFS object identification bears some explanation at this point. Let's have a quick look at the three parameters that follow the WINFS designator.
WINFS "Description" Client_FQDN:"Filesystem"
The description field is free-form text that will prove exceptionally useful for split objects when it comes time for a restore. By default, DP uses the volume name for the description. Second is the client's fully qualified domain name (FQDN) as it is known to the Cell Manager. Last is DP's designation for which volume is being backed up. These three pieces of information are what DP uses to build and/or verify a restore chain for each object. Each object must have a unique combination of Description, Client_FQDN, and Filesystem.
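To make the uniqueness rule concrete, here is a hypothetical pair of objects (descriptions and FQDN invented for illustration) that DP treats as entirely distinct, because the descriptions differ even though the client and filesystem match:

```
WINFS "Chunk one"  fanta.company.com:"/E"
WINFS "Chunk two"  fanta.company.com:"/E"
```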
In a restore context, you will see a split object multiple times. In our example, expect to see four "E:" drives available for restore on Fanta. So how do you know which rabbit hole leads to the file you wish to restore? Ah, that's where a good description for each object becomes quite valuable! Have a look at the changes I've made to our object descriptions.
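Something along these lines, with the FQDN still a stand-in and the catch-all label being just one reasonable choice:

```
WINFS "E:\Users\Curly"       fanta.company.com:"/E"
WINFS "E:\Users\Larry"       fanta.company.com:"/E"
WINFS "E:\Users\Moe"         fanta.company.com:"/E"
WINFS "E: except big Users"  fanta.company.com:"/E"
```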
Each appearance of the E: drive in a restore context will be accompanied by its object description. So if your mission is to restore a file in Curly's user directory, then the E: drive with the description "E:\Users\Curly" is the one you want to select and expand.
You may have already done the math in your head on other effects of changing object descriptions. One worth mentioning is inadvertently breaking an existing restore chain. Let's say you run weekly full backups and daily incrementals. Somewhere in the middle of the week, you decide to tweak a couple of object descriptions. The next morning, you see that some objects in your nightly incremental were forced from incremental to full -- specifically, the two objects whose descriptions you modified. When the backup started in incremental mode, DP tried (and failed) to find a previous full that was still protected for the two objects in question. The Client_FQDN matched, and the Filesystem matched, but the Description did not -- thus, no valid restore chain. This is the expected behavior, but what's the takeaway? For new backups, no big deal. For existing backups, wait to tweak your object descriptions until you can afford the run time of a full backup instead of an incremental.
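To visualize the mismatch, using the same schematic identities as before: suppose it was Curly's object whose description changed mid-week. The incremental's identity no longer lines up with the protected full:

```
Protected weekly full:   WINFS "E:"              fanta.company.com:"/E"
Tonight's incremental:   WINFS "E:\Users\Curly"  fanta.company.com:"/E"
                                ^-- Description differs; Client_FQDN and Filesystem
                                    match. No restore chain, so DP promotes the
                                    object from incremental to full.
```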
Four caveats deserve attention:
- Make sure your tape device concurrency is set high enough to run all of your new objects in parallel with none left pending.
- Understand that there is a point of diminishing returns if you slice a drive into too many separate chunks. All of those I/O operations are still acting upon the same LUN and SCSI request queue.
- This "divide and conquer" technique will quickly highlight the network as a backup bottleneck if you're not taking advantage of multi-path devices and Data Protector's LAN-Free Backup functionality.
- Do not enable VSS functionality in the advanced WinFS options. The VSS API call works at the volume level, which means you'll have multiple snapshot definitions running against the same volume simultaneously. That will lead to a train wreck of error messages in your backup.
I've provided crushing detail in some areas yet made assumptions about basic knowledge in others. If you have any questions or observations about the information I've presented here, please leave a comment. On nearly a weekly basis, my job involves a soup-to-nuts evaluation of poor-performing backups at one site or another. During the remediation that follows discovery, I teach my divide and conquer technique among many other gems in the comfort of your own IT shop. If this is something that would be of value to you, please contact your local HP account team and ask them to engage Integration and Technical Services on your behalf. We'd love to help!