As you might imagine, I encounter a broad variety of customer environments each year: Small IT shops and sprawling enterprises, highly-optimized configurations and those with many opportunities for improvement, complete server virtualization and no virtualization at all, admins who are just starting out with Data Protector and seasoned veterans over a decade in who refuse to call it anything but Omniback. You want to know the crazy thing? Even after 13 years with good ol' OB2, I still seem to learn something new almost every week. I try to encapsulate these "nuggets of wisdom" and share them so that we may all benefit where possible.
In this article, I'd like to concentrate not only on the generous performance of Data Protector with HP's D2D appliance but also on the potential impediments that can diminish the overall transfer rate during a backup.
The D2D in the POC I examined is one of the big boys -- a fully trimmed-out D2D4324 with 72 TB of usable physical space. This monster is rated at a maximum sustained ingest rate of 1,100 MB/s or just north of 3.7 TB/hr with i/o balanced across two 8 Gb/s FC ports. Since we're working with 4 Gb SAN fabrics, that lowered our anticipated maximum to 800 MB/s. (And yes, there are still 8 bits to a byte. I'll spare you the arithmetic, but once you account for protocol overhead, figure a 10:1 ratio with Fibre Channel Mass Storage. So 4 Gb/s FC is good for roughly 400 MB/s per port, or 800 MB/s across the two D2D ports.) Now for the real challenge: How to get that much data to the doorstep of the D2D. It's not as easy as you might think.
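For anyone who does want the arithmetic, here is a minimal Python sketch of it. The function and variable names are mine, purely for illustration. The 10:1 figure is the rule of thumb quoted above, and the TB/hr conversion assumes binary units (1 TB = 1024 x 1024 MB), which is my guess at how "just north of 3.7 TB/hr" was derived.

```python
# Rule of thumb from the text: budget ~10 bits on the wire per byte of payload.
BITS_PER_BYTE_ON_WIRE = 10


def fc_usable_mb_per_s(link_gbit_per_s: float, ports: int = 1) -> float:
    """Approximate usable MB/s across `ports` FC links of the given speed."""
    return link_gbit_per_s * 1000 / BITS_PER_BYTE_ON_WIRE * ports


def mb_per_s_to_tb_per_hr(mb_per_s: float, binary: bool = True) -> float:
    """Convert MB/s to TB/hr; binary=True assumes 1 TB = 1024 * 1024 MB."""
    mb_per_tb = 1024 * 1024 if binary else 1000 * 1000
    return mb_per_s * 3600 / mb_per_tb


rated_ingest = 1100                                # MB/s, D2D4324 rating quoted above
per_port = fc_usable_mb_per_s(4)                   # one 4 Gb/s link
fabric_ceiling = fc_usable_mb_per_s(4, ports=2)    # both D2D ports on 4 Gb fabrics
expected_max = min(rated_ingest, fabric_ceiling)   # the slower of the two wins

print(f"per port: {per_port:.0f} MB/s")
print(f"fabric ceiling: {fabric_ceiling:.0f} MB/s")
print(f"expected maximum: {expected_max:.0f} MB/s")
print(f"rated ingest: {mb_per_s_to_tb_per_hr(rated_ingest):.2f} TB/hr")
```

With 8 Gb/s fabrics the same calculation gives 1,600 MB/s across two ports, so the appliance's 1,100 MB/s rating would be the limit rather than the SAN.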
The diagram below is a highly simplified summary of the environment. (Hey, you can only do so much with 640 pixels of horizontal real estate in this column!) The first backup utilizes 12 drives in its own VTL and reads data directly from Business Copy volumes that have been split from a production database.
Understand that this is the best possible scenario because no backup traffic traverses the LAN. Backup data flow is indicated by the orange line. Not surprisingly, we observed around 300 MB/s with this one backup alone. Still, that put us nowhere near the 800 MB/s mark that should be available. Here's where the challenges start to stack up.
Most of the remaining backup data comes either from clients that are not SAN-attached or from a multitude of virtual machines (VMs) that live on two different hypervisors. VADP integration notwithstanding, that means that we are limited to network-based backups as indicated below.
Even with strict adherence to the multi-streaming recommendations laid out in our Best practices for VTL, NAS and Replication implementations, each backup run concurrently with the first only gave us an incremental bump in the aggregate ingest rate.
The graph above gives a good visual of where we ended up even with 36 streams going to 5 separate VTLs. Looks like we flirted with 600 MB/s rather than bumping up against our theoretical maximum of 800 MB/s. Why?
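To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. It leans on two assumptions that I did not measure: that the split-mirror backup kept delivering about 300 MB/s while the other backups ran, and that each of its 12 drives carried exactly one of the 36 streams. Treat the per-stream figures as illustrative only.

```python
# Figures quoted in the article (all approximate).
ceiling_mb_s = 800          # expected maximum over two 4 Gb/s FC ports
observed_mb_s = 600         # aggregate ingest with everything running
split_mirror_mb_s = 300     # LAN-free backup measured on its own
total_streams = 36

# ASSUMPTION: one stream per drive in the 12-drive split-mirror VTL.
split_mirror_streams = 12

# ASSUMPTION: the split-mirror backup held ~300 MB/s under concurrency,
# so everything else is attributed to the network-based backups.
lan_mb_s = observed_mb_s - split_mirror_mb_s
lan_streams = total_streams - split_mirror_streams

per_stream_san = split_mirror_mb_s / split_mirror_streams
per_stream_lan = lan_mb_s / lan_streams
shortfall_mb_s = ceiling_mb_s - observed_mb_s
utilization = observed_mb_s / ceiling_mb_s

print(f"LAN-free streams: {per_stream_san:.1f} MB/s each")
print(f"network streams:  {per_stream_lan:.1f} MB/s each")
print(f"shortfall: {shortfall_mb_s} MB/s ({utilization:.0%} of ceiling reached)")
```

Under those assumptions, a network-based stream contributes about half of what a LAN-free stream does, which is consistent with the incremental bumps we saw as each backup was added.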
Here are the potential culprits in no particular order:
In all likelihood, the shortfall is the combined effect of several of these points rather than any single one.
Although virtualization continues to improve with each product iteration, there is still a price to be paid for virtualizing disk and network i/o operations. Though not possible for this particular POC, my first choice would be to utilize Data Protector's integration with VMware APIs for Data Protection (VADP). A proxy-based solution like VADP offloads VM backup traffic from the ESXi servers and eliminates the associated performance penalty on both the backup and interactive VM users. It also makes LAN-free VM backups possible since the proxy can be fabric attached.
Many of the non-VM clients were also constrained to network backups since they utilize local storage and are not fabric attached. Why do I demonize the LAN when it comes to backups? It's not Layer 2 but rather a matter of protocol. You could have infinite wire speed and still hit a brick wall because you're jamming tons of data through two TCP/IP stacks -- one leaving the client as data is put on the wire and one entering the media agent host as that data is siphoned from the ether. Never neglect the fact that more moving parts require more system resources and almost always impact overall performance.
My gut feeling is that the cost of virtualization and the heavy reliance on LAN backups played the largest role in the delta between observed performance and our expected maximum. Still, the remaining points merit at least a brief discussion.
I have addressed the unique challenge of efficiently backing up large filesystems in a previous article. That actually holds some promise for a few of the clients up to the point at which the LAN or virtualization takes over as the limiting factor.
The Cell Manager is a well-equipped vPar in a Superdome which, although busy, was in no measurable distress at the height of backup activity. We did not look at fabric utilization, but again, my instinct tells me that other factors were hampering our ability to go full throttle. Finally, the drives in each VTL are evenly distributed across the two D2D FC ports, so no concern exists there.
If nothing else, I hope sharing this experience stresses the importance of maintaining a broad perspective when identifying potential sources of less-than-stellar backup performance. Yes, 600 MB/s is nothing to sneeze at, but we should have been able to approach 800 MB/s in this environment. Pouring on more LAN-free, split-mirror backups would have been a sure bet, but that wasn't possible.
As far as the value proposition of the D2D goes, I've completely neglected the advantages of HP's StoreOnce deduplication technology, low-bandwidth replication, and the speed and convenience of a single-file restore from the D2D as compared to physical tape. Subjects for another day, my friends. Stay tuned!