By Calvin Zito, @HPStorageGuy
I came across a white paper written by our VMware integration team covering best practices for the vStorage APIs for Array Integration (VAAI) plug-in for the P9500 and XP24000/XP20000 disk arrays with VMware vSphere 4.1. It didn't have the normal pub number and I couldn't find it on the web, so I thought adapting it here would make for a good blog post. I'll have a link at the end where you can find info on using VAAI with vSphere 5.0.
VAAI enablement in its current release with vSphere 4.1 consists of two parts:
- VMware server-resident software drivers, or “plug-ins,” supplied by disk array vendors
- VAAI-enabled storage firmware/OS supplied by disk array vendors
These two components allow VMware vSphere 4.1 hosts to exercise the capabilities defined by the VAAI primitives. Currently, the plug-in offered by HP for the P9500 and XP24000/XP20000 disk arrays supports the following VAAI primitives:
Block zeroing/Block Initialization (WRITE SAME)
A common operation on virtual disks is to initialize large extents of the disk with zeroes to help isolate VMs and promote security. When performed by vSphere servers, this initialization consumes host CPU cycles, DMA buffers, and HBA queue slots. Off-loading the processing to the disk array reduces resource consumption in the ESX host.
Full copy/Block Cloning (XCOPY)
Copy processing, such as creating a clone of a virtual machine, is off-loaded to the disk array. Because the vSphere host does not have to read and write the data being copied, the host's workload is reduced, as is the traffic between the vSphere host and the disk array, resulting in improved copy performance.
Hardware assisted locking (Atomic Test and Set - ATS)
The ATS primitive removes the need for SCSI reservations for disk locking. ATS enables exclusive locking at the storage block (sector) level instead of the traditional LUN-level locking performed by SCSI-2 reservations. This reduces SCSI reserve command contention and improves VMFS scalability.
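As a quick way to confirm a host is willing to use these primitives, vSphere 4.1 exposes one advanced setting per primitive. Here's a sketch using the standard service-console command; a value of 1 means the primitive is enabled on the host:

```shell
# Query the vSphere 4.1 advanced settings that gate each VAAI primitive;
# a value of 1 means the host will attempt to off-load that operation.
esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit   # block zeroing (WRITE SAME)
esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove   # full copy (XCOPY)
esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking    # hardware assisted locking (ATS)
```

The same settings can be changed with `-s` instead of `-g` if you need to disable a primitive for troubleshooting.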
Best Practices and Recommendations
The installation and verification of the P9500/XP plug-in is covered in the plug-in’s user guide. It is, however, highly recommended to enable Host Option Mode 0x54 on the P9500 or XP24000/XP20000 before installing HP’s plug-in, so that this step isn’t overlooked once the installation is finished.
Performance Considerations for Block Zeroing
While the block zeroing primitive as implemented on the P9500 and XP24000/XP20000 arrays is nearly cache-independent, it is still a back-end disk-accessing process that must compete with any other unrelated disk access against any shared portion of allocated storage. As a best practice, create new VMDKs with the understanding that although the block-zeroing portion of this action has been optimized, performance may still degrade when the array’s block-zeroing process competes with other parallel operations against that same portion of storage. It is further suggested to execute such tasks during periods when reduced amounts of I/O are being directed at the target datastores.
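As one illustration, creating an eager-zeroed VMDK is exactly the kind of task the block zeroing primitive off-loads; the datastore path and size below are hypothetical:

```shell
# Create a 10 GB eager-zeroed thick VMDK. On a VAAI-enabled array the
# zero-fill is off-loaded via WRITE SAME rather than written by the host.
vmkfstools -c 10g -d eagerzeroedthick /vmfs/volumes/datastore1/testvm/testvm.vmdk
```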
XCOPY Cloning Operations and Cache Configuration
For the P9500 and XP24000/XP20000 arrays, the amount of array cache available to a cloning operation, or collection of operations, has a non-trivial effect on cloning performance. A performance benefit may be realized by allocating cache for cloning up to the size of the VM, or the sum of the sizes of the VMs being cloned concurrently. However, given that array cache is a valuable and often limited resource in most array configurations, it is suggested as a best practice to use Cache Logical Partitions (CLPRs) to allocate quantities of cache dedicated to the cloning of larger VMs (>> 3 GB) when these operations are high priority or time-sensitive.
Back‐end Storage Optimizations for XCOPY Cloning
While the XCOPY cloning primitive can relieve a host of the VM cloning work, it does not exempt the user from design considerations that, if ignored, can negate any benefit the VAAI plug-in might provide. These considerations mainly involve the host-connect, or array “front-end,” and the physical HDD-based storage, or array “back-end.” The XCOPY cloning operation shows the strongest relative performance benefit when the hosts connected to the array front-end are bandwidth- or processing-limited compared to the storage processing resources and HDDs allocated to those hosts from the array back-end. For example, a host with a 2 Gbit/s HBA cannot complete a VM cloning operation through its own data path as quickly as the VAAI XCOPY primitive can when the storage allocated to that host is capable of 400 MB/s or greater. The converse, however, is not true: if the back-end rather than the host is the bottleneck, off-loading the copy yields little benefit.
The disk access patterns of a VAAI plug-in cloning (XCOPY) operation are similar to those of an optimized tape backup or restore. Because of this, it is particularly important to avoid overloading the provisioned storage affected by these cloning operations by paying attention to the density and frequency of their execution. The XP24000 and P9500 Thin Provisioning program product can combine several XP or P9500 parity groups into a single storage pool with a combined performance potential greater than that of any of its member parity groups. Using Thin Provisioning can greatly mitigate any back-end bottlenecks a user of the VAAI XCOPY cloning primitive might experience, so it is recommended as a best practice whenever possible.
It is also important to note that a single instance of the VAAI plug-in can initiate up to four cloning operations concurrently. If more than four are requested on the same plug-in instance, the first four are serviced immediately and the remaining requests are queued for later processing.
You can find the vSphere 4.1 plug-in for the XP24000/XP20000 and P9000 at this link. You'll also find the installation guide and user guide included in the download.
If you need information about the vSphere 5.0 plug-in, here's a link to that.