Tips and Best Practices
VDDK 5.0 introduced two VixDiskLib calls, VixDiskLib_PrepareForAccess() and VixDiskLib_EndAccess(), that disable and re-enable Storage vMotion around a backup. This prevents stale disk images from being left behind if a virtual machine's storage is moved while a backup is in progress. VMware strongly recommends using these calls.
When an ESX/ESXi host is managed by vCenter Server, vSphere API calls cannot contact the host directly: they must go through vCenter. If direct access is required, for example during disaster recovery, the administrator must first disassociate the ESXi host from vCenter Server.
Advanced transports let programs transfer data as efficiently as possible. SAN transport is available only when the physical-machine host has SAN access. HotAdd works for the appliance model, where backup is done from inside virtual machines, and requires the virtual machine datastore to be accessible from the backup appliance. NBDSSL is a secure fallback when over-the-network backup is your only choice.
Best Practices for SAN Transport
For array-based storage, SAN transport is often the best-performing choice for backups when running on a physical proxy. It is disabled inside virtual machines, so use SCSI HotAdd instead on a virtual proxy.
SAN transport is not always the best choice for restores. It offers the best performance on thick disks, but the worst performance on thin disks, because of round trips through the disk manager APIs, AllocateBlock and ClearLazyZero. For thin disk restore, NBDSSL is usually faster, and NBD is even faster. Changed Block Tracking (CBT) must be disabled for SAN restores. Also, SAN transport does not support writing to redo logs (snapshots or child disks), only to base disks.
When writing to SAN during restore, the disk size should be a multiple of the underlying VMFS block size; otherwise the write to the last fraction of the disk will fail. For example, if the datastore has a 1MB block size and the virtual disk is 16.3MB, the last 0.3MB will not be written. Add 0.7MB of zeroes to complete the block.
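The amount of zero padding is the distance from the disk size up to the next block boundary. A minimal sketch, treating MB as 2^20 bytes; san_restore_padding is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of zero padding needed so that disk_bytes becomes a multiple of
 * block_bytes. Returns 0 when the size is already aligned. */
uint64_t san_restore_padding(uint64_t disk_bytes, uint64_t block_bytes)
{
    uint64_t rem;

    assert(block_bytes > 0);
    rem = disk_bytes % block_bytes;
    return rem == 0 ? 0 : block_bytes - rem;
}
```

For the 16.3MB disk on a 1MB-block datastore, the function returns the roughly 0.7MB of zeroes that brings the disk to exactly 17MB.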
Programs that open a local virtual disk in SAN mode might be able to read it (if the disk is empty), but writes fail with an error. Even if a program calls VixDiskLib_ConnectEx() with a NULL transport-mode parameter to accept the default, SAN is selected as the preferred mode whenever SAN storage is connected to the ESXi host. VixDiskLib should, but does not, check SAN accessibility on open. For local disks, programs must explicitly request NBD or NBDSSL mode.
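VixDiskLib_ConnectEx() takes its transport modes as a colon-separated string in order of preference. The sketch below reduces any such list to the network modes, so a program opening a local disk never relies on the SAN-preferring default. filter_transport_modes is a hypothetical helper, not part of VixDiskLib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the token [tok, tok+len) is an entry in the colon-separated list. */
static int list_contains(const char *list, const char *tok, size_t len)
{
    const char *p = list;

    for (;;) {
        const char *end = strchr(p, ':');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == len && strncmp(p, tok, len) == 0)
            return 1;
        if (!end)
            return 0;
        p = end + 1;
    }
}

/* Copy into out the entries of modes (colon-separated, in preference order)
 * that also appear in keep. Returns 0 on success, -1 if out is too small. */
int filter_transport_modes(const char *modes, const char *keep,
                           char *out, size_t outlen)
{
    size_t used = 0;
    const char *p = modes;

    assert(modes && keep && out && outlen > 0);
    out[0] = '\0';
    for (;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0 && list_contains(keep, p, len)) {
            size_t sep = used ? 1 : 0;
            if (used + sep + len + 1 > outlen)
                return -1;
            if (sep)
                out[used++] = ':';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        if (!end)
            return 0;
        p = end + 1;
    }
}
```

Filtering "san:hotadd:nbdssl:nbd" with keep set to "nbdssl:nbd" yields "nbdssl:nbd", which would then be passed as the transport-mode argument of VixDiskLib_ConnectEx() instead of NULL.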
For a Windows Server 2008 proxy, set SAN policy to onlineAll. Keep SAN disks read-only except during restore; you can use the diskpart utility to clear the read-only flag. The default SAN policy varies by Windows Server 2008 edition. For Enterprise and Datacenter editions, the default Windows SAN policy is offline, which is unnecessary when vSphere mediates SAN storage.
Best Practices for HotAdd Transport
Deploy the proxy on VMFS-5 volumes, or on VMFS-3 volumes with a large enough block size (see About the HotAdd Proxy), so that the proxy can back up very large virtual disks.
A redo log is created for HotAdded disks, on the same datastore as the base disks. Do not remove the target virtual machine (the one being backed up) while a HotAdded disk is still attached. If it is removed, HotAdd cannot properly clean up the redo logs, and the virtual disks must be removed manually from the backup appliance. Also, do not remove the snapshot until after cleanup; removing it earlier could leave an unconsolidated redo log.
HotAdd is a SCSI feature and does not work for IDE disks. The paravirtual SCSI controller (PVSCSI) is not supported for HotAdd; use the LSI controller instead.
Removing all disks on a controller with the vSphere Client also removes the controller. Your appliance code might need to detect this condition and reconfigure the virtual machine to add the controller back.
A virtual disk created on Windows by HotAdd backup or restore might have a different disk signature than the original virtual disk. The workaround is to reread or rewrite the first disk sector in NBD mode.
HotAdded disks should be released with VixDiskLib_Cleanup() before the snapshot is deleted. Cleanup might cause improper removal of the change tracking (ctk) file; if this happens, power cycling the virtual machine fixes it.
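The HotAdd ordering rules (release the disks with VixDiskLib_Cleanup() before deleting the snapshot or removing the target virtual machine) can be checked against a planned job before it runs. This sketch uses a hypothetical JobStep enum; none of these names come from VDDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step labels for a HotAdd backup job. */
typedef enum {
    STEP_CREATE_SNAPSHOT,
    STEP_HOTADD_OPEN,     /* disks attached to the proxy via HotAdd */
    STEP_CLOSE_DISKS,     /* disk handles closed, connection dropped */
    STEP_CLEANUP,         /* VixDiskLib_Cleanup() releases HotAdded disks */
    STEP_DELETE_SNAPSHOT,
    STEP_REMOVE_TARGET_VM
} JobStep;

/* Returns 1 if no snapshot delete or target-VM removal happens while
 * HotAdded disks are still attached, 0 otherwise. */
int hotadd_plan_is_safe(const JobStep *steps, size_t n)
{
    int attached = 0;
    size_t i;

    assert(steps != NULL || n == 0);
    for (i = 0; i < n; i++) {
        switch (steps[i]) {
        case STEP_HOTADD_OPEN:
            attached = 1;
            break;
        case STEP_CLEANUP:
            attached = 0;
            break;
        case STEP_DELETE_SNAPSHOT:
        case STEP_REMOVE_TARGET_VM:
            if (attached)
                return 0;
            break;
        default:
            break;
        }
    }
    return 1;
}
```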
Customers running a Windows Server 2008 proxy on SAN storage should set SAN policy to onlineAll (see note about SAN policy in Best Practices for SAN Transport).
Best Practices for NBDSSL Transport
Various versions of ESX/ESXi have different defaults for NBD and network file copy (NFC) timeouts. Before ESXi 5.0 there were no default NFC timeouts, and default values may change in future releases. VMware recommends that you specify default NFC timeouts in the VixDiskLib configuration file. If you do not specify a timeout, some versions of ESX/ESXi hold the corresponding disk open indefinitely, until vpxa or hostd is restarted. However, if you set a timeout, you might need to perform some “keepalive” operation to prevent the disk from being closed on the server side. Reading block 0 periodically is a good keepalive operation.
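A backup loop can decide when to issue the keepalive read by tracking the time of its last disk I/O. This is a minimal sketch; keepalive_due is a hypothetical name, and sending the keepalive at half the configured timeout is an arbitrary safety margin, not a VMware recommendation.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when an idle connection should get a keepalive read.
 * Times are milliseconds from any monotonic clock; timeout_ms is the
 * server-side timeout in effect (0 means no timeout). */
int keepalive_due(uint64_t now_ms, uint64_t last_io_ms, uint64_t timeout_ms)
{
    assert(now_ms >= last_io_ms);
    if (timeout_ms == 0)
        return 0;                 /* no timeout, so nothing to keep alive */
    return now_ms - last_io_ms >= timeout_ms / 2;
}
```

When the function returns 1, the program would read sector 0, for example with VixDiskLib_Read(handle, 0, 1, buf) into a 512-byte buffer, and then reset its last-I/O timestamp.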
As a starting point, recommended settings are 3 minutes for Accept and Request, 1 minute for Read, 10 minutes for Write, and no timeouts (0) for nfcFssrvr and nfcFssrvrWrite.
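The configuration file expresses these timeouts in milliseconds, so 3 minutes becomes 180000, 1 minute 60000, and 10 minutes 600000. The sketch below writes the suggested values as key=value lines; the key names follow VDDK's sample configuration but should be checked against the documentation for your VDDK release, and write_nfc_timeouts is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Starting values suggested in the text, in minutes (0 = no timeout). */
struct nfc_timeout {
    const char *key;
    unsigned long minutes;
};

static const struct nfc_timeout kNfcTimeouts[] = {
    { "vixDiskLib.nfc.AcceptTimeoutMs",      3 },
    { "vixDiskLib.nfc.RequestTimeoutMs",     3 },
    { "vixDiskLib.nfc.ReadTimeoutMs",        1 },
    { "vixDiskLib.nfc.WriteTimeoutMs",      10 },
    { "vixDiskLib.nfcFssrvr.TimeoutMs",      0 },
    { "vixDiskLib.nfcFssrvrWrite.TimeoutMs", 0 },
};

/* Write each timeout as a "key=milliseconds" line. Returns the number of
 * lines written, or -1 on an output error. */
int write_nfc_timeouts(FILE *out)
{
    size_t i, n = sizeof kNfcTimeouts / sizeof kNfcTimeouts[0];

    assert(out != NULL);
    for (i = 0; i < n; i++) {
        unsigned long ms = kNfcTimeouts[i].minutes * 60UL * 1000UL;
        if (fprintf(out, "%s=%lu\n", kNfcTimeouts[i].key, ms) < 0)
            return -1;
    }
    return (int)n;
}
```

The resulting file would then be named as the configuration file when initializing the library with VixDiskLib_InitEx().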
General Backup and Restore
For incremental backup of virtual disks, always enable changed block tracking (CBT) before the first snapshot. When doing full restores of virtual disks, disable CBT for the duration of the restore. File-based restores affect change tracking, but disabling CBT is optional for partial restores, except with SAN transport. CBT should be disabled for SAN transport writes because the file system must be able to account for thin-disk allocation and clear-lazy-zero operations.
Backup software should ignore independent disks (those not capable of snapshots). These virtual disks are unsuitable for backup, and attempting to snapshot them fails with an error.
To back up thick disk, the proxy's datastore must have at least as much free space as the maximum configured disk size for the backed-up virtual machine. Thin-provisioned disk is often faster to back up.
If you do a full backup of lazy-zeroed thick disk with CBT disabled, the software reads all sectors, converting data in empty (lazy-zero) sectors to actual zeros. Upon restore, this full backup data will produce eager-zeroed thick disk. This is one reason why VMware recommends enabling CBT before the first snapshot.
Backup and Restore of Thin-Provisioned Disk
Blocks of thin-provisioned virtual disk are allocated on first write, so the first write to a thin-provisioned disk involves extra overhead compared to thick disk, whether using NBD, NBDSSL, or HotAdd. This is due to block allocation overhead, not to the VDDK advanced transports. However, once the blocks of a thin disk have been allocated, performance is similar to thick disk, as discussed in the Performance Study of VMware vStorage Thin Provisioning.
When applications perform random I/O or write to previously unallocated areas of thin-provisioned disk, subsequent backups can be larger than expected, even with CBT enabled. In some cases, disk defragmentation might help reduce the size of backups.
Virtual Machine Configuration
Do not make verbatim copies of virtual machine configuration files, which can change. For example, entries in the .vmx file point to the snapshot, not the base disk. The .vmx file contains virtual-machine-specific information about current disks, and attempting to restore this information could fail. Instead, use PropertyCollector and keep a record of the ConfigInfo structure.
About Changed Block Tracking
QueryChangedDiskAreas("*") returns information about areas of a virtual disk that are in use (allocated). The current implementation depends on VMFS properties, similar to the properties that SAN transport mode uses to locate data on a SCSI LUN. Both rely on unallocated areas (file holes) in virtual disk, and on the LazyZero designation for VMFS blocks. Thus, changed block tracking yields meaningful results only on VMFS. On other storage types, it either fails or returns a single extent covering the entire disk.
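A backup program can check whether a "*" query actually reported allocation information before relying on it. In this sketch, ChangeExtent is a hypothetical C mirror of the start and length fields (in bytes) of the vSphere DiskChangeExtent data object, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of DiskChangeExtent (byte offsets). */
typedef struct {
    uint64_t start;
    uint64_t length;
} ChangeExtent;

/* Total bytes reported as in use by a "*" query. */
uint64_t reported_bytes(const ChangeExtent *ext, size_t n)
{
    uint64_t total = 0;
    size_t i;

    assert(ext != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += ext[i].length;
    return total;
}

/* Returns 1 when the result is one extent spanning the whole disk, as
 * non-VMFS storage reports (a fully allocated VMFS disk looks the same).
 * In that case the backup has to read every sector. */
int covers_whole_disk(const ChangeExtent *ext, size_t n, uint64_t capacity_bytes)
{
    assert(ext != NULL || n == 0);
    return n == 1 && ext[0].start == 0 && ext[0].length >= capacity_bytes;
}
```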
You should enable changed block tracking in the order recommended by Enabling Changed Block Tracking. The first time you call QueryChangedDiskAreas("*"), it should return allocated areas of virtual disk. Subsequent calls return changed areas, instead of allocated areas. If you call QueryChangedDiskAreas after a snapshot but before you enable changed block tracking, it also returns unallocated areas of virtual disk. With thin-provisioned virtual disk this could be a large amount of zero data.
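Two details follow from this: the first query uses "*" while later queries pass the changeId saved from the previous backup, and the byte extents returned must be turned into the 512-byte sector ranges that VixDiskLib_Read() works with. This is a minimal sketch; ChangeExtent, SectorRange, and both functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* VixDiskLib addresses disks in 512-byte sectors */

typedef struct { uint64_t start, length; } ChangeExtent;          /* bytes */
typedef struct { uint64_t start_sector, num_sectors; } SectorRange;

/* changeId argument for QueryChangedDiskAreas: "*" for the first backup,
 * otherwise the changeId recorded at the previous backup. */
const char *change_id_for_query(const char *previous_change_id)
{
    return (previous_change_id != NULL && previous_change_id[0] != '\0')
               ? previous_change_id
               : "*";
}

/* Smallest sector range that fully covers a byte extent. */
SectorRange extent_to_sectors(ChangeExtent e)
{
    SectorRange r;
    uint64_t end = e.start + e.length;

    assert(e.length > 0);
    r.start_sector = e.start / SECTOR_SIZE;
    r.num_sectors = (end + SECTOR_SIZE - 1) / SECTOR_SIZE - r.start_sector;
    return r;
}
```

Rounding outward to whole sectors means a misaligned extent is read in full rather than truncated; CBT extents are normally sector-aligned, so in practice this is a safeguard.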
The guest operating system has no visibility of changed block tracking. Once a virtual machine has written to a block on virtual disk, the block is considered in use. The information required for the "*" query is computed when changed block tracking is enabled, and the .ctk file is pre-filled with allocated blocks. The mechanism cannot report changes made to virtual disk before changed block tracking was enabled.