Gathering Changed Block Information

Associated with changed block tracking is changeId, an identifier for versions of changed block data. Whenever a virtual machine snapshot is created, associated with that snapshot is a changeId that functions as a landmark to identify changes in virtual disk data. So it follows that when a snapshot is created for the purpose of creating an initial virtual disk backup, the changeId associated with that snapshot can be used to retrieve changes that have occurred since snapshot creation.

To obtain the changeId associated with any disk in a snapshot, you examine the “hardware” array from the snapshot. Any item in the devices table that is of type vim.vm.device.VirtualDevice.VirtualDisk encloses a class describing the “backing storage” (obtained using getBacking) that implements virtual disk. If backing storage is one of the following disk types, you can use the changeId property of the BackingInfo data object to obtain the changeId:

vim.vm.device.VirtualDevice.VirtualDiskFlatVer2BackingInfo
vim.vm.device.VirtualDevice.VirtualDiskSparseVer2BackingInfo
vim.vm.device.VirtualDevice.VirtualDiskRawDiskMappingVer1BackingInfo
vim.vm.device.VirtualDevice.VirtualDiskRawDiskVer2BackingInfo

Information returned by the QueryChangedDiskAreas method is a DiskChangeInfo data object containing an array of DiskChangeInfo.DiskChangeExtent items that enumerate the start offset and length of various disk areas that changed, and the length and start offset of the entire disk area covered by DiskChangeInfo.

When using QueryChangedDiskAreas to gather information about snapshots, enable change tracking before taking a snapshot. Attempts to collect information about changes that occurred before change tracking was enabled result in a FileFault error. Enabling change tracking provides the additional benefit of saving space because it enables backup of only information that has changed. If change tracking is not enabled, the entire virtual machine must be backed up each time, rather than incrementally.

Changed block tracking is supported whenever the I/O operations are processed by the ESXi storage stack:

  • For a virtual disk stored on VMFS, no matter what backs the VMFS volume (SAN or local disk).
  • For a virtual disk stored on NFS (though thick vs thin might be an issue).
  • For an RDM in virtual compatibility mode.

When I/O operations are not processed by the ESXi storage stack, changed block tracking is not usable:

  • For an RDM in physical compatibility mode.
  • A disk that is accessed directly from inside a VM. For example if you are running an iSCSI initiator within the virtual machine to access an iSCSI LUN from inside the VM, vSphere cannot track it.

If the guest actually wrote to each block of a virtual disk (long format or secure erase), or if the virtual disk is thick and eager zeroed, or cloned thick disk, then the query may report the entire disk as being in use.

To find change information, you can use the managed object browser at http://<ESXhost>/mob to follow path content > rootFolder > datacenter > datastore > vm > snapshot > config > hardware > virtualDisk > backing. Changed block tracking information (changeId) appears in the BackingInfo.

The following C++ code sample assumes that, in the past, you obtained a complete copy of the virtual disk, and at the time when the changeId associated with the snapshot was collected, you stored it for use at a later time, which is now. A new snapshot has been created, and the appropriate moRef is available:

String changeId;      // Already initialized: changeId, snapshotMoRef, theVM
ManagedObjectReference snapshotMoRef;
ManagedObjectReference theVM;
int diskDeviceKey;    // Identifies the virtual disk.
VirtualMachine.DiskChangeInfo changes;
long startPosition = 0;
do {
   changes = theVM.queryChangedDiskAreas(snapshotMoRef, diskDeviceKey, startPosition, changeId);
   for (int i = 0; i < changes.changedArea.length; i++) {
      long length = changes.changedArea[i].length;
      long offset = changes.changedArea[i].startOffset;
      //
      // Go get and save disk data here
   }
   startPosition = changes.startOffset + changes.length;
} while (startPosition < diskCapacity);

In the above code, QueryChangedDiskAreas is called repeatedly, as position moves through the virtual disk. This is because the number of entries in the ChangedDiskArea array could occupy a large amount of memory for describing changes to a large virtual disk. Some disk areas may have no changes for a given changeId.

The changeId (changed block ID) contains a sequence number in the form <UUID>/<nnn>. If <UUID> changes, it indicates that tracking information has become invalid, necessitating a full backup. Otherwise incremental backups can continue in the usual pattern.