Chapter 7. RHSA-2013:0886 — VDSM

The bugs contained in this chapter are addressed by advisory RHSA-2013:0886. Further information about this advisory is available at https://rhn.redhat.com/errata/RHSA-2013-0886.html.
BZ#878064
Previously, when attempting to live migrate a virtual machine from one host to another, a TimeoutError appeared, which read "TimeoutError: Timed out during operation: cannot acquire state change lock in vdsm for setVmTicket has an exception error in UI." VDSM did not attempt to contact QEMU for storage sampling.

VDSM has now been updated to contact QEMU for storage sampling.

Now, when TimeoutErrors appear during live migration, they are more specific.
BZ#841555
Previously, a race condition arose when the after_vm_cont hook was called before _dom was created and assigned. This caused migrations to fail with exceptions.

A workaround has been implemented that ignores the after_vm_cont hook if it is called before _dom has been created and assigned.
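
A minimal sketch of such a guard, with only _dom and the after_vm_cont hook taken from this report (the method name and the hook dispatcher are hypothetical):

    def _on_after_vm_cont(self):
        # Workaround: this hook can fire before the libvirt domain
        # object (_dom) has been created and assigned. Ignore the
        # hook in that case to avoid the race that broke migrations.
        if self._dom is None:
            return
        run_after_vm_cont_hook(self)  # hypothetical hook dispatcher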

Migrations now continue as expected.
BZ#955140
Previous versions of libvirt (0.10.2-18.el6_4.2 and 0.10.2-18.el6_4.3) had stability issues due to locking model changes. This meant that simultaneous operations, including mass migrations, sometimes caused libvirt failures.

The required version of libvirt was changed to 0.10.2-18.el6_4.4.

libvirt is now stable and does not provoke these race conditions. Using this version of libvirt, VDSM is able to perform simultaneous operations on many virtual machines.
BZ#947014
Previously, VDSM was unable to decode an application list if the application name contained non-ASCII characters.

VDSM can now decode application lists when application names contain non-ASCII characters.
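
A minimal sketch of the more tolerant decoding (the data layout and function name are assumptions; only the non-ASCII failure mode comes from this report):

    def decode_app_list(raw_names):
        # Decode each reported application name as UTF-8 instead of
        # assuming ASCII; replace undecodable bytes rather than fail.
        return [name.decode('utf-8', 'replace') for name in raw_names]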
BZ#928217
Previously, VDSM used Python's FileHandler to manage its logs. FileHandler was later replaced with WatchedFileHandler, but /var/log/vdsm/libvirt.log was not migrated to WatchedFileHandler.
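
For context, Python's WatchedFileHandler reopens its file when logrotate moves it aside, whereas a plain FileHandler keeps writing to the rotated-away file. A minimal illustration (the logger name here is an assumption):

    import logging
    from logging.handlers import WatchedFileHandler

    # WatchedFileHandler checks the file's device and inode on every
    # emit, and reopens the path if the file has been rotated.
    handler = WatchedFileHandler('/var/log/vdsm/vdsm.log')
    logging.getLogger('vds').addHandler(handler)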

This caused /var/log/vdsm/libvirt.log to expand endlessly, which led to cases in which the RHEV-H file system could fill with the libvirt log.

libvirt.log has been removed, eliminating any chance that the RHEV-H filesystem could be filled with it.
BZ#925981
Previously, libvirt changed its default behavior from limited migration bandwidth to unlimited bandwidth. This meant that, without any migration bandwidth limitation, the network could be saturated during mass migrations.

The default bandwidth limit has been turned back on. This solution is compatible with all versions and preserves the original behavior, ensuring that customers with large networks notice no change in migration behavior.
BZ#873145
Previously, during live storage migration, some virtual machines changed their state, becoming paused. 

VDSM was patched so that, during live storage migration, it extends volumes by double the usual chunk size. This also doubles the watermark limit, giving VDSM more time to accomplish the storage operations necessary for live migration.
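
A minimal sketch of the doubling logic (the function name and the base chunk size are assumptions; only the doubling behavior comes from this report):

    CHUNK_MB = 1024  # assumed base extension chunk size

    def extension_chunk(in_live_storage_migration):
        # During live storage migration the chunk (and therefore the
        # watermark headroom) is doubled, giving VDSM more time to
        # finish the required storage operations.
        return CHUNK_MB * (2 if in_live_storage_migration else 1)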

Virtual machines no longer pause during live storage migration.
BZ#920671
After a Red Hat Enterprise Virtualization Hypervisor is attached to the Red Hat Enterprise Virtualization Manager and then successfully upgraded, it may erroneously appear in the administration portal with the status of Install Failed. Click on the Activate button, and the hypervisor will change to an Up status and be ready for use.
BZ#834041
Previously, VDSM lost its connection to the libvirt socket in certain cases.

VDSM no longer loses its connection to the libvirt socket.
BZ#962549
After upgrading to 3.1, a snapshot of a virtual machine from the older environment could be successfully removed, but the virtual machine would then fail to start. This was due to a failure to tear down the snapshot's volume path on the host storage manager prior to merging the snapshot, which left the volume activated on both the storage pool manager and the host storage manager. This update removes unnecessary volume paths and deactivates snapshot volumes after they are deleted, so virtual machines can run successfully under these conditions.
BZ#912158
A change caused a regression for customers with local storage data centers in cases where the storage domain pointed to a path not on the root storage device. This broke the storage of working systems.

Local storage domains in locations other than root (/) now work properly.
BZ#879253
Previously, when creating the first storage domain in a setup that included two hosts, the hosts failed to create the initial pool and returned an error during connectStoragePool which read "Wrong Master Domain or Version", due to stale cache in VDSM.

A patch to VDSM corrects this error, and creating the first storage domain in a setup that includes two hosts now successfully creates the initial pool.
BZ#880961
Previously, the VDSM daemon was unresponsive after upgrading from vdsm-4.9-113.4.el6_3 to vdsm-4.9.6-44.0.el6_3.

This was due to a race condition that existed when VDSM started up and when it restarted itself.

A patch ensures that the VDSM daemon is responsive after upgrading.
BZ#911417
Previously, when you upgraded to Red Hat Enterprise Linux 6.3, NFS permissions were set to 440, which made it impossible for the qemu user to start Red Hat Enterprise Virtualization 2.2 virtual machines.

A patch to VDSM ensures that the NFS permissions are set correctly (that is, set to 660), and that the qemu user is able to start Red Hat Enterprise Virtualization 2.2 virtual machines.
BZ#881947
Previously, editing storage domains in a fibre channel data center environment caused getDeviceList to fail with an exception.

A patch ensures that editing storage domains in a fibre channel data center environment does not cause getDeviceList to fail with an exception.
BZ#883327
Storage domain metadata is upgraded when Red Hat Enterprise Virtualization is upgraded from version 3.0 to version 3.1. 

Previously, storage lease metadata files were assigned incorrect permissions. This prevented file-based storage domains from being recovered after upgrades.

This fix modifies the permissions on the storage lease metadata files during the upgrade procedure. The local domain can now be upgraded without issue.
BZ#882276
During an early beta of Red Hat Enterprise Virtualization 3.0, VDSM generated problematic metadata tags (see BZ#732980) that are incompatible with the V3 upgrade. A preliminary step has been added to the upgrade process to fix the relevant tags (when needed) and proceed with the regular upgrade.
BZ#808998
To create a Fibre Channel storage domain on a CCISS device, a scsi_disk path was used to retrieve Host, Bus, Target, Lun (HBTL) values which did not exist on CCISS devices. Consequently, the storage domain could not be created. Now, the HBTL value is not required for non-SCSI devices, so creating a storage domain on a CCISS device succeeds.
BZ#885418
Previously, illegal configuration values for the parameters scsi_rescan_minimal_timeout and rescan_maximal_timeout caused forceIscsiScan to throw an unhandled exception. This was due to a mistake in the way that logging methods parsed arguments: the first argument was read as a message, and the remaining arguments were read as format values and options.

A patch, which makes use of Python's implicit string concatenation feature, ensures that these unhandled exceptions are no longer thrown because of wrongly parsed parameter values.
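
The pitfall and the fix can be illustrated with Python's logging module (the message text here is invented):

    import logging
    log = logging.getLogger('storage')

    # Mis-parsed: the second argument is treated as a '%'-style
    # format value for the first, not as part of the message.
    log.warning('illegal value for scsi_rescan_minimal_timeout,',
                'falling back to the default')

    # Fixed via implicit string concatenation: adjacent string
    # literals are joined into a single message argument.
    log.warning('illegal value for scsi_rescan_minimal_timeout, '
                'falling back to the default')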
BZ#861701
Previously, when a network device was removed from the system by a means other than VDSM, VDSM recognized that the device had been removed and stopped reporting it to the engine. However, libvirt still held a reference to the removed device, so if the user attempted to create the device again, the attempt failed.

Broken networks are now removed from libvirt before new networks are set up. The re-creation of broken networks no longer causes exceptions.
BZ#925967
Previously, debugging messages were displayed on the TUI (textual user interface) after the hypervisor was registered to the Red Hat Enterprise Virtualization Manager server.

A patch prevents debugging messages from being displayed on the TUI after the hypervisor registers to the Red Hat Enterprise Virtualization Manager server.
BZ#921595
Previously, VDSM assumed that display networks were backed by bridges.

The management network's address was sometimes provided where the display network's address should have been provided.

The bridge-specific information retrieval has been replaced with a call that retrieves the name of the device (which can be a bridge or another supported network device). Only after the device name has been retrieved is the address of that device retrieved.

The correct display network address is now returned, and display networks now work on non-virtual machine networks.
BZ#882667
Previously, metadata related to the file storage domain was wrongly reported as missing. This meant that, under some race conditions, attaching and detaching ISO domains and export domains left those domains in a non-operational state.

VDSM now verifies the existence of file storage domain metadata. ISO domains and export domains no longer fall into a non-operational state due to an inability to find file storage domain metadata.
BZ#911799
Previously, the output of the command "tree <local-storage-path>" did not match the contents of the sosreport "su_vdsm_-s_.bin.sh_-c.usr.bin.tree_-l_.rhev.data-center" file.

The command output now matches the contents of that sosreport file.
BZ#920074
Previously, VDSM filled /var/log/messages with useless warnings; Storage Domain warnings reporting that namespaces had already been registered were especially frequent among these. Because VDSM logs were rotated every hour, it was impossible to access or analyze logs more than one hour old.

vdsm-4.10.2-16.0.el6ev.x86_64 does not fill /var/log/messages with "vdsm Storage.StorageDomain WARNING Resource namespace" messages.
BZ#905930
Previously (in vdsm-4.10.2-4.0.el6ev.x86_64), when the guest agent was installed on a guest and you had configured authentication against directory services, using SSO (single sign-on) to log in to a user account through the User Portal delivered you into a desktop in which the screen was locked.

In vdsm-4.10.2-16.0.el6ev.x86_64, SSO delivers you into a desktop in which the screen has not been locked.
BZ#918541
Previously, the VM Channels Listener thread stalled, which blocked communication between VDSM and the hosted guest agents.

A patch prevents the VM Channels Listener thread from stalling, removing the block in communication between VDSM and the hosted guest agents.
BZ#910445
Previously, live migrating a preallocated file (e.g. NFS) virtual disk to a different storage domain failed because of a problem in the preparation of the volumes at the destination.

Now, preallocated file virtual disks can be live migrated to other storage domains as expected.
BZ#875775
Previously, the --force option was not passed to the vgextend command if the physical volume was in use.

This meant that it was impossible to extend a storage domain if its associated physical volume was in use.

The --force option is now passed to the vgextend command, even when the physical volume is in use.
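
A minimal sketch of the changed invocation (the variable names are illustrative; only the --force option comes from this report):

    # --force lets vgextend add a physical volume even when it is
    # already in use, so the storage domain can still be extended.
    cmd = ['/sbin/vgextend', '--force', vg_name, pv_device]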

It is now possible to extend a storage domain when its associated physical volume is in use.
BZ#912308
When the vdsm.log file is removed, either manually or by logrotate, the supervdsm user can create the log file and set its ownership to root:root. When this happens, the vdsmd service is stopped until the user resets the vdsm.log ownership to vdsm:kvm and restarts the service. This update separates the supervdsm log into a dedicated supervdsm.log file, so after vdsm.log is rotated it remains owned by vdsm:kvm.
BZ#920532
Previously, attaching a large number of storage domains could result in failure; some but not all of the storage domains would attach.

Now, attaching a large number of storage domains works as expected.
BZ#878667
VDSM hooks for hotplugging (and hot unplugging) NICs have been added.
BZ#958119
Previously, ksmState did not track changes to /sys/kernel/mm/ksm/run; it changed only when a virtual machine was started or stopped. This meant that the value of ksmState did not accurately reflect the actual KSM state.

The value of ksmState is now accurate.
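
For reference, the kernel exposes the KSM run state in the sysfs file named in this report; a minimal sketch of reading it directly rather than caching it (the function name is hypothetical):

    def ksm_running():
        # /sys/kernel/mm/ksm/run reads '1' while KSM is active.
        with open('/sys/kernel/mm/ksm/run') as f:
            return f.read().strip() == '1'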
BZ#852956
Domain codes and libvirt error codes were mixed by mistake, so restarting the libvirt daemon caused the libvirt client socket to close on Red Hat Enterprise Virtualization Manager. In addition, libvirt reported internal errors if libvirtd was restarted or stopped, for example after a crash. This update resolves the mixed codes and adds missing error codes. Restarting libvirtd now correctly restarts VDSM connections.
BZ#907587
Previously, VDSM was unable to provide Red Hat Enterprise Virtualization Manager with all CPU information for AMD Bulldozer CPU architecture. This was because the AMD Bulldozer architecture consists of "modules", which are represented both as separate cores and separate threads. Management applications must choose between the thread-based approach and the core-based approach.

Libvirt now provides XML output that contains more information about the processor topology so that management applications like VDSM are able to extract the information they require.
BZ#918666
Previously, when new bonds were added via the setupNetworks method (through either the GUI or the SDK), the validation of bonding options failed and the setupNetworks request for the bond failed to apply. This was because the check for existing bond device options worked by determining whether sysfs exposed the bonds for the requested device.

The bond required for validation is now created before the setupNetworks method is called. The bonding options check is now performed successfully, and requests for bonded networks comply with the rest of the network validation done by setupNetworks. In the case of incorrect bonding options, meaningful errors are provided to the user.
BZ#919201
Previously, VDSM did not distinguish between migration failures caused by high guest-memory writes and migration failures caused by high network load. This meant that it wasn't clear why migrations failed.

New error messages in the logs allow users to distinguish between migration failures caused by high guest-memory writes and migration failures caused by high network load.
BZ#928861
Previously, VDSM did not start properly due to an exception thrown while setting up its syslog logger configuration. Part of that setup involved connecting to /dev/log, which did not exist when the syslog.conf file was corrupted.

Now, during startup of the rsyslogd service, rsyslogd verifies that the /dev/log socket exists and is accessible. If the /dev/log socket does not exist, "vdsmd service start" fails.
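
A minimal sketch of such an existence check (illustrative only; the report does not specify how the check is implemented):

    import os
    import stat

    def dev_log_ok(path='/dev/log'):
        # Verify that the syslog socket exists and really is a socket.
        try:
            return stat.S_ISSOCK(os.stat(path).st_mode)
        except OSError:
            return False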
BZ#920688
Previously, VDSM threw an attribute error exception when trying to write to the log.

A patch to VDSM prevents attribute error exceptions when trying to write to the log.
BZ#875487
Previously, it was not possible to break a bond and attach custom MTU networks to virtual machines while those virtual machines were running.

In vdsm-4.10.2-2.0, it is possible to break a bond and attach custom MTU networks to virtual machines while those virtual machines are running.
BZ#923964
Previously, during live storage migrations of 100GB disks, the disk (a logical volume) was duplicated, and the original disk was not removed. This led to situations in which a 100GB logical volume would be replaced with a 200GB logical volume.

This was due to a race with the volume statistics: if a volume extension request arrived before the VmStatsThread had updated the apparentsize value, VDSM requested an extension on top of the original raw volume size.

This was fixed in vdsm-4.10.2-18.0.el6ev.
BZ#883390
When a Fibre Channel storage domain was created from a host that was not the Storage Pool Manager (SPM), the SPM failed to recognize the storage domain and could not attach it. Now, when the domain cannot be attached, the SPM scans for new domains and retries attaching the domain until it succeeds.
BZ#922515
Previously, VDSM failed to recover after restarts, and reported an error "AttributeError: 'list' object has no attribute 'split'".

The function storage.fuser.fuser() was patched, and VDSM now recovers as expected after restarts.
BZ#923773
Previously, vmHotplugDisk failed with "VolumeError: Bad volume specification". This was due to a multipath race, in which multipath was not given enough time to create the /dev/mapper entry for the LUN on the host where the virtual machine was running.

A patch now allows multipath the time it needs to create the /dev/mapper entry for the LUN on the host where the virtual machine is running.
BZ#893193
Previously, vdsm.log did not report the correct VDSM release for Red Hat Enterprise Virtualization 3.1 installations. This was problematic for people remotely troubleshooting customer installations: it was not possible to determine which version of VDSM was installed when the log files were the only source of information.

vdsm-4.10.2-18.0.el6ev reports the correct VDSM release in the log files.
BZ#890572
Previously, it was not possible to change the Management Server Port in the Red Hat Enterprise Virtualization Hypervisor Textual User Interface (TUI) if the host was registered twice.

The TUI has been updated so that it is now possible to change the Management Server Port at any time.
BZ#871616
Previously, guest agent information vanished after virtual machines were migrated several times. This was because the virtual machine channel listener did not handle errors. If an error occurred, VDSM did not try to reconnect, and the connection to the guest was lost for the lifetime of the guest or until VDSM was restarted.

A patch to VDSM introduces a mechanism to reconnect to the channel. When an error occurs, the setup callback is called, which gives the handled client a chance to recreate the socket and prepare it for a connect.

After that callback is called, the erroneous connection is moved into the unconnected items dict, where it is handled by the event loop.

If five or more unsuccessful attempts have been made, the reconnect rate is slowed to the interval specified for the 'read timeout', and the slowed-down items are moved into the 'reconnect_cooldown' dict.
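
A minimal sketch of the bookkeeping described above (names other than 'reconnect_cooldown' and the read timeout are assumptions):

    import time

    MAX_FAST_RETRIES = 5

    def handle_channel_error(fd, client, unconnected,
                             reconnect_cooldown, read_timeout):
        # Give the client a chance to recreate its socket and
        # prepare it for a new connect.
        client.setup()
        client.failures = getattr(client, 'failures', 0) + 1
        if client.failures >= MAX_FAST_RETRIES:
            # Throttle further retries to the read-timeout interval.
            reconnect_cooldown[fd] = time.time() + read_timeout
        # The event loop picks unconnected items up and reconnects.
        unconnected[fd] = client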

After this patch is applied, guest agent information does not vanish after several virtual machine migrations.
BZ#927143
Previously, hot unplugging disks caused VDSM to stop communicating with guest agents.

The logic of the virtual-machine-cleanup code and the hot-unplugging code has now been separated.

Now, when disks are hot-unplugged, VDSM does not touch guest-agent communication channels; it removes only the detached disk from the virtual machine. This allows VDSM to continue communicating with guest agents after virtual disks are hot-unplugged.
BZ#864073
Previously, it was not possible to import a virtual machine that was based on a template while at the same time changing the name of that virtual machine. This was because, during the attempt to copy the image back to the data domain, the template was set to "ILLEGAL".

A patch to VDSM sets the template to "LEGAL". This makes it possible to import a virtual machine that was based on a template while at the same time changing the name of that virtual machine.
BZ#928334
Previously, when some fields were missing from domxml, virtual machines would fail to start. This was because VDSM would crash when those fields were missing.

A patch to VDSM makes it possible to create virtual machines when the NICs are SR-IOV virtual functions.

VDSM now starts virtual machines even when some fields are missing from domxml.
BZ#881725
Previously, when the messagebus service was down, host deployment failed. This was because VDSM requires libvirtd, libvirtd requires messagebus, and System V init does not support service dependencies.

The messagebus service is now started explicitly, and host deployment succeeds.
BZ#948346
Previously, a Red Hat Enterprise Virtualization 3.0 environment failed to upgrade to a 3.1 environment under the following conditions: a 3.0 data center existed containing two block domains, and the non-master domain was put into maintenance. The data center was then upgraded to 3.1, with no links existing to the domain in maintenance. When the domain in maintenance was activated, the upgrade failed with the following exceptions:

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/blockVolume.py", line 408, in validateImagePath
    os.mkdir(imageDir, 0755)

Traceback (most recent call last):
...
  File "/usr/share/vdsm/storage/blockVolume.py", line 411, in validateImagePath
    raise se.ImagePathError(imageDir)
ImagePathError: Image path does not exist or cannot be accessed/created: ('/rhev/data-center/a2834714-c9d8-4316-878d-3af799f10feb/5db46280-a002-4d4e-b5cf-59533f9aa36d/images/aa136091-6d13-4088-a089-cacfed0bd7d6',)

The domain remained deactivated.

A patch now makes it possible for a 3.0 domain to be upgraded to a 3.1 domain.
BZ#948940
Previously, concurrent live storage migration of multiple disks sometimes resulted in a saveState exception.

vdsm-4.10.2-18.0.el6ev does not throw a saveState exception during concurrent live storage migration of multiple disks. Concurrent storage migration of multiple disks now succeeds.
BZ#949192
Previously, when a connectivity failure was raised by libvirt, VDSM began self-fencing. When VDSM restarted, it restarted the libvirt service. In large environments with high host loads, re-establishing the connection between VDSM and libvirt took a long time. Because the host does not respond until the connection to libvirt is back, host fencing would begin before the connection between VDSM and libvirt was re-established.

An update allows VDSM to respond to API calls and report its status, which prevents the condition that previously caused premature and unwanted host fencing.
BZ#951057
Previously, VDSM did not report the storage domain version in the domain statistics. This made it impossible for the backend to monitor storage domain versions during upgrades.

VDSM now reports the storage domain version in the domain statistics.
BZ#955593
Previously, spurious errors were recorded in vdsm.log during virtual machine migrations (specifically, tracebacks were logged during migrations).

vdsm-4.10.2.19.0 does not record these spurious errors in vdsm.log.
BZ#956683
Previously, the default migration_max_bandwidth (32 MiBps) and the default max_outgoing_migrations (5) saturated a 1 Gbps link with migration traffic. This left no room for other kinds of traffic (for instance, management traffic).

The max_outgoing_migrations default has been changed from 5 to 3. Migrations no longer saturate a 1 Gbps link.
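
A rough capacity check using the figures above (1 Gbps is roughly 119 MiBps):

    5 concurrent migrations x 32 MiBps = 160 MiBps  (saturates the link)
    3 concurrent migrations x 32 MiBps =  96 MiBps  (leaves headroom for management traffic)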
BZ#919356
Previously, VDSM identified virtual network interfaces by their MAC addresses. This created a problem: if the user wanted to change the MAC address at the same time the interface was unplugged, the virtual interface would not match. This meant that hot-unplugging a virtual interface and changing its MAC address at the same time raised an unexpected exception.

Virtual network interfaces are now matched by alias, with the MAC address used as a fallback. Post-unplug hooks are no longer allowed to be executed when the virtual interface is not found.
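
A minimal sketch of the matching order (the helper and the device fields are assumptions; only the alias-first, MAC-fallback rule comes from this report):

    def find_vnic(devices, alias, mac):
        # Prefer the stable libvirt alias; fall back to the MAC
        # address, which may be in the middle of being changed.
        for dev in devices:
            if dev.alias == alias:
                return dev
        for dev in devices:
            if dev.mac == mac:
                return dev
        return None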

It is now possible to hot-unplug a virtual interface and change the MAC address in the same step.