Chapter 4. RHBA-2016:0362 vdsm

The bugs contained in this chapter are addressed by advisory RHBA-2016:0362. Further information about this advisory is available at https://rhn.redhat.com/errata/RHBA-2016-0362.html.

vdsm

BZ#1188251
Previously, VDSM did not take ownership of pre-defined ifcfg files and did not consider the interfaces they defined as belonging to it. When a setupNetworks command was issued and failed, VDSM failed to restore the original ifcfg file. Note that this bug occurred only when network setup failed on top of a pre-existing ifcfg file. Now, VDSM stores pre-defined ifcfg files before they are modified, even with unified persistence.
BZ#1182094
Previously, NUMA statistics were collected every time VDSM was queried for host statistics. This resulted in higher load and unnecessary delays, because collecting the data was time-consuming and required executing an external process. Now, NUMA statistics collection has been moved to the statistics threads, and the host statistics query reports the last collected result.
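The pattern is the usual sampling-thread cache: a background thread does the expensive work on its own schedule, and the query path returns the cached result immediately. The following minimal sketch illustrates the shape of the fix; the class name and interval are assumptions, not VDSM's actual code:

  import threading
  import time

  class StatsSampler:
      # Collects expensive statistics in the background and caches the
      # last result so the query path returns without blocking.
      def __init__(self, collect, interval=15):
          self._collect = collect        # slow call, e.g. runs an external process
          self._interval = interval
          self._last = {}
          self._lock = threading.Lock()
          threading.Thread(target=self._run, daemon=True).start()

      def _run(self):
          while True:
              sample = self._collect()   # the expensive work happens here
              with self._lock:
                  self._last = sample
              time.sleep(self._interval)

      def stats(self):
          # Called from the host statistics query: no external process, no delay.
          with self._lock:
              return dict(self._last)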
BZ#1207610
The Memory Overcommitment Manager (MOM) policy formula for CPU limits previously used fixed constants and divided them by the number of CPUs. The result was too low on hosts with more than 100 CPUs, and the value was refused by libvirt, which caused performance degradation in virtual machines. The CPU limit formulas have been improved, and as a result the CPU limits can now handle any number of CPUs.
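The shape of the problem and fix can be shown with a hedged sketch; the budget constant is hypothetical, and the 1000 microsecond floor is an assumption based on libvirt's documented minimum CFS quota, not the exact MOM policy:

  TOTAL_QUOTA_US = 100000               # hypothetical fixed per-period budget

  def per_vcpu_quota(ncpus):
      quota = TOTAL_QUOTA_US // ncpus   # old formula: below 1000 past 100 CPUs
      return max(quota, 1000)           # new behavior: respect libvirt's minimum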
BZ#1269424
Previously, VDSM memory consumption continually increased on some environments, caused by a memory leak in VDSM. The code has been updated to eliminate the VDSM memory leak, and there is no longer a memory usage increase when running VDSM.
BZ#1215967
With this update, VDSM now calls prepareImage to mount the NFS storage required to deploy additional hosts using NFS.
BZ#1155583
When live merging snapshots on a block storage domain, the merge target volume is proactively extended to accommodate active writing on the source volume. This may cause some over-extension of the target volume.
BZ#1256949
Previously, a Memory Overcommitment Manager (MOM) policy rule computed KSM's sleep_millisecs value using a division with the amount of host memory in the divisor. As a result, the sleep_millisecs value dropped below 10 ms on hosts with more than 16 GiB of RAM. That value was invalid and too aggressive, causing a huge CPU load on the host. With this release, the sleep_millisecs value is bounded so that it never drops below 10 ms, reducing the CPU load on affected machines.
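A minimal sketch of the bounding, assuming a hypothetical old formula calibrated so that the value hits 10 ms at exactly 16 GiB:

  def ksm_sleep_ms(host_mem_gib):
      computed = 160.0 / host_mem_gib   # old rule: shrinks as host memory grows
      return max(computed, 10)          # the fix: clamp to a 10 ms floor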
BZ#1247075
Previously, excessive thread usage in VDSM and the architecture of the Python runtime caused poor VDSM performance on multicore hosts. VDSM now supports CPU affinity in order to pin its processes to specific cores. CPU usage is reduced as a result of pinning VDSM threads to a smaller number of cores.
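Pinning of this kind can be done with the standard library alone; a minimal sketch, where the core number is an arbitrary example rather than VDSM's configured default:

  import os

  os.sched_setaffinity(0, {1})      # pin the calling process to core 1
  print(os.sched_getaffinity(0))    # prints {1}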
BZ#1247058
NUMA nodes can exist without memory (for example, when hotswapping memory modules). This was not considered in VDSM, causing the statistics reporting mechanism (getVdsStats) to break. Now, this error has been fixed by explicitly checking for NUMA nodes with zero memory, and returning a memory usage of 100%.
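The guard amounts to one explicit check before the division; a minimal sketch with assumed field names:

  def numa_mem_usage_percent(total_kib, free_kib):
      if total_kib == 0:    # memory-less node, e.g. during module hotswap
          return 100.0      # report the node as fully used, as the fix does
      return 100.0 * (total_kib - free_kib) / total_kib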
BZ#1112861
Previously, a host became non-operational when a network interface was blocked in a multipath environment where the host contained two network devices, both configured for the same subnet and configured with an iSCSI bond. This occurred because, by the time the failing path was active or ready again, multipath no longer considered it until the host again changed state to Up. Now, when configuring the iSCSI bond network interfaces, VDSM configures multipath with the correct interface passed by the engine. As a result, when one of the network interfaces on the same subnet becomes non-responsive, the second path is used to reach the iSCSI target, and hosts continue to operate normally.
BZ#1226911
In some cases after a failed migration, many error messages flooded the VDSM log, and one of the VDSM threads consumed 100% CPU. This was caused by incorrect usage of epoll, which has been fixed in this update.
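The pitfall is generic to epoll consumers: a dead file descriptor left registered keeps reporting EPOLLHUP, so poll() returns immediately forever and the event loop spins. A minimal sketch of the correct handling, with illustrative names:

  import select

  def serve(ep, handlers):
      while True:
          for fd, events in ep.poll():
              if events & (select.EPOLLHUP | select.EPOLLERR):
                  ep.unregister(fd)     # without this, poll() busy-loops
                  handlers.pop(fd, None)
              else:
                  handlers[fd](events)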
BZ#1261007
Previously, when using a separate display network, the VDSM service ignored the specific listening IP address and listened to all connections via the management network. With this update, the VDSM service uses the display network settings as expected.
BZ#1156194
Previously, when a virtual machine was resumed from a suspended state its time was not updated, resulting in incorrect time in the guest operating system. This happened because the functionality to update time on resume was missing in VDSM. Now, the functionality has been added, and the time is updated after a virtual machine is resumed from a suspended state, as long as the guest operating system supports this feature and qemu-guest-agent is running. It is still recommended to have NTP services running in the guest operating system to provide precise time for the guest.
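Through the libvirt Python binding, such a sync looks roughly like the following. This is a hedged sketch: the domain name is a placeholder, and qemu-guest-agent must be running in the guest:

  import time
  import libvirt

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('example-vm')                      # placeholder name
  dom.setTime({'seconds': int(time.time()), 'nseconds': 0})  # push host time to guest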
BZ#1267444
Some host deployments or upgrades failed previously because VDSM did not handle empty supplementary groups in the sanlock process. These groups were left unconfigured when a race occurred in the supplementary groups configuration during sanlock startup. In this release, VDSM handles empty supplementary groups, and host deployment or upgrade will not fail if VDSM checks the sanlock configuration before supplementary groups are configured.
BZ#1205058
The previous virtual machine payload ISO used only the Rock Ridge extension, and as a result Windows virtual machines could not use payloads with long filenames. This fix adds Microsoft's Joliet filesystem extension to the generated ISO, so that payloads with long filenames are now displayed correctly on both Windows and Linux systems.
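With the stock ISO tooling, the equivalent invocation combines both extensions; an illustrative sketch with placeholder paths:

  import subprocess

  subprocess.check_call([
      'genisoimage',
      '-R',                      # Rock Ridge: long names for Linux guests
      '-J',                      # Joliet: long names for Windows guests
      '-o', '/tmp/payload.iso',  # placeholder output path
      '/tmp/payload-dir',        # placeholder payload directory
  ])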
BZ#1203891
On the public internet, there are many random attempts to probe well-known ports on servers in order to gain VNC access to the machines behind them. These remote connection attempts triggered disconnect events that locked the console screen, even if the connection had not established a valid VNC or SPICE session. Now, only the client IP address and port of the currently known session can disconnect the console.
BZ#974510
Previously, when a user was connected to a SPICE console via a SPICE proxy, the console connection would drop during virtual machine migration. This happened because the client machine was not able to connect to the display on the destination host machine. Now, for both SPICE and VNC, console access is not interrupted.

Note the following limitations:
1. On virtual machines where both displays (SPICE and VNC) are configured, the console connection persists only when using SPICE, and will otherwise fail.
2. Uninterrupted console access only works with remote-viewer and plugins. It does not work with integrated web-based clients (noVNC and SPICE HTML5) or with third-party VNC viewers.
BZ#1296936
With this update, the MOM component no longer fails to enforce QoS policies, KSM, and memory ballooning.
BZ#1182247
Red Hat Enterprise Virtualization supports virtual machines with up to 240 vCPUs. The previous version supported a maximum of 160 vCPUs.
BZ#922744
With this change, the glusterfs-cli package must be installed. Please note that in the VDSM spec file, there is no dependency on glusterfs-cli. This means that VDSM installation will succeed even if glusterfs-cli is not installed, and that the glusterfs-cli package must be installed manually.
BZ#880738
Previously, when accessing a device that was no longer available, various commands would block for several minutes because of issues in the underlying platform. This caused long delays in VDSM threads waiting on the blocked commands, leading to delays and timeouts in various flows. Several fixes have since been made in the kernel and device-mapper-multipath, VDSM's multipath handling has been improved, and VDSM now requires fixed versions of these components. When accessing an unavailable device, a minimal delay is expected only when the device is first accessed. After the first access, multipath considers the device faulty and no further delays are expected.
BZ#1229177
This change disables the ability to use VDSM in clusters from 3.0 to 3.3. Red Hat Enterprise Virtualization 3.6 drops support for cluster levels between 3.0 and 3.3 and for Manager versions 3.3 and below. Support for Red Hat Enterprise Virtualization Manager 3.4 is still available as a Technology Preview.
BZ#1286997
With this update, the vdsm-hook-vmfex-dev package is included with VDSM. Users can now connect their virtual machine network profiles to Cisco UCS-defined port profiles.
BZ#1004101
Previous versions of VDSM used an inadequate algorithm to calculate the downtime of virtual machines in certain scenarios, causing migration to fail when virtual machines were running heavy loads. This version of VDSM implements a new algorithm to estimate a virtual machine's downtime, and migrations in these scenarios converge more easily.
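The idea behind such algorithms is a progressive downtime schedule: start with a strict allowed downtime and relax it toward the configured maximum so that heavily loaded guests still converge. A hedged illustration, not VDSM's exact formula:

  def downtime_steps(max_downtime_ms, steps=10):
      # Yield a growing sequence of allowed-downtime values; migration
      # retries with the next, larger value until it converges.
      for i in range(1, steps + 1):
          yield max_downtime_ms * i // steps

  # list(downtime_steps(500)) -> [50, 100, 150, ..., 500]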
BZ#1128881
Previously, VDSM reported all channel devices as 'Unknown' device types with a warning. This was not correct and has now been fixed.
BZ#1142776
Simultaneous migration of many virtual machines could create a deadlock between the threads that monitor the hosts. As a result, the hosts were not monitored, and the status of their virtual machines was not updated by the Red Hat Enterprise Virtualization Manager. The monitoring code has been fixed to prevent the deadlock, so virtual machines are now monitored correctly.
BZ#1219364
With this update, the CPU usage of the VDSM management process has been reduced, increasing the performance and scalability of each hypervisor. To achieve this, some of VDSM's functionality, namely KSM, ballooning, and QoS policy enforcement, was separated out into a standalone "MOM" process.
BZ#1219903
Previously, libvirt reported a "metadata not found" error in vdsm.log when VDSM queried a domain for a metadata element that did not exist. This was not actually an error, but a misleading message triggered by VDSM. An empty metadata element has been added to the code so that this message no longer appears in the log.
BZ#1217401
Previously, adding a direct LUN to a virtual machine sometimes timed out. This occurred because a physical volume (PV) create test was performed for each device when calling getDeviceList. Because a PV create test requires significant resources, it affected the response time of getDeviceList in scale setups, sometimes causing timeouts on the Red Hat Enterprise Virtualization Manager. This has been fixed: the PV create test can now be skipped using a flag (see the sketch after this list). If the PV test is needed to determine the usage state of a device, it can be run on specific devices only, minimizing the impact and decreasing user waiting time in the following operations:
1. UI - Add direct LUN disk
2. UI/REST - Add ISCSI/FC storage domain
3. UI/REST - Edit ISCSI/FC storage domain
4. UI/REST - Extend ISCSI/FC storage domain
5. REST - Add direct LUN disk (if <host> parameter is provided)
6. UI/REST - Import an iSCSI/FCP domain
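From the caller's side, given a connected VDSM client object (here called cli, a placeholder), the skip looks roughly like this. This is a hedged sketch: the checkStatus parameter name follows our reading of the fix and may differ in your VDSM version, and the GUID is a placeholder:

  # Fast path: enumerate devices without the expensive PV create test.
  devices = cli.Host.getDeviceList(storageType='ISCSI', checkStatus=False)

  # When the usage state is actually needed, test only specific devices.
  state = cli.Host.getDeviceList(storageType='ISCSI',
                                 guids=['<device-guid>'],   # placeholder
                                 checkStatus=True)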
BZ#1215610
Previously, Red Hat Enterprise Virtualization incorrectly configured the hypervisor for certain Windows versions, resulting in significant time drift on Windows virtual machines running high CPU loads. Code has been added to VDSM to inject periodic RTC interrupts, to prevent lost interrupts which caused time drift in Windows guests. The recommended hypervisor settings are now configured for Windows versions and there is no longer time drift in Windows virtual machines.
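In libvirt terms, the relevant knob is the RTC timer tick policy, which makes QEMU reinject ticks the guest missed under load. A hedged sketch building the domain XML fragment with the standard library (not VDSM's internal code):

  import xml.etree.ElementTree as ET

  clock = ET.Element('clock', offset='localtime')  # Windows guests use local time
  ET.SubElement(clock, 'timer', name='rtc', tickpolicy='catchup')
  print(ET.tostring(clock).decode())
  # <clock offset="localtime"><timer name="rtc" tickpolicy="catchup" /></clock>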
BZ#1215387
With the release of Red Hat Enterprise Virtualization 3.6, VDSM will no longer support engines older than Red Hat Enterprise Virtualization 3.3.0. It will, however, continue to support clusters and data centers with compatibility versions lower than 3.3.
BZ#1126206
File-type storage domains now use separate IOProcess instances. This improves performance, and prevents one slow or unreachable storage domain from affecting other storage domains.
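The isolation pattern is per-domain worker instances; a minimal illustrative sketch, where the class and names are assumptions rather than VDSM internals:

  class IOProcessPool:
      def __init__(self, factory):
          self._factory = factory    # creates one helper process per domain
          self._procs = {}

      def get(self, domain_uuid):
          # A hung NFS mount only blocks the instance serving that domain;
          # other domains keep their own responsive IOProcess.
          if domain_uuid not in self._procs:
              self._procs[domain_uuid] = self._factory(domain_uuid)
          return self._procs[domain_uuid]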
BZ#965929
When Red Hat Enterprise Virtualization configures a network on a host, it generates several ifcfg files with specific content. If users want to tweak that content by adding or removing initscripts options, they must deploy a hook script that does so whenever the ifcfg file is rewritten.

The example hook script will add the following entries to the ifcfg file of nic 'ens11' when ifcfg is modified:
  USERCTL=yes
  ETHTOOL_OPTS="autoneg on speed 1000 duplex full"

To add the hook script to VDSM hooks, place the file in /usr/libexec/vdsm/hooks/before_ifcfg_write, and ensure that VDSM has execute permission on the file.

VDSM checks this directory every time the ifcfg configuration is changed, and executes each script in it.
As input to the script, VDSM passes the path to a json file containing the ifcfg file data, for example:

{
  "config": "DEVICE=ens13\nHWADDR=52:54:00:d1:3d:c8\nBRIDGE=z\nONBOOT=yes\nMTU=1500\nNM_CONTROLLED=no\nIPV6INIT=no\n", 
  "ifcfg_file": "/etc/sysconfig/network-scripts/ifcfg-ens7"
}

Modified ifcfg file contents (under the "config" entry) can be written back to a json file, and VDSM will use them as the new ifcfg file content.
If no file is written, VDSM uses the unmodified content.

The following is a description of the example hook script.

Reading in the data from the json file:
  
  hook_data = hooking.read_json()

Getting the value of the new ifcfg file content:
  
  config_data = hook_data['config']

Getting the name of the ifcfg file which will be modified:
  
  ifcfg_file = hook_data['ifcfg_file']

Modifying and writing the content of the ifcfg file:
  
  config_data += "USERCTL=yes\nETHTOOL_OPTS=\"autoneg on speed 1000 duplex full\"\n"
  hook_data['config'] = config_data
  hooking.write_json(hook_data)

The script also dumps the data read from the json file to /tmp/hook_data, to show the format of the input json file:
  
  with open("/tmp/hook_data",mode='w') as file:
      file.write( json.dumps(hook_data))
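Assembled, the fragments above form a complete hook script like the following. The ifcfg-ens11 filename check is added here to match the 'ens11' example; adjust the appended entries as needed, and remove the /tmp/hook_data dump once you no longer need to inspect the input format:

  #!/usr/bin/python
  import json

  import hooking

  hook_data = hooking.read_json()           # input json described above
  config_data = hook_data['config']         # current ifcfg content
  ifcfg_file = hook_data['ifcfg_file']      # path of the file being written

  if ifcfg_file.endswith('ifcfg-ens11'):    # only modify nic 'ens11'
      config_data += ('USERCTL=yes\n'
                      'ETHTOOL_OPTS="autoneg on speed 1000 duplex full"\n')
      hook_data['config'] = config_data
      hooking.write_json(hook_data)         # hand back the new content

  with open('/tmp/hook_data', mode='w') as dump_file:
      dump_file.write(json.dumps(hook_data))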
BZ#1123052
Red Hat recommends using glusterfs volumes that are of replica type with a replica count of 1 or 3. Previously, VDSM did not validate glusterfs volumes when adding glusterfs storage domains, so administrators could configure glusterfs volumes with an unsupported replica type or count. VDSM now validates the required parameters before using a glusterfs volume as a storage domain, and warns if the glusterfs volume configuration is not supported.
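The check itself is a small guard before the volume is used; a hedged sketch, assuming a volume-info dict with a replicaCount field:

  import logging

  SUPPORTED_REPLICA_COUNTS = (1, 3)

  def check_replica_count(volume_info):
      count = int(volume_info.get('replicaCount', 1))
      if count not in SUPPORTED_REPLICA_COUNTS:
          # The shipped fix warns rather than failing hard.
          logging.warning('unsupported gluster replica count: %d', count)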
BZ#1206231
VDSM in Red Hat Enterprise Virtualization 3.6 no longer supports Red Hat Enterprise Linux 6. As part of this change, el6 support was removed from the vdsm.spec file to reduce the required maintenance.