Chapter 4. Technical Notes
4.1. RHEA-2016:1245 — Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory
The undercloud stack rc file is a Keystone v2 rc. Previously, when switching from a v3 rc file (such as the v3 overcloudrc), some of the v3 environment variables would still be present. As a result, Keystone authentication may not work correctly. With this release, all OpenStack-related environment variables are cleared in stackrc before the undercloud values are set. As a result, variables from a previous rc file can no longer be present in the environment after sourcing stackrc, so Keystone authentication works correctly.
The image_path parameter is no longer used. This update removes it from the undercloud configuration file.
In certain situations, the undercloud virtual IPs were not correctly validated due to an error in the validation logic. Consequently, the undercloud could be deployed with incorrect virtual IPs. This error has been fixed: the virtual IPs are now validated correctly, and any problems in the virtual IP configuration are discovered before the actual deployment.
Previously, concurrent requests to create a volume from the same image could result in multiple entries in the Block Storage service's image cache for the same image, which wasted space. This update adds a synchronization lock to prevent this: the first request to create a volume from an image is cached, and all subsequent requests use the cached image.
With this enhancement, `glance-manage db purge` can now remove rows that are less than one day old. This was added because operators may need to run this operation on a regular basis. As a result, the value of the `age_in_days` option can be set to `0`.
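For example, all soft-deleted rows can now be purged regardless of age by passing `0` as the age value. This is a command sketch; the positional arguments (age in days, then maximum rows per run) follow the upstream `glance-manage` usage, and the row limit shown is an assumed example:

```shell
# Purge soft-deleted rows less than one day old by passing 0 as age_in_days;
# the second value (maximum rows removed per run) is an assumed example.
glance-manage db purge 0 100
```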
The Time Series Database as a Service (gnocchi) and Aodh API endpoints now expose a `/healthcheck` HTTP endpoint on the REST API. Requesting this endpoint allows you to check the status of the service, and does not require authentication.
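For example, the endpoint can be probed with a plain HTTP request. The host name and port numbers below are placeholders for your actual gnocchi and aodh API endpoints:

```shell
# No authentication token is required for these requests.
curl http://controller.example.com:8041/healthcheck   # gnocchi (assumed port)
curl http://controller.example.com:8042/healthcheck   # aodh (assumed port)
```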
Previously, when a pre-update hook was set on a resource that was in a FAILED state, the Orchestration service recorded an event indicating the hook was active, then immediately created a replacement resource without waiting for the hook to be cleared by the user. As a result, the tripleoclient service believed the hook to be pending (based on the event), but failed when trying to clear it, because the replacement resource did not have a hook set. This, in turn, prevented the director from completing an overcloud update, with the following message:

ERROR: The "pre-update" hook is not defined on SoftwareDeployment "UpdateDeployment"

This also affected other client-side applications that used hooks. In the director, this could also have resulted in UpdateDeployment executing on two Controller nodes simultaneously, instead of being serialized so that only one Controller is updated at a time. With this release, the Orchestration service pauses until the hook is cleared by the user, regardless of the state of the resource. This allows director overcloud updates to complete even when there is an UpdateDeployment resource in a FAILED state.
Previously, while the Orchestration service could reset the status of a resource when the state of the stack was incorrect, it failed to do so when an update was retriggered. This resulted in resources being stuck in progress, which required database fixes to unblock the deployment. With this release, the Orchestration service sets the status of all resources when it sets the status of the stack. This prevents resources from getting stuck in progress, allowing operations to be retried successfully.
This update provides enhancements to the CephFS Native Driver in conjunction with the core OpenStack File Share Service (manila) infrastructure. The CephFS Native driver now supports read-only shares and improves recovery mode by deleting backend rules not in the 'access_list'.
With this enhancement, you can now control the creation of non-public shares in the Dashboard. You can configure the Dashboard to hide the checkbox that allows users to mark shares as public during creation. With the checkbox hidden, shares are created as private by default.
To implement the security groups trunk feature with neutron-openvswitch-agent, the openvswitch firewall driver is required. This driver currently contains a bug (1444368) where ingress traffic is matched incorrectly if two ports with the same MAC address exist on different network segments on the same compute node. As a result, if a subport has the same MAC address as its parent port, ingress traffic is not matched correctly for one of the ports. As a workaround, disable port security on the parent port and its subports. For example, to disable port security on the port with UUID 12345, remove the security groups associated with the port:

openstack port set --no-security-group --disable-port-security 12345

Note that no security group rules will be applied to that port, and traffic will not be filtered or protected against IP/MAC/ARP spoofing.
On DVR setups, the 'test_add_list_remove_router_on_l3_agent' test from 'test_l3_agent_scheduler.py' did not finish successfully. When a new router was created, the testing procedure tried to bind a network interface to an L3 agent even though the interface had already been bound to one. This problem has been fixed: the interface is no longer added to the router and assigned to the L3 agent until the test does so. As a result, the test finishes successfully.
This enhancement adds the ability to automatically reschedule load balancers from LBaaS agents that the server detects as dead. Previously, load balancers could be scheduled and realized across multiple LBaaS agents; however, if a hypervisor died, the load balancers scheduled to that node would cease operation. With this update, these load balancers are automatically rescheduled to a different agent. This feature is disabled by default and managed using `allow_automatic_lbaas_agent_failover`.
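A minimal sketch of enabling the feature, assuming the option is read from the neutron server configuration file ('neutron.conf'):

```
[DEFAULT]
# Reschedule load balancers away from LBaaS agents the server marks as dead.
allow_automatic_lbaas_agent_failover = True
```

The neutron-server service would then need to be restarted for the change to take effect.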
This enhancement implements the 'ProcessMonitor' class in the 'HaproxyNSDriver' class (v2). This class uses the 'external_process' module to monitor and respawn HAProxy processes when needed. The LBaaS agent (v2) loads the 'external_process' related options and takes the configured action when HAProxy dies unexpectedly.
This release adds pagination support to avoid resource-consuming usage requests on systems with a large number of instances. The v2.40 microversion of the nova API simple-tenant-usage endpoints use new optional query parameters 'limit' and 'marker' for pagination. The 'marker' option sets the starting point and the 'limit' option sets the number of records to be displayed after the starting point. If 'limit' is not set, nova will use the configurable 'max_limit' (1000 by default). Although older microversions will not accept these new query parameters, they will start to enforce the max_limit and results may be truncated as a result. Consider using the new microversion to avoid DoS-like usage requests and potentially truncated responses.
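As a sketch, a paginated usage request can be composed as follows; the endpoint host, port, and marker UUID are placeholders:

```shell
# Compose a paginated simple-tenant-usage request URL.
base="http://controller.example.com:8774/v2.1/os-simple-tenant-usage"
limit=50                                          # records per page
marker="00000000-0000-0000-0000-000000000000"     # last instance UUID seen
url="${base}?limit=${limit}&marker=${marker}"
echo "$url"
```

The request would then be issued with a client such as curl, passing the 'X-OpenStack-Nova-API-Version: 2.40' header to select the microversion.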
This update adds support for the version 5.1.0 MapR plugin.
Because SELinux policies concerning launching instances with DPDK enabled are incomplete, launching instances using DPDK with SELinux in enforcing mode will cause the launch to fail, and AVC denials concerning openvswitch and svirt will appear in /var/log/audit/audit.log*. As a workaround, set SELinux to permissive mode on each compute node where DPDK is utilized, as documented in "Permanent Changes in SELinux States and Modes" in the Red Hat Enterprise Linux documentation. This will allow DPDK-enabled virtual machines to launch. This workaround is expected to be temporary while the issue is investigated further.
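As a sketch, the workaround can be applied with the standard SELinux tooling:

```shell
# Switch to permissive mode immediately (does not persist across reboots):
setenforce 0

# Make the change persistent by setting SELINUX=permissive in the config file:
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
```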
When deleting a node in heat, the deletion command returned the prompt while the process was still running in the background. If another command followed immediately, a conflict occurred and the subsequent command failed. This behavior has been changed: the prompt now returns only when the process has finished completely.
Automatic fencing setup can be used in director for easier High Availability deployments and upgrades. To benefit from the new feature, use the 'overcloud generate fencing' command.
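For example (a sketch; the file names are placeholders, and the exact options should be checked against the command's `--help` output):

```shell
# Generate a fencing environment file from the node registration data,
# then include it in the overcloud deployment.
openstack overcloud generate fencing --output fencing.yaml instackenv.json
openstack overcloud deploy --templates -e fencing.yaml
```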
When the neutron-openvswitch-agent service was stopped, the stopping process sometimes took too long to exit gracefully and was killed by systemd. In this case, a running neutron-rootwrap-daemon remained on the system, which prevented the neutron-openvswitch-agent service from restarting. The problem has been fixed: an RPM scriptlet now detects the orphaned neutron-rootwrap-daemon and terminates it. As a result, the neutron-openvswitch-agent service starts and restarts successfully.
With this release, 'clustercheck' will only run on nodes specified in the 'wsrep_cluster_address' option of Galera. This change was implemented to take into account use cases where Galera is run on a dedicated node (as is made possible with composable roles). Previously, during minor updates 'clustercheck' ran on all nodes running pacemaker, assuming Galera was also on the same node.
The director set the 'tcp_listen_options' stanza twice in '/etc/rabbitmq/rabbitmq.config'. This caused no adverse effects but could cause confusion in the future. This fix removes the redundant stanza; only one 'tcp_listen_options' stanza now appears in the configuration file.
It is now possible to use puppet hieradata to set the max_files and max_processes limits for QEMU instances spawned by libvirtd. This can be done through an environment file containing the appropriate puppet classes. For example, to set max_files to 32768 and max_processes to 131072, use:

parameter_defaults:
  ExtraConfig:
    nova::compute::libvirt::qemu::max_files: 32768
    nova::compute::libvirt::qemu::max_processes: 131072

This update also sets these values as the defaults, since QEMU instances launched by libvirtd might consume a large number of file descriptors or threads, depending on the number of Compute guests hosted on each compute node and the number of Ceph RBD images each instance attaches to. Being able to configure these limits is necessary in large clusters. With the new default values, the Compute service should be able to use more than 700 OSDs; this was previously identified as the limit imposed by the low default for max_files (originally 1024).
OpenStack Platform 10 included a broken Big Switch agent configuration. Deploying Big Switch agents with the provided heat templates resulted in deployment failures. This fix updates the heat templates to properly deploy Big Switch agents. Now the director correctly deploys the Big Switch agent service in composable roles.
The default memory configuration for Memcached was 95 percent of total available RAM, which could lead to resource contention. This fix lowers the default value to 50 percent of total available RAM, which helps reduce possible resource conflicts. You can also configure this value using the 'MemcachedMaxMemory' setting.
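For example, the value can be overridden in an environment file passed to the deployment; the value shown is the new default and is given for illustration:

```
parameter_defaults:
  # Cap memcached at 50% of total RAM (the new default),
  # or set an explicit limit instead.
  MemcachedMaxMemory: '50%'
```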
A bug in the overcloud package update script caused cluster services to always restart even if no packages were available for update. This fix corrects the check that determines if there are pending package updates. If no package updates are available, the yum update script exits and does not restart cluster services.
For security reasons, the Overcloud only allows SSH key-based access by default. You can set a root password on the disk image for the overcloud using the virt-customize tool, which is found in the Red Hat Enterprise Linux Extras channel. After installing the tool and downloading the Overcloud images, use the following command to change the root password:

$ virt-customize -a overcloud-full.qcow2 --root-password password:my_root_password

Perform this operation prior to uploading the images into glance with the "openstack overcloud image upload" command.
The 'tuned-profiles-cpu-partitioning' package is now pre-installed in the 'overcloud-full.qcow2' image. For DPDK deployments, this package is necessary to help tune hosts and isolate CPU usage. The director contains appropriate firstboot scripts to enable the 'tuned' service with the necessary arguments.
Currently, Red Hat OpenStack Platform director 10 overcloud deployments with SR-IOV fail when NIC IDs (for example, nic1, nic2, nic3, and so on) are used in the compute.yaml file. As a workaround, use NIC names (for example, ens1f0, ens1f1, ens2f0, and so on) instead of NIC IDs to ensure the overcloud deployment completes successfully.
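For example, in the network configuration section of compute.yaml, refer to interfaces by device name; 'ens1f0' below is a placeholder for the actual device name on your hardware:

```
- type: interface
  name: ens1f0          # use the NIC name instead of an ID such as nic1
  use_dhcp: false
```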
When upgrading or deploying a Red Hat OpenStack Platform environment integrated with an external Ceph Storage Cluster from an earlier version (that is, Red Hat Ceph Storage 1.3), it is necessary to enable backwards compatibility. To do so, uncomment the following line in environments/puppet-ceph-external.yaml during upgrade or deployment:

parameter_defaults:
  # Uncomment if connecting to a pre-Jewel or RHCS1.3 Ceph Cluster
  RbdDefaultFeatures: 1
This release features the necessary puppet modules for deploying CephFS. This allows you to deploy the OpenStack Shared File System service (openstack-manila) with a CephFS back-end through the director.
Previously, a deployment sometimes failed with the following error:

Error: /Stage[main]/Pacemaker::Corosync/Exec[Start Cluster tripleo_cluster]/returns: change from notrun to 0 failed: /sbin/pcs cluster start --all returned 1 instead of one of 0

With this update, a small race condition where puppet pacemaker could fail during cluster setup was closed. As a result, the deployment works correctly without errors.
Previously, all pacemaker services had to be part of the same role. With this update, a new feature allows you to use composable roles with pacemaker managed services. This feature is needed in order to scale out pacemaker managed services on more and different nodes.
Previously, the OpenStack Dashboard service was configured in the wrong step of the deployment, resulting in horizon being temporarily unavailable during deployments and leading to additional 'httpd' service restarts. With this update, the OpenStack Dashboard configuration is fixed to occur at the same time as the rest of the 'httpd' configuration. As a result, horizon no longer becomes temporarily unavailable during overcloud deployment.
Previously, administrators had to record the user credentials during the volume transfer operation, and doing so by hand was inconvenient. With this update, a button to download the credentials has been added to the volume transfer screen. This allows administrators to save a CSV file with the information locally with a single click.
To avoid memory bloat issues in the nova-api workers, pagination logic has been added to the simple-tenant-usage API extension.
Previously, improper handling of user IDs containing underscores made it impossible to update project/domain members when the user IDs contained underscores. With this update, the code that handles user IDs has been corrected to properly handle underscores. As a result, project/domain members can now be updated even if their user IDs contain underscores.
After the optimization of event retrieval process, the 'openstack stack hook poll' command stopped returning pending hooks, even if they existed and should be returned. The problem was fixed. Now pending hooks are returned correctly.
The '--os-interface' switch was ignored by 'openstack network' commands. Consequently, all such commands used the 'public' endpoint, even when other interfaces were specified. Support for the switch has been added, and 'openstack network' commands now correctly use the endpoint specified by the '--os-interface' switch.
The Remote Procedure Call (RPC) message acknowledgement in Oslo Messaging was not thread-safe. Consequently, a race condition caused an RPC timeout in Ceilometer. The message acknowledgement in Oslo Messaging has been fixed. Now, Ceilometer responds correctly.
Oslo Messaging did not initialize its configuration properly. As a result, the 'nova-manage' client failed during startup. The error has been fixed, and 'nova-manage' now starts correctly.
Previously, a failed update or upgrade would return an exit value of 0, so it was not possible to test for success based upon this value. With this update, a failed update or upgrade will throw an exception to signify to OpenStackClient that there is an error condition. As a result, OpenStackClient will only return an exit value of 0 on success, and a non-zero value after an error.
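This makes update and upgrade commands scriptable; a minimal sketch (the specific command shown is illustrative):

```shell
# Any overcloud update/upgrade command now returns non-zero on failure,
# so its exit status can be tested directly.
if openstack overcloud update stack -i overcloud; then
    echo "update succeeded"
else
    echo "update failed" >&2
    exit 1
fi
```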
The 'openstack overcloud image upload' command ignored the '--image-path' argument when uploading or updating overcloud images. Consequently, only images in the working directory could be used. Support for the '--image-path' argument has been added, and images from the directory specified by the argument can now be uploaded successfully.
Pacemaker crashes continuously when the fencing device name and the host name are the same. To avoid this problem, add a "fence-" prefix or "-fence" suffix to the name of the fencing device. With the names configured this way, the cluster works without errors.
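For example, when creating the fencing resource with pcs, prefix the device name so it never matches the host name. The agent type, addresses, and credentials below are illustrative placeholders:

```shell
# Name the stonith device "fence-<hostname>" so it differs from the host name.
pcs stonith create fence-overcloud-controller-0 fence_ipmilan \
    pcmk_host_list="overcloud-controller-0" \
    ipaddr="192.0.2.10" login="admin" passwd="password" lanplus=1
```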