Chapter 3. Notable Bug Fixes
- CVE-2019-10197 (Moderate)
- A combination of parameters and permissions could allow a user to escape from the share path definition.
- Previously, running the gluster volume status <volname> inode command output the entire inode table, which could time out and create performance issues. The output of this command is now more streamlined, and the original information should now be obtained by performing a statedump.
- BZ#1734423, BZ#1736830, BZ#1737674
- Previously, dynamically allocated memory was not freed correctly, which led to an increase in memory consumption and out-of-memory management on gluster clients. Memory is now freed correctly so that memory overruns do not occur.
- Previously, glusterfs enabled kernel auto-invalidation, which invalidates page cache when ctime changes. This meant that whenever writes occurred before, during, and after a ctime change, the page cache was purged, and the performance of subsequent writes did not benefit from caching. Two new options are now available to improve performance. The mount option auto-invalidation[=on|off] is now enabled by default, and specifies whether the kernel can automatically invalidate attribute, dentry, and page cache. To retain page cache after writes, set this to 'off', but only if files cannot be accessed by two different mount points concurrently. The volume option performance.global-cache-invalidation=[on|off] overrides the value of performance.cache-invalidation. This option is disabled by default, but when enabled purges all read caches related to gluster when a stat change is detected. Turn this option on only when a file can be accessed from different mount points and caches across these mount points are required to be coherent. If both options are turned off, data written is retained in page cache and performance of overlapping reads in the same region improves.
- Brick status was displayed as started when the brick was in a starting or stopping state because the get-status operation only tracked the started and stopped states. The get-status operation now reports state more accurately.
- Previously, when a gluster volume had a bind-address specified, the name of the rebalance socket file exceeded the allowed character length, which prevented rebalance from starting. A hash is now generated based on the volume name and UUID, avoiding this issue.
- BZ#1670415, BZ#1686255
- A small memory leak that occurred when viewing the status of all volumes has been fixed.
- If a user configured more than 1500 volumes in a 3 node cluster, and a node or glusterd service became unavailable, then during reconnection there was too much volume information to gather before the handshake process timed out. This issue is resolved by adding several optimizations to the volume information gathering process.
- Previously, while migrating a virtual machine, libvirt changed ownership of the machine image if it detected that the image was on a shared file system. This prevented virtual machines from accessing the image. This issue can no longer be reproduced.
- Access Control List settings were not being removed from Red Hat Gluster Storage volumes because the removexattr system call was not being passed on to the brick process. This has been corrected and attributes are now removed as expected.
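The rebalance socket fix above relies on a general technique: deriving a fixed-length file name from a hash so that the path length no longer depends on the inputs. The following is a minimal sketch of that idea, not the glusterd implementation. The function name, the run directory, the file name pattern, and the choice of SHA-256 are all assumptions made for illustration. The 108-byte limit is the size of sun_path in sockaddr_un on Linux.

```python
import hashlib

# Linux limits sockaddr_un.sun_path to 108 bytes, including the trailing NUL.
SUN_PATH_MAX = 108


def rebalance_socket_path(run_dir: str, volname: str, node_uuid: str) -> str:
    """Build a socket path whose length does not depend on the volume name."""
    # Hypothetical scheme: hash the volume name and UUID together.
    digest = hashlib.sha256(f"{volname}:{node_uuid}".encode("utf-8")).hexdigest()
    # Hypothetical file name pattern; the real name used by gluster may differ.
    path = f"{run_dir}/rebalance-{digest[:32]}.sock"
    if len(path.encode("utf-8")) >= SUN_PATH_MAX:
        raise ValueError("run_dir is too long for a UNIX socket path")
    return path


if __name__ == "__main__":
    # A very long volume name still yields a short, stable socket path.
    long_volume = "v" * 200
    print(rebalance_socket_path("/var/run/gluster", long_volume,
                                "0d8a6f5e-1c2b-4a3d-9e7f-112233445566"))
```

Because the hash is deterministic, the same volume name and UUID always map to the same socket path, while different inputs map to different paths with very high probability.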
Fixes for Dispersed Volumes
- If a file on a bad brick was being healed while a write request for that file was being performed, the read that occurs during a write operation could still read the file from the bad brick. This could lead to corruption of data on good bricks. All reads are now done from good bricks only, avoiding this issue.
- When bricks are down, files can still be modified using the O_TRUNC flag. When bricks function again, any operation that modified the file using a file descriptor starts open-fd heal. Previously, when open-fd heal was performed on a file that was opened using O_TRUNC, a truncate operation was triggered on the file. Because the truncate operation usually happened as part of an operation that already took a lock, it did not take an explicit lock, which in this case led to a NULL lock structure, and eventually led to a crash when the NULL lock structure was de-referenced. The O_TRUNC flag is now ignored during an open-fd heal, and a truncate operation occurs during the data heal of a file, avoiding this issue.
- Previously, when an update to a file's size or version failed, the file descriptor was not marked as bad. This meant that bricks were assumed to be good when this was not necessarily true and that the file could show incorrect data. This update ensures that the file descriptor is marked as bad, so that a file sync or flush fails after an update failure.
Fixes for Distributed Volumes
- Previously, when parallel-readdir was enabled, stale linkto files could not be deleted because they were incorrectly interpreted as data files. Stale linkto files are now correctly identified.
Fixes for Events
- Previously, the network family was not set correctly during events socket initialization. This resulted in an invalid argument error and meant that events were not sent to consumers. Network family is now set correctly and events work as expected.
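As a generic illustration of the events fix above, the sketch below creates a socket whose address family is taken from the resolved destination address rather than hard-coded. This is not the gluster events code. The function name, the use of a datagram socket, and the port number in the example are assumptions made for illustration only.

```python
import socket


def open_event_socket(host: str, port: int) -> socket.socket:
    """Create a datagram socket whose family matches the resolved address."""
    # getaddrinfo reports the address family (AF_INET or AF_INET6) for host,
    # so the socket is never created with a family that does not match it.
    family, socktype, proto, _canonname, _sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_DGRAM)[0]
    return socket.socket(family, socktype, proto)


if __name__ == "__main__":
    # Hypothetical consumer address and port, used only for this example.
    sock = open_event_socket("127.0.0.1", 24009)
    print(sock.family)
    sock.close()
```

Creating the socket with a family that does not match the destination address is a common cause of invalid argument errors when data is later sent.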
Fixes for automation with gdeploy
- The configuration options user.cifs=enable are now set on the volume during Samba setup via gdeploy, ensuring setup is successful.
- Previously, when samba was configured using gdeploy, the samba user was not created on all nodes in a cluster. This caused problems during failover of CTDB, as the required user did not exist. gdeploy now creates the samba user on all nodes, avoiding this issue.
Fixes for Geo-replication
- During geo-replication, when a sync was attempted for a large number of files that had been unlinked and no longer existed on master, the tarssh process hung because of a deadlock. When the stderr buffer of the tar process filled before tar completed, it hung. Workers expected tar to complete before reading stderr, but tar could not complete until the buffer was freed by being read. Workers now begin reading stderr output as soon as the tar process is created, avoiding the issue.
- Geo-replication now synchronizes correctly instead of creating additional files when a large number of different files have been created and renamed to the same destination path.
- In non-root geo-replication sessions, gluster binary paths were not added to the PATH variable, which meant that gluster commands were not available to the session. Existing gluster-command-slave-dir options can be used to ensure that sessions have access to gluster commands.
- Geo-replication now succeeds when a symbolic link is renamed multiple times between syncs.
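The tarssh deadlock described in this section is an instance of a common pipe problem: a child process blocks on a full stderr pipe while the parent waits for the child to exit. The following is a minimal sketch of the remedy described above, reading stderr as soon as the process is created. It is not the geo-replication worker code. The function name is hypothetical, and a small Python child process stands in for tar so that the example is self-contained.

```python
import subprocess
import sys
import threading


def run_and_drain(cmd):
    """Run cmd and read its stderr from the moment the process is created."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE)
    chunks = []

    def drain():
        # Reading continuously keeps the pipe buffer from filling up,
        # so the child never blocks on a write to stderr.
        for chunk in iter(lambda: proc.stderr.read(4096), b""):
            chunks.append(chunk)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    returncode = proc.wait()  # safe: stderr is being consumed concurrently
    reader.join()
    return returncode, b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for tar reporting many missing files: 1 MiB written to stderr,
    # which is far more than a pipe buffer can hold.
    noisy = [sys.executable, "-c",
             "import sys; sys.stderr.write('x' * (1 << 20))"]
    code, err = run_and_drain(noisy)
    print(code, len(err))
```

If the parent instead called proc.wait() before reading stderr, the child in this example would block once the pipe buffer filled, and neither process would make progress.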
Fixes for NFS-Ganesha
- A race condition existed where, when attempting to re-establish a connection with an NFS client, the server did not clean up existing state in time. This led to the new connection being incorrectly identified as having expired, rendering the mount point inaccessible. State is now cleaned before a new connection is accepted so this issue no longer occurs.
- NFS-Ganesha used client credentials for all operations on Gluster storage. In cases where a non-root user was operating on a read-only file, this resulted in 'permission denied' errors. Root permissions are now used where appropriate so that non-root users are able to create and write to files using 0444 mode.
Fixes for Replication
- When eager-lock lock acquisition failed during a write transaction, the previous lock was retained, which blocked all subsequent writes and caused a hang. This is now handled correctly and more specific log messages have been added to assist in diagnosing related issues.
- The cluster.quorum-count volume option was not being updated in the volume configuration file for Gluster NFS volumes because when the last part of the file read was smaller than the buffer size, the data written from the buffer was a combination of new and old data. This has been corrected and Gluster NFS clients now honor cluster.quorum-count when cluster.quorum-type is set to
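The short-read problem described for the cluster.quorum-count option can be illustrated generically: when a buffer is reused across reads, only the bytes returned by the latest read are valid. The sketch below is not the gluster code. The function name and the deliberately small buffer size are assumptions chosen to make the final short read easy to see.

```python
import io


def copy_stream(src, dst, bufsize: int = 8) -> None:
    """Copy src to dst through a reusable buffer, writing only the bytes read."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    while True:
        n = src.readinto(buf)
        if not n:
            break
        # Writing the whole buffer here would append stale bytes left over
        # from the previous iteration whenever the final read is shorter
        # than bufsize. Slicing to n writes only the new data.
        dst.write(view[:n])


if __name__ == "__main__":
    # Hypothetical file content; 22 bytes, so the last read returns 6 bytes.
    source = io.BytesIO(b"option quorum-count 2\n")
    target = io.BytesIO()
    copy_stream(source, target)
    print(target.getvalue() == source.getvalue())
```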
Fixes for Sharding
- Deleting a file with a large number of shards timed out because unlink operations occurred on all shards in parallel, which led to contention on the .shard directory. Timeouts resulted in failed deletions and stale shards remaining in the .shard directory. Shard deletion is now a background process that deletes one batch of shards at a time, to control contention on the .shard directory and prevent timeouts. The size of shard deletion batches is controlled with the features.shard-deletion-rate option, which is set to
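The batching approach described above can be sketched as follows. This is an illustration of the general technique and not the shard translator's code. The function names are hypothetical, and the batch size in the example is arbitrary and does not reflect the value of features.shard-deletion-rate.

```python
from itertools import islice


def delete_in_batches(shards, delete_batch, batch_size: int) -> int:
    """Delete shards one batch at a time and return the number of batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = iter(shards)
    batches = 0
    while True:
        batch = list(islice(remaining, batch_size))
        if not batch:
            return batches
        # Each batch completes before the next one starts, which bounds the
        # number of concurrent unlink operations on the shared directory.
        delete_batch(batch)
        batches += 1


if __name__ == "__main__":
    # Hypothetical shard names; the callback only records what it was given.
    deleted = []
    count = delete_in_batches((f"shard.{i}" for i in range(10)),
                              deleted.extend, batch_size=4)
    print(count, len(deleted))
```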
Fixes for Web Administration
- The previously shipped version of the python2-pyasn1 package caused IPA client installation to fail. This package is replaced with updates to tendrl-notifier and tendrl-commons so that pysnmp is used instead of python2-pyasn1, and installation works as expected. Before upgrading to Red Hat Gluster Storage Web Administration 3.5, remove the python2-pyasn1 and pysnmp packages (but not their dependencies) by running the following commands:
# rpm -e --nodeps $(rpm -qa 'python2-pyasn1')
# rpm -e --nodeps $(rpm -qa 'pysnmp')
- Previously, tendrl did not set an owner for the /var/lib/carbon/whisper/tendrl directory. When the owner of this directory was not the carbon user, carbon-cache could not create whisper files in this location. Tendrl now ensures the directory is owned by the carbon user so that whisper files can be created.
- Previously, errors that occurred because tendrl-monitoring-integration was not running were reported with generic error messages. More specific error messages about tendrl-monitoring-integration status are now logged in this situation.
- Previously, Red Hat Gluster Storage web administration expected all nodes to be online before any node could stop being managed by web administration. It is now possible to remove a node from being managed even when one or more nodes in the cluster are not online.
- Red Hat Gluster Storage web administration previously received all split brain related events and displayed these as errors in the user interface, even when they were part of correctly operating heal processes. Events are now filtered based on the client identifier to remove unnecessary and erroneous errors from the user interface.
- Previously, when all nodes in a cluster were offline, the web administration interface did not report the correct number of nodes offline. Node status is now correctly tracked and reported.
- The node-agent service is responsible for import and remove (stop managing) operations. These operations timed out with a generic log message when the node-agent service was not running. This issue is now logged more clearly when it occurs.
- Previously, Ansible 2.8 compatibility did not work correctly. Red Hat Storage Web Administration is now compatible with Ansible 2.8.