Detailed notes on the changes implemented in Red Hat Enterprise MRG 2
Legal Notice
Abstract
Red Hat Enterprise MRG 2. Read this document before beginning to use the Red Hat Enterprise MRG distributed computing platform.
- 1. MRG Messaging on Red Hat Enterprise Linux 6
- 1.1. RHBA-2015:20980 – Red Hat Enterprise MRG Messaging 2.5 bugfix update
- 1.2. RHSA-2015:19811 – Moderate: qpid-cpp security and bug fix update
- 1.3. RHBA-2014:19311 – Red Hat Enterprise MRG Messaging and Grid bugfix update
- 1.4. RHBA-2014:18316 – Red Hat Enterprise MRG Messaging 2.5.5 bugfix update
- 1.5. RHBA-2014:17844 – Red Hat Enterprise MRG Messaging 2 bugfix update
- 1.6. RHBA-2014:0130 – Red Hat Enterprise MRG Messaging 2.4.4 bugfix update
- 1.7. RHBA-2013:1756 – Red Hat Enterprise MRG Messaging 2.3.6 bugfix update
- 1.8. RHSA-2013:1024 – Moderate: Red Hat Enterprise MRG Messaging 2.3.3 security and bugfix update
- 1.9. RHSA-2013:0562 – Moderate: Red Hat Enterprise MRG Messaging 2.3 security, bug fix, and enhancement update
- 1.10. RHSA-2012:1279 – Moderate: Red Hat Enterprise MRG Messaging 2.2 security, bug fix, and enhancement update
- 1.11. RHSA-2012:0528 – Moderate: Red Hat Enterprise MRG Messaging 2.1 security, bug fix and enhancement update
- 1.12. RHSA-2011:1399 – Moderate: Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 1.13. RHBA-2011:1340 – Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 1.14. RHEA-2011:0892 – Red Hat Enterprise MRG – Messaging 2.0 Release
- 2. MRG Messaging on Red Hat Enterprise Linux 5
- 2.1. RHBA-2015:20981 – Red Hat Enterprise MRG Messaging 2.5 bugfix update
- 2.2. RHSA-2015:19853 – Moderate: qpid-cpp security and bug fix update
- 2.3. RHBA-2014:19325 – Red Hat Enterprise MRG Messaging and Grid Bugfix Update
- 2.4. RHBA-2014:18790 – Red Hat Enterprise MRG Messaging 2.5.5 bugfix update
- 2.5. RHBA-2014:17845 – Red Hat Enterprise MRG Messaging 2 bugfix update
- 2.6. RHBA-2014:0129 – Red Hat Enterprise MRG Messaging 2 bug fix update
- 2.7. RHBA-2013:1755 – Red Hat Enterprise MRG Messaging 2.3.6 bugfix update
- 2.8. RHBA-2013:1023 – Red Hat Enterprise MRG Messaging 2.3.3 bugfix update
- 2.9. RHSA-2013:0561 - Moderate: Red Hat Enterprise MRG Messaging 2.3 security, bug fix, and enhancement update
- 2.10. RHSA-2012:1277 – Moderate: Red Hat Enterprise MRG Messaging 2.2 security, bug fix, and enhancement update
- 2.11. RHSA-2012:0529 – Moderate: Red Hat Enterprise MRG Messaging 2.1 security, bug fix and enhancement update
- 2.12. RHBA-2011:1393 – Moderate: Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 2.13. RHBA-2011-1339 – Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 2.14. RHEA-2011:0890 – Red Hat Enterprise MRG – Messaging 2.0 Release
- 3. MRG Realtime on Red Hat Enterprise Linux 6
- 3.1. RHSA-2014:19491 – Important: Red Hat Enterprise MRG Realtime 2.5 kernel update
- 3.2. RHEA-2014:19227 – Red Hat Enterprise MRG – Realtime 2.5 Enhancement
- 3.3. RHSA-2014:16964 – Important: kernel-rt security, bug fix, and enhancement update
- 3.4. RHSA-2014:0100 – Important: kernel-rt security and bug fix update
- 3.5. RHBA-2013:1293 – Red Hat Enterprise MRG Realtime 2.4 bug fix update
- 3.6. RHSA-2013:0829 – Important: kernel-rt security and bug fix update
- 3.7. RHBA-2013:0828 – Red Hat Enterprise MRG Realtime 2.3 bug fix update
- 3.8. RHSA-2013:0622 – Important: kernel-rt security and bug fix update
- 3.9. RHSA-2013:0566 – Important: kernel-rt security and bug fix update
- 3.10. RHBA-2013:0563 – Red Hat Enterprise MRG Realtime 2.3 bug fix update
- 3.11. RHBA-2012:1492 – Red Hat Enterprise MRG Realtime 2.2 bug fix update
- 3.12. RHSA-2012:1491 – Important: kernel-rt security and bug fix update
- 3.13. RHSA-2012:1282 – Moderate: kernel-rt security, bug fix, and enhancement update
- 3.14. RHEA-2012:1280 – Red Hat Enterprise MRG Realtime 2.2 enhancement update
- 3.15. RHSA-2012:0670 – Important: kernel-rt security and bug fix update
- 3.16. RHBA-2012:0496 – Red Hat Enterprise MRG Realtime 2.1 kernel bug fix update
- 3.17. RHBA-2012:0495 – Red Hat Enterprise MRG Realtime 2.1 bug fix update
- 3.18. RHSA-2012:0333 – Important: kernel-rt security and bug fix update
- 3.19. RHBA-2012:0044 – kernel-rt bug fix update
- 3.20. RHSA-2011:1253 – Important: kernel-rt security and bug fix update
- 3.21. RHEA-2011:0895 – Red Hat Enterprise MRG – Realtime 2.0 Release
- 3.22. RHEA-2011:0894 – Red Hat Enterprise MRG – Realtime 2.0 Release
- 4. MRG Realtime on Red Hat Enterprise Linux 5
- 5. MRG Grid on Red Hat Enterprise Linux 6
- 5.1. RHSA-2014:16649 – Moderate: Red Hat Enterprise MRG Grid 2.5 security, bug fix and enhancement update
- 5.2. RHSA-2013:1294 – Moderate: Red Hat Enterprise MRG Grid 2.4 security update
- 5.3. RHSA-2013:0565 – Red Hat Enterprise MRG Grid 2.3 security, bug fix and enhancement update
- 5.4. RHSA-2012:1281 – Moderate: Red Hat Enterprise MRG Grid 2.2 security, bug fix, and enhancement update
- 5.5. RHSA-2012:0099 – Moderate: MRG Grid security, bug fix and enhancement update
- 5.6. RHBA-2012:0046 – Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update
- 5.7. RHSA-2011:1250 – Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
- 5.8. RHEA-2011:0891 – Red Hat Enterprise MRG – Grid 2.0 Release
- 6. MRG Grid on Red Hat Enterprise Linux 5
- 6.1. RHSA-2013:1295 – Moderate: Red Hat Enterprise MRG Grid 2.4 security update
- 6.2. RHSA-2013:0564 – Red Hat Enterprise MRG Grid 2.3 security, bug fix and enhancement update
- 6.3. RHSA-2012:1278 – Moderate: Red Hat Enterprise MRG Grid 2.2 security, bug fix, and enhancement update
- 6.4. RHSA-2012:0100 – Moderate: MRG Grid security, bug fix and enhancement update
- 6.5. RHBA-2012:0045 – Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update
- 6.6. RHSA-2011:1249 – Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
- 6.7. RHEA-2011:0889 – Red Hat Enterprise MRG – Grid 2.0 Release
- A. Revision History
Chapter 1. MRG Messaging on Red Hat Enterprise Linux 6
- 1.1. RHBA-2015:20980 – Red Hat Enterprise MRG Messaging 2.5 bugfix update
- 1.2. RHSA-2015:19811 – Moderate: qpid-cpp security and bug fix update
- 1.3. RHBA-2014:19311 – Red Hat Enterprise MRG Messaging and Grid bugfix update
- 1.4. RHBA-2014:18316 – Red Hat Enterprise MRG Messaging 2.5.5 bugfix update
- 1.5. RHBA-2014:17844 – Red Hat Enterprise MRG Messaging 2 bugfix update
- 1.6. RHBA-2014:0130 – Red Hat Enterprise MRG Messaging 2.4.4 bugfix update
- 1.7. RHBA-2013:1756 – Red Hat Enterprise MRG Messaging 2.3.6 bugfix update
- 1.8. RHSA-2013:1024 – Moderate: Red Hat Enterprise MRG Messaging 2.3.3 security and bugfix update
- 1.9. RHSA-2013:0562 – Moderate: Red Hat Enterprise MRG Messaging 2.3 security, bug fix, and enhancement update
- 1.10. RHSA-2012:1279 – Moderate: Red Hat Enterprise MRG Messaging 2.2 security, bug fix, and enhancement update
- 1.11. RHSA-2012:0528 – Moderate: Red Hat Enterprise MRG Messaging 2.1 security, bug fix and enhancement update
- 1.12. RHSA-2011:1399 – Moderate: Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 1.13. RHBA-2011:1340 – Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 1.14. RHEA-2011:0892 – Red Hat Enterprise MRG – Messaging 2.0 Release
1.1. RHBA-2015:20980 – Red Hat Enterprise MRG Messaging 2.5 bugfix update
Bug Fixes
- BZ#1219554
- A race condition was discovered on qpid::management::ManagementAgent::debugSnapshot when one thread executed periodic processing with traces enabled, while the second thread processed a QMF method from the client. The race condition caused a segfault. The fix improves thread iteration behavior, which fixes the originally reported issue.
- BZ#1223730
- If a priority queue contained messages with differing priorities, rerouting the queue contents to a non-priority destination queue caused the messages to be ordered from lowest to highest, instead of highest to lowest. A fix to the MoveMessages::removeIf and PriorityQueue::removeIf changes the message traversal order to highest-to-lowest, which ensures that any rerouted messages with priority assigned are processed first in non-priority destination queues.
1.2. RHSA-2015:19811 – Moderate: qpid-cpp security and bug fix update
Security Fixes
- CVE-2015-0223
- It was discovered that the Qpid daemon (qpidd) did not restrict access to anonymous users when the ANONYMOUS mechanism was disallowed.
- CVE-2015-0224
- A flaw was found in the way the Qpid daemon (qpidd) processed certain protocol sequences. An unauthenticated attacker able to send a specially crafted protocol sequence set could use this flaw to crash qpidd.
Bug Fixes
- BZ#1142833
- Prior to this update, because message purging was performed on a timer thread, large purge events could have caused all other timer tasks to be delayed. Because heartbeats were also driven by a timer on this thread, this could have resulted in clients timing out because they were not receiving heartbeats. The fix moves expired message purging from the timer thread to a worker thread, which allow long-running expired message purges to not affect timer tasks such as the heartbeat timer.
- BZ#1197134
- Because of changes in the included security fixes, starting with this release Windows client programs that use the default PLAIN authentication may now fail if authentication is turned off on the broker, where before they erroneously succeeded. To work around this, explicitly use ANONYMOUS by using
--connection-options '{sasl_mechanisms:ANONYMOUS}'or you can use a valid username (to work with the default PLAIN mechanism if you have it set up). For example, with valid user 'jsmith', it would be--connection-options '{username:jsmith}'.
1.3. RHBA-2014:19311 – Red Hat Enterprise MRG Messaging and Grid bugfix update
service qpidd stop on all nodes, or qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
1.4. RHBA-2014:18316 – Red Hat Enterprise MRG Messaging 2.5.5 bugfix update
service qpidd stop on all nodes, or qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
Bug Fixes
- BZ#860691
- Poor enqueueing performance was discovered when a priority queue with a ring queue policy had a large message backlog. Performance degraded quickly when messages of mixed priority were inserted into a queue with a large (more than 100,000) message backlog. The fix ensures disordered messages acquired by a consumer as part of a cluster update, but not yet acknowledged at the time a new member joins the cluster, are reordered. During normal operation, new messages are appended directly to the back of the priority queues because the order of arrival is known. Enqueueing performance is improved for priority ring queues containing large numbers of messages with mixed priorities.
- BZ#1101644
- It was discovered that when the python framer used an SSL connection, it consumed an extra byte past the frame header (the ninth byte), rather than only consuming the first eight bytes (frame header). This caused the ninth byte to be dropped, and resulted in a connection failure on certain systems. Header parsing with SSL is now fixed to only consume the frame header, which allows the connection to be established correctly on all systems.
- BZ#1121615
- Previously, a gap in time existed between a socket opening and the mechanism for communicating about the connection starting. If the socket was disconnected within this gap there was no way to signal this event back to the federation code, which initiated the socket open request. This resulted in the socket remaining in the connected state. The connection communication mechanism is now started as soon as the socket connection is requested, which allows the federation code to be aware of any socket state changes.
- BZ#1128654
- It was discovered that calling Session::nextReceiver() with a duration of IMMEDIATE always returned null, even if there were incoming messages waiting. This bug, introduced as part of the fix for bug 958895 in the 2.3.3 release, resulted in incoming messages never being received. The fix allows for messages to be returned even if there is no waiting period. Messages are now properly delivered.
- BZ#1132395
- It was discovered that when sending messages over a federation link with no acking, the message delivery records were not properly marked when the message was delivered. This caused the broker memory footprint to increase unnecessarily with each message. The fix ensures the DeliveryRecord is correctly marked when the message is delivered, allowing the memory for the message to be released. The memory footprint of the broker can now decrease when messages are delivered.
- BZ#1136294
- It was discovered that when messages were purged from a priority ring queue, the message counters were not properly maintained. This caused messages to be deleted incorrectly, which caused memory to be consumed unnecessarily. The message counters are now properly maintained for this particular scenario. Messages are purged properly, and memory no longer accumulates during repeated purging operations.
- BZ#1142866
- It was discovered that a recently fixed portion of debug code, which was implemented to avoid a segfault, was executing even if debug mode was not enabled. This caused purging expired messages from a queue to take longer than it did prior to the fix. The code was reworked so that the debug code is only executed when debug mode is enabled. Purging messages now happens at the same speed it did before the erroneous change.
- BZ#1147968
- It was discovered that if Session.nextReceiver() with IMMEDIATE duration was called while the connection to the broker was being lost (due to a network issue, for example) , the method could fail to raise the expected MessagingException. This could cause a client application to not be notified that the connection has shut down, so the application can fail to reconnect. Session.nextReceiver now reliably returns the desired exception if the connection is closed. The client application gets the exception and can take appropriate action.
1.5. RHBA-2014:17844 – Red Hat Enterprise MRG Messaging 2 bugfix update
service qpidd stop on all nodes, or qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
Bug Fixes
- BZ#807670
- Previously, the qpid-python client was written to use select() to manage file descriptors. A known limitation with select() caused issues for processes that opened more than 1024 files (for example, a web server). If file descriptors had numerical values above 1024, and qpid-python allocated a descriptor of this value, the call to select() would fail and the connection would drop. The fix introduces modifications to qpid-python to use poll() instead of select(), which removes the numerical file descriptor limitation and fixes the issue.
- BZ#1079260
- A bug in the code caused statistics for Last Value Queues to be misreported by the Qpid Management Framework (QMF). The code was modified, and QMF now correctly reports the statistics for LVQ queues.
- BZ#1088004
- Previously, the python client decode routines expected a multi-frame command to be received in sequence, with no interleaving. It was discovered that due to the python client's expected operation, some valid AMQP sequences with interleaved heartbeats sent on different protocol tracks caused a decode error in the python client library. The fix introduces modified decode routines in the python client that take into account multiple protocol tracks. This corrects the decode errors originally encountered.
- BZ#1094781
- Previously, a gap in time existed between a socket opening and the mechanism for communicating about the connection starting. If the socket was disconnected within this gap there was no way to signal this event back to the federation code, which initiated the socket open request. This resulted in the socket remaining in the connected state. The connection communication mechanism is now started as soon as the socket connection is requested, which allows the federation code to be aware of any socket state changes.
- BZ#1098314
- Previously, a queue with the same name as an exchange made the exchange inaccessible to the python client. With this release, the python client can now reliably send to either a queue or exchange when they share the same name. The queue is preferred by default, and the exchange is indicated by specifying that the node type is topic, like so:
spout 'foo; {node:{type:topic}}'. - BZ#1103094
- It was discovered that a concurrent bind and delete on a queue could leave a binding in place, even though the queue was no longer accessible. Additionally, the queue lock was held when cancelling the auto-delete timer task. These factors caused memory usage to climb, and holding the lock while cancelling a timer could also result in a deadlock. Threads are now coordinated to ensure a concurrent bind and delete removes all bindings when a queue reaches the deleted state, and the timer task is cancelled outside the lock scope. This ensures no deadlocks or memory leaks when attempting to re-bind a queue that is being auto-deleted by a timer.
- BZ#1105094
- Previously, the neutron messaging client rewrote (by method of "monkey-patching") the python selector module to support eventlet threading. The rewritten client did not update select.poll() during this process, which is used by qpid-python to manage I/O. This resulted in poll() deadlocks and neutron server hangs. The fix introduces updates to the qpid-python library that avoid calling poll() if eventlet threading is detected. Instead, the eventlet-aware select() is called, which prevents deadlocks from occurring and corrects the originally reported issue.
1.6. RHBA-2014:0130 – Red Hat Enterprise MRG Messaging 2.4.4 bugfix update
- BZ#1030406
- Previously, message properties were printed without holding a lock. Another thread could modify them at the same time, causing memory corruption. The lock is now held while printing the properties, and concurrent modification is prevented.
- BZ#1030608
- Previously, a race condition in backup disconnect logic intermittently resulted in a core dump when a backup broker disconnected from the Primary. The disconnect logic was adjusted, and the core dump situation no longer occurs.
- BZ#1030673
- Previously, if more than two brokers were started simultaneously, it was possible for some brokers to miss part of the initial data from others, causing those brokers to hang. Now, brokers check for missed data during start-up and do an update if needed to catch up to the cluster. As a consequence, they no longer hang on start up.
- BZ#1036650
- Previously, the logic used to determine whether a message needed to be removed from the store did not account for the case where the message had already been removed from the in-memory queue, as is the case when an LVQ message is replaced. As a result, durable LVQ messages did not get removed from the store and the journal grew until it reached the limit. The logic now correctly accounts for messages that have already been removed from the in-memory queue. Durable LVQ messages are now removed from the store when replaced.
- BZ#1041575
- Previously, on closing a receiver with an exchange as its source, the library would issue a delete for the subscription queue, even if the queue was declared to be non-exclusive. As a result, shared subscription queues would be deleted when the first subscriber cancelled. Now, the delete is not issued unless the queue s exclusive. Shared subscriptions now work as expected.
- BZ#1042224
- Previously, consuming transient messages from a LVQ could cause a
size underflow on dequeueerror to occur when multiple messages with the same key value were sent and consumed. The dequeuing logic was adjusted to account for multiple messages with the same key value, and this error now does not occur. - BZ#1044002
- During connection setup by the python client, the heartbeat setting was ignored. As a result, connecting to an unresponsive process could block for an unlimited time. Now heartbeats are considered during connection set-up. With heartbeat configured, an attempt to connect will now abort after two heartbeat intervals have elapsed if not successful.
- BZ#1053759
- The interruption of
select()due to a signal was not handled. An unexpected exception could be thrown by any call onqpid.messagingthat involved await. The interruption of theselectcall is now handled. Receiving a signal now does not cause unexpected exceptions to be thrown from theqpid.messagingAPI.
1.7. RHBA-2013:1756 – Red Hat Enterprise MRG Messaging 2.3.6 bugfix update
- BZ#966636
- Previously, presenting the broker with a malformed certificate during client connection caused a segfault. In this release, the broker does not segfault and rejects the invalid certificate.
- BZ#990240
- Previously, if a client dequeued a message in a transaction, and then attempted to dequeue the same message in another transaction, the broker would segfault. In this release, the broker does not segfault when a message is transactionally dequeued more than once.
- BZ#990542
- Previously, in a two-node cluster with authentication enabled, a node with a dirty store would stall on restart when the other node had a clean store. In this release, a cluster node with a dirty store restarts as expected in a two-node cluster with authentication enabled.
- BZ#998450
- Auto-delete queues were not being accounted for correctly when tracking queue quotas for users. Creating an auto-delete queue would use up a user's queue creation quota, even after the queue was deleted. When auto-delete queues are deleted, the queue count for the user is now decremented. User quotas for queues are now maintained correctly, even if they use auto-delete queues.
- BZ#1005699
- Previously, when a message larger than the size of the journal was enqueued, the message enqueue failed, but no error was returned to the broker. The broker incorrectly assumed that the message was enqueued. An exception is now thrown when a message larger than the journal size is enqueued, and the broker knows the operation failed.
- BZ#1007398
- Previously, a race condition could occur on an LVQ queue. When an LVQ message was retrieved by a consumer, then replaced on the LVQ queue on the broker with a later message with the same LVQ key, acknowledging the retrieved message threw an exception
JERR_WMGR_DEQRIDNOTENQ: Dequeue rid is not enqueued on LVQ. The first message, which was retrieved by the consumer, was dequeued on the broker when it was replaced by the later message. In this release, LVQ messages are not dequeued unless they are deleted. Consumers can reliably acknowledge LVQ messages, even if they have been superseded in the queue since they were retrieved.
1.8. RHSA-2013:1024 – Moderate: Red Hat Enterprise MRG Messaging 2.3.3 security and bugfix update
Security Fixes
- CVE-2013-1909, Moderate
- It was discovered that the Qpid Python client library for AMQP did not properly perform TLS/SSL certificate validation of the remote server's certificate, even when the 'ssl_trustfile' connection option was specified. A rogue server could use this flaw to conduct man-in-the-middle attacks, possibly leading to the disclosure of sensitive information. (CVE-2013-1909)With this update, Python programs can instruct the library to validate server certificates by specifying a path to a file containing trusted CA certificates.This issue was discovered by Petr Matousek of the Red Hat MRG Messaging team.
Bug Fixes
- BZ#847038
- Parallel federation connections that used the same IP address and port did not recover on cluster failover. The cluster used the endpoint IP address and port number to identify specific Federation Links on the broker. When two Links were created to the same destination, they shared the same IP address and port. This prevented the cluster from distinguishing between the two links. Both connections were interpreted as a single link. The cluster recovery code now uses the link
nameparameter to uniquely identify each connection. The Links are now correctly restored should a failover event occur. - BZ#868951
- There was a race condition when a new broker joined a cluster. The link failover session could be given a channel that conflicted with a bridge session on another broker. This sometimes caused a cluster member to exit with a "
session.attached: incorrect session name" error. Channel 0 is now reserved for the failover session, and the cluster functions as expected. - BZ#923452
- The broker is informed by the client periodically which messages have been received, allowing the credit window to be moved forward. To do this it tracks how much of the current credit window has been used. However, if a fetch timed out this information was cleared. This could prevent the window ever being moved forward. A receiver could stop receiving messages even though there were messages available on the queue. The broker is now notified of which messages have been received whenever a fetch times out. The credit window is now always moved forward correctly and messages continue to be received when available.
- BZ#948489
- A broker may be configured to replicate queue events. On a large, busy system the replicated queue may generate many messages. If the replication queue consumers stop consuming messages then the broker is at risk of running out of memory as it fills with messages waiting for replication. A limit may be established that sets an upper limit on the number of messages in the replication event queue, using the parameter
--replication-queue-panic-message-limit. If that limit is exceeded then replication is stopped and the event queue is flushed to recover the memory and message resources that the queue was using. This fix is for exceptional conditions only. It gives the option to continue without replication instead of having the broker overload with messages that are not being consumed. The broker loses the ability to replicate messages as replication is stopped. However, the broker continues to run and avoids an out-of-memory condition. - BZ#951296
- In MRG 2.3, the credit limit on Federation links was set to twice the
--ackvalue. This caused performance problems with async replication. Async replication performance was poor unless the--ackvalue was set very high. With this release the credit on Federation links is returned to unlimited, as it was before MRG 2.3. Async replication performance is now restored to pre-2.3 levels, even with low--ackvalues. - BZ#951719
- ACL incorrectly denied a queue creation when the user did not specify an upper limit in the declaration request. The creation logic was adjusted to accept queue requests where the upper limit is not specified. The requests are now allowed.
- BZ#952750
Message.setJMSMessageID()was not compatible with ActiveMQ MessageID. The message implementation did not allow a non-UUID string to be set as the message ID. As a result, when a Qpid jms message was sent via a 3rd party message provider, setting themessage-idthrew an exception if it was not a valid UUID. You can now set any String as the message ID when sending a Qpid JMS message through a 3rd party sender.- BZ#953777
- A previous change made it so that a broker with a clean data store could not join an existing cluster. The previous change was modified so that a broker with a clean data store can now join the cluster.
- BZ#956240
- A previous change in the qpidd broker caused management updates to not be sent when the statistics for exchanges were changed. Since
qpid-toolis a passive tool (it does not send requests, it simply listens passively for updates), it did not receive or display updated statistics for exchanges. Note that "qpid-stat -e", which provides the same statistics, was not affected becauseqpid-statis an active tool that makes a single request for the latest statistics. The defect in the broker was fixed, and now updates are sent when statistics are changed.qpid-toolagain shows updating statistics on exchanges as those exchanges operate and update their statistics. - BZ#956764
- The
AMQDestination.equalsmethod did not take into account the differences between the ADDR and BURL syntax. As a result, the legacy fields were not populated until the address was validated. This resulted in a Null Pointer Exception in some circumstances.equalsand hashcode are now handled separately for ADDR and BURL, and behave as expected. - BZ#957869
- ACL delete rules can now be set on queues and exchanges based on their exclusivity, durability, and other properties.
- BZ#958895
- Incoming messages are held in a session level queue to ensure ordering. Any thread calling
Receiver::fetch()processes messages on this queue and blocks on it if there are none for that particular receiver. Messages for other receivers are put in a separate holding queue. If there was more than one thread concurrently processing the session queue, then a message of interest to one thread might be processed by another thread, preventing the correct thread from seeing it. Now, only one thread at a time can process the incoming queue. Concurrentfetch()calls now work as expected without needing to timeout to see a message as it arrives.
1.9. RHSA-2013:0562 – Moderate: Red Hat Enterprise MRG Messaging 2.3 security, bug fix, and enhancement update
1.10. RHSA-2012:1279 – Moderate: Red Hat Enterprise MRG Messaging 2.2 security, bug fix, and enhancement update
Security Fixes
- CVE-2012-3467
- It was discovered that the Qpid daemon (qpidd) did not require authentication for "catch-up" shadow connections created when a new broker joins a cluster. A malicious client could use this flaw to bypass client authentication.
Bug Fix
- BZ#825078
- When a broker was configured to require client authentication using the
ssl-require-client-authentication=yesoption, the QPID Management tools could not provide authenticated credentials. Consequently, these tools were failing with connection errors while attempting to manage QPID brokers. This update adds the--ssl-certificatecommand-line option for the QPID Management tools. This option allows the user to supply a file containing the correct credentials for the required authentication and the tools can now be used to manage a QPID broker that is configured to require client authentication.
service qpidd stop on all nodes, or run qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
1.11. RHSA-2012:0528 – Moderate: Red Hat Enterprise MRG Messaging 2.1 security, bug fix and enhancement update
Security Fixes
- CVE-2011-3620
- It was found that Qpid accepted any password or SASL mechanism, provided the remote user knew a valid cluster username. This could give a remote attacker unauthorized access to the cluster, exposing cluster messages and internal Qpid/MRG configurations.
Note
If you are using an ACL, the cluster-username must be allowed to publish to the qpid.cluster-credentials exchange. For example, if your cluster-username is "foo", in your ACL file:acl allow foo@QPID publish exchange name=qpid.cluster-credentials
The CVE-2011-3620 fix changes the cluster initialization protocol. As such, the cluster with all new version brokers must be restarted for the changes to take effect. Refer below for details.
Bug Fixes
- BZ#748126, BZ#755611
- When a failover occurred, some versions of JMS clients sent an invalid sequence of AMQP (Advanced Message Queuing Protocol) controls, duplicating the AMQP 0-10
connection-tune-okcontrol. Consequently, the broker left invalid pointers for timers connected with heartbeats, which caused the broker to terminate unexpectedly with a segmentation fault. With this update, invalid sequences of controls are properly caught and sanitized and the broker no longer crashes in the described scenario. - BZ#760112
- When an existing durable queue was recovered, management statistics data were not returned. Consequently, it was not possible to monitor the status of any recovered persistent queue. This bug has been fixed and the management statistics are now correctly initialized and can be queried in the same way as with newly created queues.
- BZ#735208
- Due to broken initialization logic in the store module, when an attempt to use management tools to view the store or journal statistics was made, no statistics were available even when the store was in use. This update fixes the initialization logic and the store module now correctly displays management statistics for the store plug-in.
- BZ#738490
- Previously, sessions, on which messages had been received from a queue, could delay aspects of that queue's deletion. Consequently, the queue object within the broker was not actually deleted and was still visible from tools but data could no longer be sent to or received from it. With this update, record of messages delivered from the deleted queue and acknowledged by the receiver is no longer kept in the broker after the subscription has been canceled, preventing the queue from being referenced after deletion. Now, deletion of a queue is no longer affected by the existence of sessions, through which messages from that queue have been received and acknowledged.
- BZ#730981
- Previously, the C++ client incorrectly tried to declare predefined exchanges without setting the passive flag as required by AMQP 0-10 specification for exchanges with reserved names. Consequently, attempts to create a sender (or a receiver) to (or from) a predefined exchange and utilize a create policy in the address in order to establish some node level bindings to it failed. With this update, exchanges with reserved names are treated as special cases and the passive flag is always set when declaring an exchange as part of the create policy. Now, bindings defined in the node section of an address whose named node is a predefined exchange can be established properly.
- BZ#726102
- Previously, using an address string with the C++ messaging API, which had a map within it with an empty key value, resulted in a parsing exception being returned. With this update, the parser has been updated to correctly handle empty values for key-value pairs in maps and the exception is no longer returned in the described scenario.
- BZ#731368
- Prior to this update, the example on the usage of the drain command shipped with the C++ client only consumed one message by default rather than all available messages. This update amends the example to work the same way as similar examples for Python and Java clients and all drain examples now behave consistently in consuming all available messages.
- BZ#729441
- When adding and removing bindings from the direct exchange using unique keys, for example when creating and deleting lots of uniquely named temporary queues, the bindings were not fully cleared up on removal, causing memory leaks. With this update, the bindings are now properly cleaned up when removed and the memory leaks no longer occur in the described scenario.
- BZ#798236
- Previously, when browsing the LVQ (Last Value Queue), messages delivered to the browsing session build up indefinitely while the session remained. Consequently, unbounded memory growth and memory leaks occurred. Now, acknowledgments are correctly sent by clients for browsed messages and the broker also handles those acknowledgments to be no longer holding the delivered message against the session. As a result, unbounded memory growth no longer occurs in a browser of an LVQ.
- BZ#769828
- The store has a flow-to-disk option, in which both transient and durable messages are saved to disk and the message content is freed temporarily from memory. Previously, when several consumers either browsed or consumed flowed-to-disk messages, a race condition could develop and cause unpredictable broker behavior. This update adds a locking mechanism to prevent inconsistent thread states. It is now safe to concurrently access messages with content flowed to disk from multiple threads.
- BZ#732369
- Previously, the
qpid-config --helpreturned ambiguous and confusing information regarding the host address option syntax. This update clears up the language in the help message to make it unambiguous, thus fixing this bug. - BZ#733241
- When queues were listed using the
qpid-config queues [queue_name]command, the qpid-config utility always returned0, regardless of whether the queue existed. Similar behavior was observed when listing exchanges. Consequently, scripts that use the return code of qpid-config to detect the presence of a queue or an exchange did not work correctly. This bug has been fixed and qpid-config now provides correct return codes in the described scenario. - BZ#758853
- Previously, code that checks the setting of command-line options failed to properly detect options set to the value
0, interpreting it as a missing value. Consequently, these options were ignored by the code, and set to their default value instead. This bug has been fixed and the value0in the command-line options is now properly recognized. - BZ#759114
- In the watchdog module, the path to the watchdog executable was hard-coded. Consequently, it was not possible to use the watchdog feature if the watchdog executable was not in its default location. This update adds the
--watchdog-execoption to the watchdog module to specify path to the qpidd_watchdog executable, thus fixing this bug. - BZ#693845
- When a Python client provided a user name and password while authenticating with the broker using the
ANONYMOUSmethod, the client incorrectly included the user name into each sent message. Consequently, the messages could not be passed to the broker. With this update, the user name is no longer placed into messages whenANONYMOUSauthentication is used, thus fixing this bug. - BZ#773700
- Previously, when the broker was restarted, QMF (QPID modeling framework) consoles sometimes became unresponsive. Consequently, Python-based QMF consoles failed to detect the broker was up again and were unable to monitor QMF agents. With this update, a timeout has been added to the connection clean-up logic, preventing the hang when a broker is restarted. Now, the console attempts to re-contact the broker periodically when a connection is lost. Once the broker is available, the console connects to it and resumes normal operation.
- BZ#704596
- Previously, the value of the
toString()function on the JMS destination of a message being sent defaulted to the BURL syntax. Consequently, the output oftoString()from a sender and a receiver side printed the JMS destination in different syntax, causing confusion. With this update, the code uses the default syntax of the client when creating the JMS destination from the incoming message and thetoString()output is now of the same syntax for both a sender and a receiver in the described scenario. - BZ#704608
- Previously, Red Hat Enterprise MRG only supported an integer value for the
JMSDeliveryModemessage header field (for exampleJMSDeliveryMode = 2). This violated the Java Message System (JMS) specification, which requiresJMSDeliveryModeto be used as a string in a selector (for exampleJMSDeliveryMode = ‘PERSISTENT’). This bug has been fixed, andJMSDeliveryModecan now be used as a string in the selector.Note
This update breaks backward compatibility. Users must change their code to use strings, not integers as before, to identify JMS delivery modes.
Enhancements
- BZ#751845
- Support in the broker for listening on SSL-encrypted sockets and plain TCP sockets is implemented by two distinct modules, each listening on its own port. Previously, it was not possible to serve both SSL and non-SSL connections from the same port. A deployment requiring both types must advertise two port numbers, which was not always possible or convenient. With this update, the SSL module has been changed to optionally serve plain (non-SSL) connections using the same port. Now, a single port can be advertised that will support both SSL and non-SSL traffic. This is enabled by setting the
--ssl-portand--portoptions to the same value.Note
Note that under this configuration, there is at present no support for IPv6 addresses. - BZ#727182
- This update adds support for DTX (distributed transactions) in clusters to the qpid-cpp package.
- BZ#734115
- Support for message grouping with strict sequence consumption across multiple consumers has been added to the qpid-cpp package.
- BZ#674379
- Support for the IPv6 protocol has been added to the
qpidC++ libraries for MRG Messaging. - BZ#650969
- Support for Microsoft Visual Studio 2010 has been added to the qpid-winsdk package.
- BZ#760636
- The QMF Broker method
querynow returns extra detail for a message group queue's internal state. - BZ#705418
- Previously, the Session class could only acknowledge a single specified message, or all outstanding messages. With this update, a method has been added to the API to cumulatively acknowledge all messages up to and including a given message in a single method call. This feature saves on round trip times to the broker.
- BZ#761186
- The JMS client now sets the
TCP_NODELAYproperty totrueby default as it shows an improvement in many general cases. If there is a configuration error, such as an error in connection URL, this property will still be set totrue.Note
Note that for high-throughput scenarios, it might be useful to turnTCP_NODELAYoff as with this setting, packet overhead is reduced and congestion collapse is prevented. For further details, see the section 3.2.2. Connection URLs in the programming guide.
service qpidd stop on all nodes, or run qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
1.12. RHSA-2011:1399 – Moderate: Red Hat Enterprise MRG Messaging 2.0 bug fix update
Release Notes
- Customers who are using the
qpid-cpp-server-xmlmodule must be subscribed to the following Red Hat Network channel in order to satisfy the XML dependencies for this release: Red Hat Enterprise Linux Server Optional. Customers who are using theqpid-cpp-server-clustermodule must be subscribed to the Red Hat Enterprise Linux High Availability channel in order to satisfy all dependencies. If you encounter yum dependency problems, ensure that you are subscribed to all of the required channels as detailed above.
Bug Fixes
- BZ#728586
- Due to a regression, a memory leak was introduced that prevented the broker from correctly releasing messages. Consequently, the broker's memory footprint grew indefinitely. A patch has been provided to address this issue, and the memory leak no longer occurs in the described scenario.
- BZ#734608
- After the upgrade of MRG Messaging from version 1.2 to version 1.3, the
qpidddaemon terminated unexpectedly on some of the cluster nodes. With this update, the broker code has been fixed to handle all cases when a faulty client sends frames before completely opening the connection, thus fixing this bug. - BZ#732063
- Previously, specifying an invalid IP address for a destination broker caused a repeatable file descriptor leak. A patch has been provided to address this issue, and the file descriptor leak no longer occurs in the described scenario.
- BZ#733543
- Prior to this update, when a large message (over 4KB) was sent from the python-qpid client to the broker, the connection became unresponsive and other clients were unable to connect to the broker. This bug has been fixed, and clients no longer hang in the described scenario.
- BZ#690107
- Under heavy load, the broker generated a large number of timer late/overrun warning messages. Though the messages themselves were usually inoffensive, the time to log them individually caused long (up to several seconds) delays and inflated log files. With this update, logging output for these messages has been restricted to the
--log- enabled=infooption, thus preventing this bug.
1.13. RHBA-2011:1340 – Red Hat Enterprise MRG Messaging 2.0 bug fix update
Release Note
- Customers who are using the qpid-cpp-server-xml module must be subscribed to the following Red Hat Network channel in order to satisfy the XML dependencies for this release: RHEL Server Optional.Customers who are using the qpid-cpp-server-cluster module must be subscribed to the RHEL High Availability channel in order to satisfy all dependencies.If you encounter yum dependency problems, ensure that you are subscribed to all of the required channels as detailed above.
Bug Fixes
- BZ#737177
- The system threw a javax.naming.NameNotFoundException if the JNDI file definition in the executed program defined a JNDI file with a syntax error. However, this is not appropriate as the program cannot access the JNDI file and the execution was therefore interrupted. Such internal exceptions are now re-thrown as ConfigurationExceptions with the message "Failed to parse JNDI properties file" and the program is interrupted.
- BZ#738352
- The broker ignored QMFv2 requests sent from a Java client because the requests did not have the app-id set. The requests are now sent with the app-id and the messages are processed as expected.
- BZ#738354
- A queue bound to the default exchange dropped messages with a subject. This happened because the routing key was wrongly using the message subject as the queue name. The underlying code has been changed and such messages are now delivered to the queue.
- BZ#738357
- When a connection was created with newly specified credentials, the Java client cached the new credentials and overwrote the default credentials provided by the connection URL. As a result, when a new connection was created without credentials, the connection used the cached credentials and the authentication could fail. The credentials specified at a connection creation are no longer cached and the default credentials for connections without credentials are used as expected.
- BZ#738358
- In a clustered environment, the system threw a SessionException when a session was committed after a failover using the
session.commit()method. Because the client application code did not receive a standard JMSException, it could fail to recover. The system now throws a JMSException under these circumstances and the client recovers as expected. - BZ#738360
- The system threw a SessionException if a session was committed after a transaction had exceeded the queue limit (max-queue-size). This caused the client to fail to handle the error. The system now throws a JMSException under these circumstances and client can recover with the JMSException as expected.
- BZ#738361
- The
Connection.getJMSXPropertyNames()method returned a usable enumeration only when called for the first time. If the user called thegetJMSXPropertyNames()method multiple times, the method returned empty enumerations. This happened because the system overwrote the first created enumeration with the subsequent call ofgetJMSXProperyNames(). The underlying code has been changed andgetJMSXPropertyNames()returns the Enumeration value with the property names as expected. - BZ#738362
- According to the JMS specification, concurrently executing clients cannot use the same client ID. Previously, it was possible to allow multiple connections to use the same client ID. With this update, it is no longer possible to define such clients. Note that you need to enable the verify client ID feature to prevent the clients from having identical IDs (-Dqpid.verify_client_id=true).
1.14. RHEA-2011:0892 – Red Hat Enterprise MRG – Messaging 2.0 Release
Chapter 2. MRG Messaging on Red Hat Enterprise Linux 5
- 2.1. RHBA-2015:20981 – Red Hat Enterprise MRG Messaging 2.5 bugfix update
- 2.2. RHSA-2015:19853 – Moderate: qpid-cpp security and bug fix update
- 2.3. RHBA-2014:19325 – Red Hat Enterprise MRG Messaging and Grid Bugfix Update
- 2.4. RHBA-2014:18790 – Red Hat Enterprise MRG Messaging 2.5.5 bugfix update
- 2.5. RHBA-2014:17845 – Red Hat Enterprise MRG Messaging 2 bugfix update
- 2.6. RHBA-2014:0129 – Red Hat Enterprise MRG Messaging 2 bug fix update
- 2.7. RHBA-2013:1755 – Red Hat Enterprise MRG Messaging 2.3.6 bugfix update
- 2.8. RHBA-2013:1023 – Red Hat Enterprise MRG Messaging 2.3.3 bugfix update
- 2.9. RHSA-2013:0561 - Moderate: Red Hat Enterprise MRG Messaging 2.3 security, bug fix, and enhancement update
- 2.10. RHSA-2012:1277 – Moderate: Red Hat Enterprise MRG Messaging 2.2 security, bug fix, and enhancement update
- 2.11. RHSA-2012:0529 – Moderate: Red Hat Enterprise MRG Messaging 2.1 security, bug fix and enhancement update
- 2.12. RHBA-2011:1393 – Moderate: Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 2.13. RHBA-2011-1339 – Red Hat Enterprise MRG Messaging 2.0 bug fix update
- 2.14. RHEA-2011:0890 – Red Hat Enterprise MRG – Messaging 2.0 Release
2.1. RHBA-2015:20981 – Red Hat Enterprise MRG Messaging 2.5 bugfix update
Bug Fixes
- BZ#1219554
- A race condition was discovered on qpid::management::ManagementAgent::debugSnapshot when one thread executed periodic processing with traces enabled, while the second thread processed a QMF method from the client. The race condition caused a segfault. The fix improves thread iteration behavior, which fixes the originally reported issue.
- BZ#1223730
- If a priority queue contained messages with differing priorities, rerouting the queue contents to a non-priority destination queue caused the messages to be ordered from lowest to highest, instead of highest to lowest. A fix to the MoveMessages::removeIf and PriorityQueue::removeIf changes the message traversal order to highest-to-lowest, which ensures that any rerouted messages with priority assigned are processed first in non-priority destination queues.
2.2. RHSA-2015:19853 – Moderate: qpid-cpp security and bug fix update
Security Fixes
- CVE-2015-0223
- It was discovered that the Qpid daemon (qpidd) did not restrict access to anonymous users when the ANONYMOUS mechanism was disallowed.
- CVE-2015-0224
- A flaw was found in the way the Qpid daemon (qpidd) processed certain protocol sequences. An unauthenticated attacker able to send a specially crafted protocol sequence set could use this flaw to crash qpidd.
Bug Fixes
- BZ#1142833
- Prior to this update, because message purging was performed on a timer thread, large purge events could have caused all other timer tasks to be delayed. Because heartbeats were also driven by a timer on this thread, this could have resulted in clients timing out because they were not receiving heartbeats. The fix moves expired message purging from the timer thread to a worker thread, which allow long-running expired message purges to not affect timer tasks such as the heartbeat timer.
- BZ#1197134
- Because of changes in the included security fixes, starting with this release Windows client programs that use the default PLAIN authentication may now fail if authentication is turned off on the broker, where before they erroneously succeeded. To work around this, explicitly use ANONYMOUS by using
--connection-options '{sasl_mechanisms:ANONYMOUS}'or you can use a valid username (to work with the default PLAIN mechanism if you have it set up). For example, with valid user 'jsmith', it would be--connection-options '{username:jsmith}'.
2.3. RHBA-2014:19325 – Red Hat Enterprise MRG Messaging and Grid Bugfix Update
service qpidd stop on all nodes, or qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
2.4. RHBA-2014:18790 – Red Hat Enterprise MRG Messaging 2.5.5 bugfix update
service qpidd stop on all nodes, or qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
Bug Fixes
- BZ#860691
- Poor enqueueing performance was discovered when a priority queue with a ring queue policy had a large message backlog. Performance degraded quickly when messages of mixed priority were inserted into a queue with a large (more than 100,000) message backlog. The fix ensures disordered messages acquired by a consumer as part of a cluster update, but not yet acknowledged at the time a new member joins the cluster, are reordered. During normal operation, new messages are appended directly to the back of the priority queues because the order of arrival is known. Enqueueing performance is improved for priority ring queues containing large numbers of messages with mixed priorities.
- BZ#1101644
- It was discovered that when the python framer used an SSL connection, it consumed an extra byte past the frame header (the ninth byte), rather than only consuming the first eight bytes (frame header). This caused the ninth byte to be dropped, and resulted in a connection failure on certain systems. Header parsing with SSL is now fixed to only consume the frame header, which allows the connection to be established correctly on all systems.
- BZ#1121615
- Previously, a gap in time existed between a socket opening and the mechanism for communicating about the connection starting. If the socket was disconnected within this gap there was no way to signal this event back to the federation code, which initiated the socket open request. This resulted in the socket remaining in the connected state. The connection communication mechanism is now started as soon as the socket connection is requested, which allows the federation code to be aware of any socket state changes.
- BZ#1128654
- It was discovered that calling Session::nextReceiver() with a duration of IMMEDIATE always returned null, even if there were incoming messages waiting. This bug, introduced as part of the fix for bug 958895 in the 2.3.3 release, resulted in incoming messages never being received. The fix allows for messages to be returned even if there is no waiting period. Messages are now properly delivered.
- BZ#1132395
- It was discovered that when sending messages over a federation link with no acking, the message delivery records were not properly marked when the message was delivered. This caused the broker memory footprint to increase unnecessarily with each message. The fix ensures the DeliveryRecord is correctly marked when the message is delivered, allowing the memory for the message to be released. The memory footprint of the broker can now decrease when messages are delivered.
- BZ#1136294
- It was discovered that when messages were purged from a priority ring queue, the message counters were not properly maintained. This caused messages to be deleted incorrectly, which caused memory to be consumed unnecessarily. The message counters are now properly maintained for this particular scenario. Messages are purged properly, and memory no longer accumulates during repeated purging operations.
- BZ#1142866
- It was discovered that a recently fixed portion of debug code, which was implemented to avoid a segfault, was executing even if debug mode was not enabled. This caused purging expired messages from a queue to take longer than it did prior to the fix. The code was reworked so that the debug code is only executed when debug mode is enabled. Purging messages now happens at the same speed it did before the erroneous change.
- BZ#1147968
- It was discovered that if Session.nextReceiver() with IMMEDIATE duration was called while the connection to the broker was being lost (due to a network issue, for example) , the method could fail to raise the expected MessagingException. This could cause a client application to not be notified that the connection has shut down, so the application can fail to reconnect. Session.nextReceiver now reliably returns the desired exception if the connection is closed. The client application gets the exception and can take appropriate action.
2.5. RHBA-2014:17845 – Red Hat Enterprise MRG Messaging 2 bugfix update
service qpidd stop on all nodes, or qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
Bug Fixes
- BZ#1079260
- A bug in the code caused statistics for Last Value Queues to be misreported by the Qpid Management Framework (QMF). The code was modified, and QMF now correctly reports the statistics for LVQ queues.
- BZ#1088004
- Previously, the python client decode routines expected a multi-frame command to be received in sequence, with no interleaving. It was discovered that due to the python client's expected operation, some valid AMQP sequences with interleaved heartbeats sent on different protocol tracks caused a decode error in the python client library. The fix introduces modified decode routines in the python client that take into account multiple protocol tracks. This corrects the decode errors originally encountered.
- BZ#1094781
- Previously, a gap in time existed between a socket opening and the mechanism for communicating about the connection starting. If the socket was disconnected within this gap there was no way to signal this event back to the federation code, which initiated the socket open request. This resulted in the socket remaining in the connected state. The connection communication mechanism is now started as soon as the socket connection is requested, which allows the federation code to be aware of any socket state changes.
- BZ#1098314
- Previously, a queue with the same name as an exchange made the exchange inaccessible to the python client. With this release, the python client can now reliably send to either a queue or exchange when they share the same name. The queue is preferred by default, and the exchange is indicated by specifying that the node type is topic, like so:
spout 'foo; {node:{type:topic}}'. - BZ#1103094
- It was discovered that a concurrent bind and delete on a queue could leave a binding in place, even though the queue was no longer accessible. Additionally, the queue lock was held when cancelling the auto-delete timer task. These factors caused memory usage to climb, and holding the lock while cancelling a timer could also result in a deadlock. Threads are now coordinated to ensure a concurrent bind and delete removes all bindings when a queue reaches the deleted state, and the timer task is cancelled outside the lock scope. This ensures no deadlocks or memory leaks when attempting to re-bind a queue that is being auto-deleted by a timer.
2.6. RHBA-2014:0129 – Red Hat Enterprise MRG Messaging 2 bug fix update
Bug Fixes
- BZ#966004
- The version of Python included in Red Hat Enterprise Linux 5 does not support certificate verification. Should the application attempt to use certificate verification - by supplying a CA certificate file - the
python-qpidcode would attempt to issue an error exception. The exception it attempted to issue wasSSLError, which is also not defined by the version of Python in Red Hat Enterprise Linux 5. This caused a misleadingNameErrorto be thrown.Theqpid-pythoncode was modified to avoid using theSSLErrorexception, and now uses a pre-definedsocket.errorexception when a client attempts the unsupported certificate verification operation.
2.7. RHBA-2013:1755 – Red Hat Enterprise MRG Messaging 2.3.6 bugfix update
- BZ#966636
- Previously, presenting the broker with a malformed certificate during client connection caused a segfault. In this release, the broker does not segfault and rejects the invalid certificate.
- BZ#990240
- Previously, if a client dequeued a message in a transaction, and then attempted to dequeue the same message in another transaction, the broker would segfault. In this release, the broker does not segfault when a message is transactionally dequeued more than once.
- BZ#990542
- Previously, in a two-node cluster with authentication enabled, a node with a dirty store would stall on restart when the other node had a clean store. In this release, a cluster node with a dirty store restarts as expected in a two-node cluster with authentication enabled.
- BZ#998450
- Auto-delete queues were not being accounted for correctly when tracking queue quotas for users. Creating an auto-delete queue would use up a user's queue creation quota, even after the queue was deleted. When auto-delete queues are deleted, the queue count for the user is now decremented. User quotas for queues are now maintained correctly, even if they use auto-delete queues.
- BZ#1005699
- Previously, when a message larger than the size of the journal was enqueued, the message enqueue failed, but no error was returned to the broker. The broker incorrectly assumed that the message was enqueued. An exception is now thrown when a message larger than the journal size is enqueued, and the broker knows the operation failed.
- BZ#1007398
- Previously, a race condition could occur on an LVQ queue. When an LVQ message was retrieved by a consumer, then replaced on the LVQ queue on the broker with a later message with the same LVQ key, acknowledging the retrieved message threw an exception
JERR_WMGR_DEQRIDNOTENQ: Dequeue rid is not enqueued on LVQ. The first message, which was retrieved by the consumer, was dequeued on the broker when it was replaced by the later message. In this release, LVQ messages are not dequeued unless they are deleted. Consumers can reliably acknowledge LVQ messages, even if they have been superseded in the queue since they were retrieved.
2.8. RHBA-2013:1023 – Red Hat Enterprise MRG Messaging 2.3.3 bugfix update
Bug Fixes
- BZ#847038
- Parallel federation connections that used the same IP address and port did not recover on cluster failover. The cluster used the endpoint IP address and port number to identify specific Federation Links on the broker. When two Links were created to the same destination, they shared the same IP address and port. This prevented the cluster from distinguishing between the two links. Both connections were interpreted as a single link. The cluster recovery code now uses the link
nameparameter to uniquely identify each connection. The Links are now correctly restored should a failover event occur. - BZ#868951
- There was a race condition when a new broker joined a cluster. The link failover session could be given a channel that conflicted with a bridge session on another broker. This sometimes caused a cluster member to exit with a "
session.attached: incorrect session name" error. Channel 0 is now reserved for the failover session, and the cluster functions as expected. - BZ#923452
- The broker is informed by the client periodically which messages have been received, allowing the credit window to be moved forward. To do this it tracks how much of the current credit window has been used. However, if a fetch timed out this information was cleared. This could prevent the window ever being moved forward. A receiver could stop receiving messages even though there were messages available on the queue. The broker is now notified of which messages have been received whenever a fetch times out. The credit window is now always moved forward correctly and messages continue to be received when available.
- BZ#948489
- A broker may be configured to replicate queue events. On a large, busy system the replicated queue may generate many messages. If the replication queue consumers stop consuming messages then the broker is at risk of running out of memory as it fills with messages waiting for replication. A limit may be established that sets an upper limit on the number of messages in the replication event queue, using the parameter
--replication-queue-panic-message-limit. If that limit is exceeded then replication is stopped and the event queue is flushed to recover the memory and message resources that the queue was using. This fix is for exceptional conditions only. It gives the option to continue without replication instead of having the broker overload with messages that are not being consumed. The broker loses the ability to replicate messages as replication is stopped. However, the broker continues to run and avoids an out-of-memory condition. - BZ#951296
- In MRG 2.3, the credit limit on Federation links was set to twice the
--ackvalue. This caused performance problems with async replication. Async replication performance was poor unless the--ackvalue was set very high. With this release the credit on Federation links is returned to unlimited, as it was before MRG 2.3. Async replication performance is now restored to pre-2.3 levels, even with low--ackvalues. - BZ#951719
- ACL incorrectly denied a queue creation when the user did not specify an upper limit in the declaration request. The creation logic was adjusted to accept queue requests where the upper limit is not specified. The requests are now allowed.
- BZ#952750
Message.setJMSMessageID()was not compatible with ActiveMQ MessageID. The message implementation did not allow a non-UUID string to be set as the message ID. As a result, when a Qpid jms message was sent via a 3rd party message provider, setting themessage-idthrew an exception if it was not a valid UUID. You can now set any String as the message ID when sending a Qpid JMS message through a 3rd party sender.- BZ#953777
- A previous change made it so that a broker with a clean data store could not join an existing cluster. The previous change was modified so that a broker with a clean data store can now join the cluster.
- BZ#956240
- A previous change in the qpidd broker caused management updates to not be sent when the statistics for exchanges were changed. Since
qpid-toolis a passive tool (it does not send requests, it simply listens passively for updates), it did not receive or display updated statistics for exchanges. Note that "qpid-stat -e", which provides the same statistics, was not affected becauseqpid-statis an active tool that makes a single request for the latest statistics. The defect in the broker was fixed, and now updates are sent when statistics are changed.qpid-toolagain shows updating statistics on exchanges as those exchanges operate and update their statistics. - BZ#956764
- The
AMQDestination.equalsmethod did not take into account the differences between the ADDR and BURL syntax. As a result, the legacy fields were not populated until the address was validated. This resulted in a Null Pointer Exception in some circumstances.equalsand hashcode are now handled separately for ADDR and BURL, and behave as expected. - BZ#957869
- ACL delete rules can now be set on queues and exchanges based on their exclusivity, durability, and other properties.
- BZ#958895
- Incoming messages are held in a session level queue to ensure ordering. Any thread calling
Receiver::fetch()processes messages on this queue and blocks on it if there are none for that particular receiver. Messages for other receivers are put in a separate holding queue. If there was more than one thread concurrently processing the session queue, then a message of interest to one thread might be processed by another thread, preventing the correct thread from seeing it. Now, only one thread at a time can process the incoming queue. Concurrentfetch()calls now work as expected without needing to timeout to see a message as it arrives.
2.9. RHSA-2013:0561 - Moderate: Red Hat Enterprise MRG Messaging 2.3 security, bug fix, and enhancement update
python-qpid
- BZ#856299
- Some message send operations paused for longer than the specificed timeout value. The timeout value was not passed to the sync() call in all situations. In this release the specified message timeout value is honored. All message send operations now return within the timeout value specified.
- BZ#850517
- A previous code modification did not account for older versions of Python's SSL implementation, causing various qpid tools using SSL to fail on Red Hat Enterprise Linux 5. In this release qpid tools using SSL on Red Hat Enterprise Linux 5 function correctly.
- BZ#782806
- SSL/TLS support is added to the messaging API. This allows clients using the messaging API to connect to the broker over a connection encrypted using SSL/TLS.
qpid-cpp
- BZ#773719
- The list of bindings on a queue was insufficiently protected from concurrent access. If multiple clients connected to the broker were adding and deleting bindings on the same queue, the broker might crash. In this release the broker provides adequate protection for this data structure. The broker does not crash with multiple clients manipulating the bindings on the same queue.
- BZ#877081
- Improper message header record locking could cause a broker crash. When messages were shared among many queues and subsequently rerouted, header processing by one thread could invalidate header processing by another thread. In this release message header records are effectively locked to prevent concurrent changes. Message header records may now be processed by any number of queues without causing a broker crash.
- BZ#734883
- Under certain conditions, the store could think the journal was full when it was not. In that case the store could not be recovered and the broker would not start. In this release the logic error is fixed. The store no longer thinks it is full when it is not, and the broker recovers normally.
- BZ#737685
- A defect in the qpid broker caused messages that were delivered and acquired by a consumer but not yet acknowledged to not be delivered to the alternate exchange when the queue was deleted. Unacknowledged messages could be lost if they were expected to be delivered to the alternate exchange. In this release all unacknowledged messages in a queue are now delivered to the alternate exchange regardless of whether or not they have been acquired by a consumer.
- BZ#876664
- If a queue creation failed because of a bad property value then artifacts of the failed creation were not cleaned up. If the queue was later created correctly then an error appeared 'two management objects with the same identifier'. In this release when a queue creation fails because of bad property values all artifacts related to the failed queue are deleted. There is no possibility for management objects to have identifier conflicts and there is no error message.
- BZ#871774
- The algorithm for browsing messages changed, but the message group code was not updated correctly. A message acquired by a client would still remain visible to browsing clients. In this release the message group code is updated to the new algorithm. Acquired group messages are no longer visible to other browsing clients.
- BZ#868403
- Federated link Id numbers are assigned sequentially from a list of 64k candidates. After 64k link numbers have been assigned then the number sequence starts again at zero and link Id number collisions can occur. Federated links received errors when trying to create sessions using duplicated Id numbers. In this release a list of in-use link Id numbers is maintained and used to avoid duplicate conflicts. Federated links may be created and destroyed indefinitely without receiving errors.
- BZ#801605
- Federated links could issue cluster events prior to the link connection being fully established (protocol handshake complete). Cluster members received events for unknown federated link connection, which resulted in the members leaving the cluster. In this release federated link IO processing is delayed until after the connection is fully established. Cluster members do not leave the cluster when federated to a non-responsive peer.
- BZ#720714
- The mechanism for generating Link and Bridge names was incorrect and could generate the same name for multiple distinct Links and Bridges. The broker's internal federation configuration would become corrupt. This resulted in inconsistencies among clustered brokers, forcing one or more brokers to exit. In this release a new Bridge and Link naming algorithm guarantees unique names for all Bridge and Link objects. Clustered brokers running federation now remain consistent.
- BZ#710787
- The loss of connection always resulted in at least one attempt to reconnect in the event that a client lost its connection to the broker, even when the reconnect option was disabled. In this release the logic handling the loss of a connection is altered to distinguish it from opening a connection in the first place. The reconnect option is now correctly handled; specifically if set to false there will never be a reconnect attempt made.
- BZ#703170
- Multiple links can now be configured between federated brokers, allowing for greater message throughput.
- BZ#849790
- Acl files may be reloaded at any time. However if there is an error in the Acl file then the broker halts. There was no way to load a trial Acl file and see how it behaves; only live Acl rule files could be tested. This release introduces the ability to test an ACL file on an off-line broker. The Acl file may be repeatedly loaded and tested without interrupting service on a mission-critical broker. When the Acl file is finally tested it can be loaded into the live broker. See the Messaging Programming Reference Guide for details.
- BZ#849788
- Specifying Acl rules that allow named users to create named objects required a large number of specific Acl rules. Administrators had to keep adding users to the Acl file to allow new users to use the broker. In this release user name substitution keywords are added to the Acl file so that a single rule may apply to all users. Keywords are created to substitute for the user name, the domain name, or the user and the domain name together. Keyword substitution is allowed for object names, routing key names, alternate exchange names, and queue names. Refer to the Messaging Installation and Configuration Guide for details.
- BZ#784685
- Log messages did not show enough detail to audit who created and deleted broker objects. Log messages had no high level filter keys users could specify to help filtering log messages. This release contains a new log category to track creation and deletion of manageable objects in enough detail to track who change the configuration of the broker. Refer to the Messaging Installation and Configuration Guide for details.
- BZ#836141
- An error in the clustering logic caused cluster nodes to fall out of sync with the master node during federation. In this release the logic is fixed, and cluster nodes remain synchronised. Further details follow: The eldest cluster broker sends commands Creates its federation link and bridge, But leaves behind its juniors, all confused. Not privy to these mysteries, they soon Fall out of sync, and drag the cluster down. Without agreement, how can clusters run? If elder and its juniors disagree, Divide against itself this fragile house, How can it stand? How can the perfect synchrony exist? It can't. And therein lies the rub. The elder broker builds its links and bridge, But tells the youngsters: "Do thou as I do!" They too can build that bridge, maintaining it Against the day when elder fails. All their Commands are thus maintained in sync. The cluster hums, the traffic leaps and flows. Where once was strife and error, harmony has come. The synchrony will no doubt be disturbed again. But not by this BZ.
- BZ#835119
- The SSL transport implementation for both C++ broker and C++ client was missing some of the implementation. This meant that even if the broker or client detected heartbeat failure it could not abort the failed connection. So heartbeats did not work over SSL. In this release the missing pieces of the SSL implementation are present for SSL for the broker and client. Heartbeat failure now correctly aborts SSL connections on both broker and client.
- BZ#572567
- Users were not restricted in the number of queues they may declare. Any user could run the broker out of resources by creating too many queues. This release adds a command line argument "--max-queues-per-user N" that is enforced for all users. When this option is specified each user may have at most N queues concurrently, and broker resources are preserved.
- BZ#861838
- Dynamic bridges were improperly destroyed after a binding error. This often occured during a broker restart when resources were being recreated sequentially, and bindings could not succeed until all the resources had been recreated. This caused configured bridges to be lost instead of being recovered during a maintenance cycle.In this release dynamic bridges are not deleted after a binding error. After required resources have been restored the dynamic bridge is created properly during a periodic retry.
- BZ#799479
- Non-privileged users could specify any queue size and stress the broker's resources. This release provides quota limits for message queues, in the form of upper and lower limits for memory usage and for queue message counts. User requests to create queues that are too large or too small, both in-memory and on-disk, are denied.
- BZ#754990
- The ring policy enforcement logic did not take account of the case where no maximum size was specified, only a maximum count. A maximum size of 0 (i.e. unspecified) would result in a message being needlessly removed. In this release the logic is updated to correctly handle the case where the size is not limited but the count is. The policy is correctly enforced (and the behaviour is the same regardless of the default-queue-limit set).
- BZ#866677
- A previous release changed the logging of expired message. Messages expired by the queue cleaner are not logged as they were prior to the change. In this release logging is moved to a common place in the source module so all expired messages receive the same logging treatment. The queue name and message properties are included in debug-level log entries.
- BZ#802656
- This release adds topic key matching to the Acl file routing keys that uses the same logic as the broker's routing key matching. Acl rules can now use the same routing key specification syntax that is used by the broker's run-time topic key matching logic.
- BZ#804752
- Client exceptions were not properly cleared when a transport exception is triggered by a broker failure. Client code got stuck in an infinite loop reprocessing old exceptions. In this release when the broker exits unexpectedly, the transport channel is closed and the transport exception is cleared. The client recovers and creates a new connection when the broker returns to service.
- BZ#884036
- Attempts to cast negative-zero to an unsigned type would fail due to a bug in older GCC libraries. In this release the casting logic is rearranged to handle the "minus" sign separately. Negative-zero now works properly on older and newer GCC libs. Other values are unaffected.
- BZ#870058
- Running the
qpidd --configcommand passing an existing directory would hang and not return to the command line. In this releaseqpidd --configdetects when an existing directory is passed and, if so, returns an error message. - BZ#812376
- Bounds checking on fields whose AMQP types restrict their length was not done at a point that the application can be notified. The IO thread caught the problem but at a late stage. Over large values caused the IO thread to spin indefinitely on the error. In this release bounds checking is done when the (potentially) invalid value is set. An error is raised to the application if it sets too large a value.
- BZ#813742
- The async queue replication code set frame flags incorrectly when there was an empty content frame (i.e. a content frame with content size equal to zero). This was treated as not having a content frame at all so the end-of-frameset flag was incorrectly set on the header frame. Such messages could not be replicated; the receiving broker would reject them as invalid AMQP. In this release the end-of-frameset flag setting takes account of whether there is a content frame, even if it is empty. Such messages are sent as valid AMQP frames and replicate as expected.
- BZ#814356
- The broker required create permission for user sessions performing a passive declare (i.e. a declare that should not create, but should merely verify existence). This resulted in clients that were not actually trying to create an exchange receiving permission errors due to the lack of create permission. In this release if the declare is passive, the permission required is changed to access rather than create. Clients using but never creating exchanges need not have create permission granted.
- BZ#816092
- In queue state replication, if duplicate messages were detected then the replication exchange did not increase its counters (
msgDropandbyteDrop). This caused invalid counters of replication exchange. In this release when message duplicity is detected, relevant counters of the replication exchange are incremented, resulting in correct counters of replication exchange. - BZ#740505
- When command-line options contained errors, the
--helpand--versionoptions were not processed. With an error in user specified command-line options the presence of the--helpoption caused a message "Use --help to see valid options" but did not display the help screen. In this release the--helpand--versionswitches are parsed separately so they act despite other errors in the command-line options. When a parse error happens and the user has specified --help then the help usage text is displayed. - BZ#873414
- An updated default ACL file is provided at
/etc/qpid/qpidd.acl. New installations use this file by default. Existing installations that upgrade via rpm can find the file as/etc/qpid/qpidd.acl.rpmnew. - BZ#834256
- The alternate-exchange option in an x-declare clause within the link-options of an address was being ignored. The alternate exchange property could not be controlled via the link options. In this release the alternate exchange option is checked and used if specified. The alternate-exchange can now be controlled on queues created through the link options in an address.
- BZ#678612
- Some qpidd log messages were output before the PID when running
qpidd --check. Withlog-to-stdout=yes,qpidd --quitdid not work. In this release the unnecessary log messages are removed, andqpidd --quitworks as expected. - BZ#868881
JournalInactivemessages appeared when certain broker scheduling algorithms were triggered and the broker is running at INFO log level. There was nothing a customer could do to stop the scheduling triggers and there was no convenient way to turn theJournalInactivemessages off. In this release the priority of the JournalInactive log messages is lowered to DEBUG level and the message frequency is lowered from 12/second to 1/second. If the broker is running at INFO log level theJournalInactivemessages may be selectively re-enabled by using switch:--log-enable debug+:Timer.- BZ#783215
- When a reroute operation was requested on a queue and the flag was specified to use the alternate exchange of that queue, there was no check whether an alternate exchange has actually been specified for the queue. In such a case the rerouted messages were silently dropped. In this release a check is made to ensure that when requested to use a queue's alternate exchange, that exchange has been specified. If a reroute request is made and the alternate exchange is to be used as the target and the queue did not have an alternate exchange, an error is raised.
- BZ#783243
- The mechanism for generating Link and Bridge names was incorrect and could generate the same name for multiple distinct Links and Bridges. In that case the broker's internal federation configuration would become corrupt and federation would fail to properly forward messages between brokers. This release fixes the Bridge and Link naming algorithm to guarantee unique names for all Bridge and Link objects. Now, federation forwards messages as expected.
- BZ#846465
- In a previous release GSSAPI authentication was removed from the default list of mechanisms. In this release GSSAPI is added to the default list of sasl mechs in
/etc/sasl2/qpidd.conf. The Installation and Configuration Guide has instructions on configuring GSSAPI authentication with Kerberos. - BZ#849557
- The first broker was stopping in the INIT state without having put its URL into the ClusterMap membership list. Since the first broker did not put its URL in the list of cluster members, the second broker cannot register its own URL in the Credentials Exchange of the first. So when the first broker receives its update request, it cannot be authenticated. So the first broker raises an error, and both are stalled. In this release, before the first broker starts waiting for the others to get a quorum, make it call
Cluster::ready(), which will put its URL on the cluster membership list. It is ready - it is only waiting for the others to come online and get quorum. The second broker can now authenticate to the first, and get its update. The two brokers do not deadlock. - BZ#783482
- Queues can now be declared as browse-only.
- BZ#783428
- Queues could be created with
max_sizeandmax_countvalues being unreasonably small. ACL parameters to prevent these values from being so small were absent. Users were able to declare queues that will never work since the maximum size and count values are too small. In this release ACL create queue parameters are added to set lower limits on size and message count. These settings prevent the queuemax_sizeandmax_countvalues from being too small. - BZ#784957
- Acl processing logic discarded certain rules in a premature optimization. Some sequences of rules would not work. In this release Acl rule processing keeps all Acl rules and process them in the order given. Allow and Deny rules may be intermixed and give the expected allow or deny result.
- BZ#790004
- Interfaces with IPv6 addresses were not included in the list of interfaces included when automatically calculating a broker's default urls. Interfaces with IPv6 addresses are now included in the automatically generated list and IPv6 addresses are included in the default broker url.
qpid-java
- BZ#785209
- The store serializes transactions. Java client is not well optimized for speed and performance. A Java client running against a broker which uses durable and transactional messages was very slow. In this release the Java client is optimized to improve speed. The store is not changed.
- BZ#805881
- The JMS client can now send and receive messages encoded as AMQP lists. On the receiver side, List messages are exposed via 3 interfaces. 1. javax.jms.StreamMessage 2. javax.jms.MapMessage 3. org.apache.qpid.jms.ListMessage On the sender side, List messages can be sent in two ways. 1. org.apache.qpid.jms.ListMessage -- by creating it via createListMessage() in org.apache.qpid.jms.Sesson. Ex ListMessage msg = ((org.apache.qpid.jms.Sesson)ssn).createListMessage(); 2. If you set -Dqpid.use_legacy_stream_message=false any stream message you create will be encoded as a list message. Ex StreamMessage msg = jmsSession.createStreamMessage(); From next release onwards we will switch the default, so you don't need to explictly set -Dqpid.use_legacy_stream_message=false We recommend the second option as sticking to the standard JMS interfaces will allow your code to be more portable. This allows the JMS client to send and receive messages encoded as AMQP 0-10 lists. This further enables the use of QMF while using standard JMS interfaces.
- BZ#860011
x-bindingswere not evaluated as defined the address spec. For producersx-bindingswere not created at all. Link bindings were not created/deleted for both producers and consumers as per the address spec. In this releasex-bindingsare now evaluated as defined by the addressing spec.x-bindingscan now be created for both producers and consumers for both node and links.- BZ#882243
- Messages sent under an XA transaction were replayed on failover. Consequently, transaction atomicity was lost. In this release such messages are no longer replicated, and transaction atomicity guarantees are honoured.
- BZ#876193
- Incorrect error handling caused applications to not be notified that an exchange of a different type with the same name exists when creating an exchange. In this release an error is thrown when creating an exchange if one already exists with the same name.
- BZ#893980
- When a producer tried to delete a non existent subscription queue a session error occurred. This session error does not get picked up by the client while it's waiting for completion. The client appears to hang. Eventually the wait times out. In this release a check is added so that a producer does not attempt to delete a non-existent subscription queue.
- BZ#824917
- A logic error meant that the durable property in addresses was ignored and exchanges created using an address were not marked durable. In this release the durable flag is set based on the durable property specified in the address. The exchange created is marked as durable if the durable property is set to true in the address.
qpid-jca
- BZ#888392
- The QpidConnectionFactoryProxy only implemented the ConnectionFactory JMS 1.1 API. In this release the proxy implements the Queue/TopicConnectionFactory portions of the API (JMS 1.0).
qpid-qmf
- BZ#877553
- A rarely used internal data structure was being modified by a thread without being locked to other threads. This would result in the data structure getting corrupted and eventually causing the application to crash. In this release a lock was introduced to allow the data structure to be modified safely. The internal data structure is no longer corrupted.
- BZ#845223
- In this release the QMF broker events
clientConnect,clientDisconnect, andclientConnectFailwhere extended to include the Client Properties Map. This map is defined by AMQP-0.10 as a map of identifying attributes as provided by the client. QPID clients include their process id, parent process id, and command name in this map. This information allows detailed tracking for auditing and debugging purposes. - BZ#781496
- Messages were removed from the Queue when acquired by a client. When querying for the message timestamp, acquired messages were no longer available and the timestamp could not be retrieved. A timestamp of zero was returned. In this release acquired messages remain on the Queue, and are available for querying the timestamp. A valid timestamp is now returned for messages that have been acquired.
qpid-tests
- BZ#800912
- The qpid-perftest program had a logic error which caused it to wait for the incorrect number of control messages. Running
qpid-perftestwith--npubs<--nsubsand--iterations> 0 fails to finish, and appears to hang. In this release theqpid-perftestprogram now handles--npubs<--nsubsand--iterations> 0 without appearing to hang, and finishes normally.
qpid-tools
- BZ#864933
- The command-line tools
qpid-configandqpid-stathave changes to their command-line options.-rnow lists bindings recursively, and-bspecifies the broker address. For backward compatibility-acan still be used for the broker address. Refer to the Installation and Configuration Guide for details. - BZ#795324
- Client connection data was stored in wrong place in the code of
qpid-clustertool. As a result the tool did not display client connection information. In this release the tool displays client connection information. - BZ#740485
- Running
qpid-statwith a remote host address to view info on remote queues displayed the info for the localhost queue. In this releaseqpid-statconnects to the remote host address when a remote host address is specified, and the info for the remote queues is shown. - BZ#770711
qpid-clusterignored broker security credentials. Consequently the tool was unable to connect to any broker that had authorization turned on. In this release the QMF code pays attention to the credentials, if present, andqpid-clustercan connect to brokers with auth turned on.- BZ#895535
- Several QPID tools support the '
--ssl-key' parameter. This parameter is used to specify a file which contains the private key used to sign the certificate that identifies the tool's user with the broker. A coding problem caused the QPID tool to ignore the value supplied to the '--ssl-key' parameter. The broker against which the tool was run would not be able to authenticate the user of the tool. This would cause the broker to reject the connection attempt, and the tool's operation would fail. In this release the code is corrected to use the value passed via the '--ssl-key' parameter when providing authentication criteria to the broker. Consequently, the tool's user is authenticated by the broker, and the QPID tool's operation is permitted. - BZ#805599
- The cluster column was missing in
qpid-statoutput. This release fixes the output to include the broker's cluster.qpid-stat -g -b <broker>now shows the cluster name or<standalone>. - BZ#786555
- Using
qpid-configto create a queue or exchange with the same name but different options as an existing queue or exchange would succeed, but the options were not changed for the existing queue or exchange. In this releaseqpid-configuses a new library that returns an error when an attempt is made to create an existing queue or exchange. Now an error is returned when attempting to create a queue or exchange name that already exists.
2.10. RHSA-2012:1277 – Moderate: Red Hat Enterprise MRG Messaging 2.2 security, bug fix, and enhancement update
Security Fixes
- CVE-2012-2145
- It was discovered that the Apache Qpid daemon (
qpidd) did not allow the number of connections from clients to be restricted. A malicious client could use this flaw to open an excessive amount of connections, preventing other legitimate clients from establishing a connection to qpidd.Note
To address CVE-2012-2145, newqpiddconfiguration options were introduced:max-negotiate-timedefines the time during which initial protocol negotiation must succeed,connection-limit-per-userandconnection-limit-per-ipcan be used to limit the number of connections per user and client host IP. Refer to the qpidd manual page for additional details. - CVE-2012-3467
- It was discovered that
qpidddid not require authentication for "catch-up" shadow connections created when a new broker joins a cluster. A malicious client could use this flaw to bypass client authentication.
Bug Fixes
- BZ#841196
- Previously, when the
setuid()system function was called after a connection object had been created, and theselect()function of thepython-qpidmessaging client was interrupted at the same time, an unhandled exception was raised. Consequently,python-qpidterminated unexpectedly. With this update,python-qpidtraps the exception and continues its operation without disruption in the described scenario, thus fixing this bug. - BZ#693444
- Previously, the Python and C++ versions of the messaging API were behaving inconsistently with respect to the default reliability of receivers. This inconsistency led to confusion for users involved with both APIs as well as making documentation more complicated. With this update, the Python client has been modified to have the same default reliability for topics, that is for cases where a receiver is established from an exchange. Now, when establishing a receiver from an exchange (topic pattern), the link will be unreliable by default, which is the consistent behavior. If required, the reliability option can be changed within the link options of the address.
- BZ#817283
- Prior to this update, when the
transportconnection option was used to enable SSL, despite of what the option indicated, SSL was in fact not enabled. This update ensures thattransportis properly evaluated and SSL can be enabled using this option as expected. - BZ#834608
- Previously, connections made from an existing broker to provide an update to a newly joining broker were not being authenticated. Consequently, brokers were unable to join a cluster using certain authentication types, posing a possible security issue. This bug has been fixed and all connections made to a broker are now fully authenticated and secured.
- BZ#729311
- Clustered brokers use a special array (also called local map) to store pointers to connections. If the broker had the authentication feature turned on and a connection failed to authenticate, the pointer created for it in the local map was never deleted. Consequently, these stale pointers could clog the connection counter up to its maximum, at which point the broker started to refuse new connections, whether properly authenticated or not. This situation posed a risk for a denial of service attack. With this update, pointers to connections that failed to authenticate are properly delted from the local map, thus preventing this bug.
- BZ#801465
- When both elder brokers of two federated clusters failed, the brokers that remained in the cluster and attempted recovery were not aware that the peer elder broker had failed. Consequently, the federated link could not be re-established as the recovering brokers attempted to reconnect to their peer cluster's elder broker, which no longer existed. With this update, when any broker of a federated cluster fails, the current set of active brokers in that cluster is communicated to all federation peers. Now, the federated link is properly re-established to an available broker in the peer cluster when a failover occurs.
- BZ#808090
- When a broker in a cluster created the
x-qpid.cluster-updateexchange to receive updates from an elder broker, the exchange continued to exist after the update was complete but no reporting tool listed it as active on the broker. Consequently, it was possible to bind to this stale exchange but sending a message through it could not reach the queue. With this update, these exchanges are deleted after an update operation is finished and no hidden exchanges now remain on cluster brokers in the described scenario. - BZ#809357
- Previously, code in the
setTcpNoDelay()socket function specific to the Windows operating system tried to set a property before the connection was open. Consequently, an exception was raised and the qpid-perftest.exe and qpid-latency-test.exe applications terminated. With this update, the request for a property is delayed until the connection socket is open, thus fixing this bug and conforming the Posix implementation. - BZ#840982
- When using the
EXTERNAL,CRAM_MD5, orDIGEST-MD5authentication mechanism, connections on cluster shadowed connections were created by theanonymoususer but were later deleted by the user name negotiated by the authentication mechanism. Consequently, per-user connection counts were reported incorrectly; counts were added to one user and subtracted from another user for the same connection. With this update, after the user name has been negotiated, the connection count is always applied to that user rather than theanonymoususer and connection counts and quotas now work as expected. - BZ#689408
- Previously, during an update, ACL rules were preventing access to the
cluster-updateexchange.Consequently, new brokers could not join the cluster. With this update, catch-up connections, which are authenticated ascluster-userare allowed to update state and brokers can now join the cluster as expected. - BZ#826989
- When two clusters are connected by a replication queue, brokers in both of these clusters are killed, and replacement brokers are added, one of the clusters gets out of sync over time. This occurred approximately once in 20 attempts when a process killed a broker and a new broker was added. This bug happened because the
delivery countqueue state was not properly replicated from a senior broker to the new one. Consequently, after a period of time, the new broker encountered an error not encountered by the senior broker and the new broker terminated. With this update, thedelivery countqueue state is properly replicated in the described scenario, thus greatly reducing the risk of two brokers getting out of sync. - BZ#844618
- When a new broker joined a cluster, the cluster failed to enable the broker's capability to generate queue events upon receiving messages to its queues. Consequently, the broker failed to replicate incoming messages while its senior broker replicated them properly. As a result, the two brokers got out of sync and the new broker eventually terminated. This update ensures that generating queue events is enabled in a new broker during the
CATCHUPphase of joining the cluster and the desynchronization bug no longer occurs in the described scenario. - BZ#811481
- When one or more brokers in the source cluster of a federation fail, the destination broker recovers by failing-over to an alternate broker in the source cluster. However, when the original brokers recovered, the destination broker was not aware that the brokers had returned. Consequently, when another broker in the source cluster failed, the destination broker could not re-establish the federation link. With this update, federated destination brokers monitor the source cluster to keep track of when failed source brokers recover and re-join the source cluster. Now, the destination broker can correctly recover the federation link to a recovered source broker on failover.
- BZ#750775
- When a session exception was received, the connection was marked as closed, but the underlying TCP or AMQP connection was not in fact closed. Consequently, connection leaks kept occurring until the broker was out of file handles it could use. With this update, the session is closed instead of the connection. The rest of the session and the connection are still usable and the error is still reported by the Connection Listener but session errors no longer cause connection leaks.
- BZ#781560
- Pending read requests were monitored for completion even though a forced close was in progress. Consequently, on the Windows operating system, closing an AMQP connection over a broken TCP link, such as after a heartbeat timeout, could make the Qpid client to become unresponsive. With this update, pending read requests are cancelled on close in the Windows Asynch I/O system and the client properly starts a reconnect sequence in the described scenario.
- BZ#707682
- Due to a bug in the Python management libraries, the
qpid-toolutility was unable to address the ACL object in the broker. With this update, thereloadACLFile()function is called byqpid-toolto load ACL policy changes into a running broker and the ACL object can now be accessed as expected. - BZ#841488
- Previously, CLI utilities, such as
qpid-stat, were unable to display multi-byte characters in strings. Consequently, attempts to invoke a CLI utility on a broker, which contained queues or exchanges with multi-byte characters in their names, failed. In such a scenario, an exception was raised and the display of the list of queues or exchanges was stopped. This update fixes the display library used by the CLI utilities to correctly handle the multi-byte characters andqpid-statand its peer utilities now properly display strings that contain multi-byte characters.
Enhancements
- BZ#831365
- Heartbeats are used to quickly detect loss of connectivity between a client and a broker. In broker-to-broker links (federation), there was no option to use heartbeats. Consequently, loss of connectivity in federation links could go undetected for a very long time, resulting in messages not being delivered. This update adds a new
--link-heartbeat-intervalcommand-line option, which allow heartbeats to be used at a configurable interval (in seconds) on federation links. Loss of connectivity is now quickly detected in the described scenario. - BZ#609685
- This update introduces the new
--max-negotiate-timeqpidd broker option. This option prevents a possible denial of service security concern. It specifies the maximum amount of time a new connection to the broker has in order to do protocol negotiation and authenticate itself. If this procedure does not complete in the specified time, the connection is aborted by the broker. - BZ#683711
- Previously, it was not possible to limit the number of simultaneous connections a user could make to a broker. This update introduces new command-line parameters to specify connection limits. A new code to monitor connections enforces the limits. Now, individual users cannot consume all the broker's connection resources and deny service to other users.
- BZ#831363
- The Qpid Java client has been upgraded to upstream version 0.18, which provides a number of performance enhancements over the previous version.
- BZ#819590
- This update enhances the
XAResourceImplclass maintaining a list of sibling resources for theXAResourceinterface in order to satisfy theDtxStart/DtxEndpairings for the AMQP 0.10 protocol. Now, multiple XAResources are supported. - BZ#834416
- This update adds support for failover, XA/HA data sources, and the XARecovery interface to the Qpid JCA adapter.
service qpidd stop on all nodes, or run qpid-cluster --all-stop on any one of the cluster nodes. Once stopped, restart the cluster with service qpidd start on all nodes for the update to take effect.
2.11. RHSA-2012:0529 – Moderate: Red Hat Enterprise MRG Messaging 2.1 security, bug fix and enhancement update
2.12. RHBA-2011:1393 – Moderate: Red Hat Enterprise MRG Messaging 2.0 bug fix update
2.13. RHBA-2011-1339 – Red Hat Enterprise MRG Messaging 2.0 bug fix update
Bug Fixes
- BZ#704547
- According to the JMS specification, concurrently executing clients cannot use the same client ID. Previously, it was possible to allow multiple connections to use the same client ID. With this update, it is no longer possible to define such clients. Note that you need to enable the verify client ID feature to prevent the clients from having identical IDs (-Dqpid.verify_client_id=true).
- BZ#704566
- The
getJMSXPropertyNames()method returned a usable enumeration only when called for the first time. If the user called thegetJMSXPropertyNames()method multiple times, the method returned empty enumerations. This happened because the system overwrote the first created enumeration with the subsequent call ofgetJMSXProperyNames(). With this update, the underlying code has been changed andgetJMSXPropertyNames()returns the enumeration with the property names as expected. - BZ#712011
- The system threw a SessionException if a session was committed after a transaction had exceeded the queue limit (max-queue-size). This caused the client to fail to handle the error. The system now throws a JMSException under these circumstances and client can recover with the JMSException as expected.
- BZ#723750
- In a clustered environment, the system threw a SessionException when a session was committed after a failover using the
session.commit()method. Because the client application code did not receive a standard JMSException, it could fail to recover. The system now throws a JMSException under these circumstances and the client recovers as expected. - BZ#726050
- When a connection was created with newly specified credentials, the Java client cached the new credentials and overwrote the default credentials provided by the connection URL. As a result, when a new connection was created without credentials, the connection used the cached credentials and the authentication could fail. With this update, the credentials specified at a connection creation are not cached and the default credentials for connections without credentials are used as expected.
- BZ#726712
- A queue bound to the default exchange dropped messages with a subject. This happened because the routing key was wrongly using the message subject as the queue name. The underlying code has been changed and such messages are now delivered to the queue.
- BZ#728484
- Previously, the system threw a javax.naming.NameNotFoundException if the JNDI file definition in the executed program defined a JNDI file with a syntax error. However, this is not appropriate as the program cannot access the JNDI file and the execution was therefore interrupted. Such internal exceptions are now re-thrown as ConfigurationExceptions with the message "Failed to parse JNDI properties file" and the program is interrupted.
- BZ#732534
- The broker ignored QMFv2 requests sent from a Java client because the requests did not have the
app-idparameter set. The requests are now sent withapp-idand the messages are processed as expected.
2.14. RHEA-2011:0890 – Red Hat Enterprise MRG – Messaging 2.0 Release
Bug Fixes
- BZ#453538
- Prior to this update, the MRG Messaging Broker did not consider the priority of signaled messages when delivering messages. Due to this behavior, applications could not rely on the broker to deliver higher priority messages before lower priority messages. With this update, the MRG Messaging Broker can be configured to recognize higher priority messages and adjust delivery accordingly.
- BZ#484218
- Prior to this update, the Last Value Queue (LVQ) used a specific message header by default as the key determining equivalence (
qpid.LVQ_key). Due to this behavior, producers had to be explicitly coded to set that header. This update adds a new configuration optionqpid.last_value_queue_keythat allows the choice of LVQ key to be set per-queue. Now, users can choose an application defined header. - BZ#484691
- Prior to this update, the remote direct memory access (RDMA) protocol transport for Apache Qpid only supported InfiniBand network interfaces. Due to a problem affecting message flow control when using iWarp (Internet Wide Area RDMA Protocol) interfaces, the client process was unable to transmit more than 30–40 messages on a single connection. Now, Qpid's use of RDMA supports iWarp network interfaces. Current users of RDMA must upgrade any brokers before upgrading their clients if the upgrade is staged. This upgrade order is necessary as the new brokers can detect both the old and new protocols and switch automatically, but the new clients will only use the new protocol.
- BZ#500430
- Prior to this update, the MRG Messaging Broker could not use SASL (Simple Authentication and Security Layer) and/or SSL (Secure Sockets Layer) with SASL to provide authentication and security for inter-broker federation links as it could with links between clients and the broker. This update changed the qpid-route tool and the broker functionality to allow one federation broker to act as an SASL server while the other acts as an SASL client. Now, federated links can be connected with SASL, with the external mechanism of SSL.
- BZ#581560
- Prior to this update, when using dynamic routes in inter-broker federation, spurious bindings could be left in place when a user binding was deleted. The result of this was that messages might be unnecessarily forwarded from one broker to another. With this update, the spurious bindings are properly removed when a client process removes its bindings.
- BZ#611820
- Prior to this update, the counters on the broker were not updated for outgoing frames and messages. Due to this behavior, statistics were not accurately reported for outgoing frames and messages, always displaying zero. With this update, the statistics are modified for outgoing frames and messages. Now, reported statistics are accurate.
- BZ#614944
- Under certain circumstances, the
qpiddservice terminated unexpectedly on startup. With this update, the broker is modified so that no more failures occur. - BZ#615300
- Prior to this update, the failover exchange was not listed by management tools. This update fixes this problem and lists the failover exchange by management tools.
- BZ#629926
- Previous versions of the MRG Messaging Windows software development kit (WinSDK) did not include an example on how to authenticate with a remote broker. An example has now been added.
- BZ#632348
- Qpid's variant types are internally represented as integer values. Previously, when a user wanted a client application to print messages about variant types in a readable form, they had to write a custom function to convert an integer value to an appropriate string. To address this issue, this update exposes the
qpid::types::getTypeName()function in the public API, and users are no longer required to write custom conversion functions. - BZ#632625
- When a user created a queue with a policy set to
ring, previous versions of the C++ broker failed to enforce the byte size limit. This update applies an upstream patch that resolves this issue, and the byte size limit for ring queues is now implemented as expected. - BZ#641822
- On Windows machines, the qpid-perftest utility sometimes measures test intervals as zero seconds. Previously, this could cause the utility to fail to generate statistics instead giving the error message:
Controller exception: Bad report: 1.#INF
To avoid possible confusion, this update adapts qpid-perftest to detect zero second test intervals, and users are now presented with a comprehensive error message in these cases. - BZ#646913
- When the hostname in a broker URL contained an underscore (
_), the previous version of the Java Message Service (JMS) client incorrectly evaluated it aslocalhost, and failed to establish a connection, giving the following error message:java.net.ConnectException: Connection refused
With this update, an upstream patch has been applied to ensure the JMS client evaluates hostnames correctly, and the presence of an underscore in such a hostname no longer prevents it from connecting to a host. - BZ#647858
- Previously, the .NET Binding for the C++ Qpid Messaging Client did not provide a function to acknowledge individual messages. This error no longer occurs, and the
Session::Acknowledge()function is now included in the binding layer as expected. - BZ#649003
- The
getPropertyNames()method from thejavax.jms.Messageinterface returns an enumeration of property names in a message. Previously, this included properties with values that are not of a supported type as defined in the JMS specification. Consequently, an attempt to access such a property caused theMessageFormatExceptionto be raised. With this update, thegetPropertyNames()method has been adapted to return only the names of properties with values of a supported type. As a result, accessing properties returned by this method no longer leads to the aforementioned exception. - BZ#652233
- By default, Qpid utilities do not authenticate to a broker when the
PLAINauthentication method is used. Previously, when theqpiddservice was configured to enforce this authentication method, an attempt to run theqpid-stat -ccommand failed with a traceback, and the qpid-tool utility incorrectly produced empty results. With this update, these utilities now display a proper error message when they fail to establish connection with a broker. - BZ#653167
- Prior to this update, the qpid-config utility sometimes failed to handle and display certain queues. This happened if the queue attributes, which require integer values (such as
qpid.max_size, orqpid.max_count), contained string values. With this update, qpid-config now displays all queues with their attributes even when the attributes contain invalid values. - BZ#653923
- When a connection is established with the
reconnectparameter set totrue, it attempts to retransmit unacknowledged messages whenever it reconnects. Prior to this update, messages transmitted after such reconnection from a C++ client were not marked asredelivered. With this update, the underlying source code has been changed, and such messages are now marked asredeliveredas expected. - BZ#654020
- Previous versions of the MRG Messaging Broker allowed users to declare queues with invalid argument values (for example, negative integers or non-integer values). Such values were ignored by a broker, but querying the queue would still return these invalid values. This update adapts the underlying source code to reject incorrect declarations. As a result, an attempt to declare a queue with invalid argument values now fails with an error.
- BZ#654461
- Prior to this update, if a queue (or any other broker object) was created and a management operation (such as
purge) was invoked on the new object shortly after its creation (that is, 10 seconds or less), the operation might fail with unknown object . This update corrects this problem, and allows operations to be reliably invoked on new objects. - BZ#657523
- A Spout program (for example,
csharp.example.spout) sent messages but did not wait for the sending operation to complete before closing the session. As a result, not all of the messages processed by Spout'ssender.Send()function were received by the session receiver. With this update, thesession.Sync()function is called before closing the session to assure all messages are received by the session receiver. - BZ#658448
- Due to a source file name (
csharp.map.receiver.cs) being misspelled, it was difficult to find that file using normal search methods. This update fixes the spelling error, and the aforementioned file is shown in directory searches as expected. - BZ#659071
- As the
qpid::messaging::Connection:isOpen()method was previously not marked as const, it could not be invoked on objects referenced through a variable of type const. This update adds a const equivalent to theqpid::messaging::Connection:isOpen()function, fixing this issue. - BZ#663013
- Prior to this update, the default address created by the Python client was incorrectly marked as durable. With this update, the underlying source code has been modified to address this issue, and the Python client no longer marks the default address as durable.
- BZ#663022
- Using the
assertparameter in the address string for object creation (for example: a queue, an exchange) triggers a verification process before messages are sent to an end object, if the created object is of a specific type. Prior to this update, the Python client sent messages to the end objects without checking the object's type. Thus, the Python client was, for example, able to send messages to an exchange even if theassertparameter was used and set for a different type. With this update, the underlying source code has been modified to address this issue, andasserttypes are properly checked. - BZ#664486
- The
reroutemanagement method allows messages to be rerouted to a specific exchange. However, if there were no matching bindings, messages could have been dropped even if that exchange had thealternate-exchangeoption defined. This update modifies the broker to properly reroute messages through the alternate exchange, if defined. - BZ#666931
- Due to the absence of Boolean variations of the .NET Binding Receiver functions
Get()andFetch(), access to native functionality of the messaging client (for example, compiling) was not possible. This update adds the missingGet()andFetch()functions to the .NET Binding layer. - BZ#667172
- When a connection is established with the
reconnectparameter set totrue, it attempts to retransmit unacknowledged messages whenever it is reconnected. Prior to this update, messages transmitted after such reconnection from a JMS (Java Message Service) client were not marked asredelivered. With this update, the underlying source code has been changed, and such messages are now marked asredeliveredas expected. - BZ#667428
- Prior to this update, messages could have been enqueued outside of transaction boundaries for transactions that were in progress during a transparent failover. With this update, in case a transparent failover occurs, all transaction sessions are closed and an exception is thrown, and transactions in progress are rolled back.
- BZ#667771
- Prior to this update, when setting up a durable subscriber with a destination created using an addressing string, a
javax.jms.InvalidDestinationExceptionexception occurred. The problem has been fixed in this update so thecreateDurableSubscription()method now modifies its destination validation to check if the destination, which was passed in, is an instance ofjavax.jms.Topicandorg.apache.qpid.client.AMQDestinationinstead of just checking if the destination is an instance oforg.apache.qpid.client.AMQTopic; thus thecreateDurableSubscriptionmethod is now able to set up a durable subscriber from a destination created using an address string. - BZ#674005
- Prior to this update, when setting up a
QueueReceiverinterface with a destination created using an addressing string, ajava.lang.ClassCastExceptionexception occurred. The problem has been fixed with this update by modifying thecreateReceiver()method to check if the destination, which was passed in, is an instance ofjavax.jms.Queueandorg.apache.qpid.client.AMQDestinationinstead of just casting the destination intoorg.apache.qpid.client.AMQTopic; thus thecreateReceiver()method is now able to set up aQueueReceiverinterface from a destination created using an address string. - BZ#668580
- Prior to this update, the messaging binding object access was not properly interlocked. As a consequence, object deletions could have failed with an access violation under certain circumstances. This problem has been fixed by adding interlocks to serialize references to objects being deleted so that object deletions no longer fail due to access violations.
- BZ#671369
- Prior to this update, the remote direct memory access (RDMA) client terminated unexpectedly with a segmentation fault under certain circumstances if there was no Simple Authentication and Security Layer (SASL) mechanism specified either on a client or server side. The problem has been fixed in this update so that the client now runs as expected even if no SASL mechanism is specified.
- BZ#674390
- Prior to this update, the encoding of string data into a message buffer did not check if there was enough available space in the buffer for the string. As a consequence, encoding string data that was too large for a given buffer corrupted the buffer memory, and could terminate the broker unexpectedly under certain circumstances. The fix has been provided in this update so that string encoding functionality now verifies the message buffer is large enough to hold the encoded string. If there is not enough room in the buffer to hold the encoded string, an exception is returned to the caller and the buffer is not modified; thus buffer memory corruption no longer occurs and the broker does not terminate unexpectedly.
- BZ#674631
- In subscriptions using a broker managed credit window, this window was sometimes not moved forward correctly in the presence of rejected messages. As a consequence, sessions that were rejecting messages could have stalled. Also, the sessions did not receive any further messages even if messages were available on the queue they were subscribed to. The problem has been fixed by ensuring that the credit window always moves correctly for rejected messages so that sessions that are rejecting messages now continue to receive messages as expected.
- BZ#674678
- Prior to this update, closed receivers with an address defined with
qpid::messaging::Addressand with thedelete:alwaysoption set did not delete queues and did not reroute acquired messages to an alternate exchange. The problem has been fixed in this update so that the closed receivers now properly delete queues and reroute acquired messages as expected. - BZ#675921
- Prior to this update, the internal test that showed management objects for inter-broker bridges could have been created inconsistently in a cluster. As a consequence, it was possible for brokers to shut down with an invalid argument error under certain circumstances. The fix for this bug has been provided in this update, correcting the inconsistency so that the bridge management objects are created consistently in a cluster.
- BZ#676345
- Prior to this update, running the
qpid-config -a guest/guest@hostcommand against a remote broker resulted in an exception similar to the following under certain circumstances if theANONYMOUSmethod was used for authentication on the remote server:SessionException: ExecutionException(error_code=403, command_id=serial(29), class_code=0, command_code=0, field_index=0, description=u'unauthorized-access: authorised user id : anonymous@QPID but user id in message declared as guest (qpid/broker/SemanticState.cpp:473)', error_info={}, channel=1, id=serial(2))The problem has been resolved in this update so that running the command no longer results in an exception. - BZ#679212
- When rerouting orphaned or rejected messages to a queue's alternate exchange, if there were no matching bindings the broker did not follow the exchange's
alternate-exchangeoption. As a result, orphaned or rejected messages were dropped instead of following a legitimately defined path. The problem has been fixed in this update by modifying the broker to test whether the rerouting through queue's alternate exchange was successful. If not, the attempt made to reroute to that exchange's alternate exchange is defined so that orphaned or rejected messages are no longer dropped if the exchange they are routed to first has an alternate exchange. - BZ#679216
- A queue can be configured with an alternate exchange such that messages routed via the alternate might find their way back to the same queue. Prior to this update, when such a queue was deleted, the messages in the queue might circulate indefinitely within the broker. This update corrects this behavior.
- BZ#679723
- Prior to this update, the server example C++ application (the
server.cppfile) required a second input argument, that was not used, before entering the third input argument. Only the first and the third input arguments were used by the application. The problem has been fixed in this update so that the application now uses the first and second input arguments as expected. - BZ#680228
- Prior to this update, running the topic scalability test caused the broker to become unresponsive when more than 100 subscribers were present in durable mode. This was due to a deadlock in Berkeley DB (BDB) database's
del()function. The problem has been fixed in this update so that the broker does not hang anymore with a large number of subscribers. - BZ#680477
- When an exchange was declared with a type unknown to the C++ Broker, the incorrect error code 530 — not-allowed was returned. With this update, error handling for the described scenario has been fixed and now the correct error code 404 — not-found is returned.
- BZ#680479
- When the
message_cancel()call was issued for an unknown subscription destination, the erroneous cancel was ignored and nothing happened. With this update, the broker code was modified to raise an appropriate error as required by the specification, and the 404 — not-found error code is returned in the described scenario. - BZ#681026
- When a cluster was run with persistent message store, durable messages and a management agent such as
qpid-toolorcumin, a broker sometimes terminated with the invalid argument error message. With this update, issues in the store that caused inconsistencies in a cluster have been fixed and brokers no longer terminate in the described scenario. - BZ#681313
- The
rdmadriver adds a small trailer to outbound buffers. The size of this header was not accounted for when the buffer's size was passed to the codec. If the codec filled all available buffer space, therdmadriver overwrote the end of the buffer when adding the trailer. As a consequence, theqpiddserver terminated unexpectedly when implementing RDMA (Remote Direct Memory Access). With this update, a patch has been applied that allows theRdmaIOlayer to reserve header space in sent buffers, fixing this bug. - BZ#681331
- When an exchange was declared with the
alternate-exchangeconfiguration option on one node of a cluster, other nodes in the cluster were not made aware of this configuration, preventing communication with new cluster members. With this update, the code for the cluster update process has been fixed and alternate exchanges now work as expected in a cluster. - BZ#682218
- Previously, the broker maintained two lists of exclusive queues, one for a session and one for the connection. The latter list was incorrect. Deletion of queues prior to the session closing did not result in removal of all references to the queue, thus preventing the queue object from being freed. Consequently, exclusive queues that were deleted too soon would not actually free up and would still be reported by management tools. With this update, the broker will only maintain one list of exclusive queues at session scope, and delete any reference in that list when a queue is deleted too soon. Deletion now frees the queue up and the management reports are accurate.
- BZ#684182
- Due to package reorganization, a direct upgrade of the debuginfo package from version 1.3.2 to a later one fails due to file conflicts. As a consequence, installation or upgrade of the messaging and qmf packages failed with file conflicts if debug symbols were installed and requested to automatically update. With this update, the following workaround is provided: the previous version of the debuginfo package needs to be manually uninstalled before attempting to install or upgrade to newer messaging and qmf packages, with the command below:
$ rpm -ev qpid-cpp-mrg-debuginfo
This workaround does not introduce any limitations, and is straightforward for users of debuginfo packages to execute. - BZ#690261
- Previously, when the qpidd server was restarted, if the static keys for brokers were the same, the necessary bindings failed to be created during the broker federation if a route with the same name existed. Consequently, some links were lost after the restart. This bug has been fixed and the interface now requires the remote origin when deleting a route.
- BZ#692132
- When a broker session was enqueuing a message at the exact moment a receiver was dequeuing it, the broker sometimes terminated unexpectedly. With this update, the
enqueueComplete()call, responsible for re-queuing messages into a session, has been removed and this bug no longer occurs. - BZ#692134
- Previously, in the
qpid-configutility command line help, descriptions for--max-queue-sizeand--max-queue-countcommand options were incorrect. With this update, the descriptions have been amended. - BZ#693407
- Previously, it was possible to define a session name too long for the object defined in the management schema to handle. Due to a missing check in the code, this caused the broker to terminate unexpectedly. With this update, additional exception handling has been added to the
Timer.cppfile and the broker no longer crashes. - BZ#694617
- When using the JMS client, if a queue or an exchange object was created with the
alternate-exchangeoption defined, Qpid ignored the alternate exchange setting. This update adds an upstream patch that adapts the underlying code, and such objects are created with the definedalternate-exchangeoption as expected. - BZ#695263
- Prior to this update, the Qpid broker could terminate unexpectedly with a segmentation fault under heavy load. This happened due to a race between the
SessionStatethread and the journaling thread. This update adds a lock to theSessionStatethread and the deadlock no longer occurs. - BZ#695353
- Prior to this update, the address parser of the Java client failed to parse the destination address if the node durability was defined with string values (
TrueorFalse) and terminated withjava.lang.ClassCastException. With this update, the parser correctly handles both the Boolean and the string versions of the parameters. - BZ#695716
- Prior to this update, the C++ broker could terminate unexpectedly under heavy load. This happened due to a deadlock, which occurred when the broker without a store handled concurrent DTX operations on durable queues and durable messages. This update applied an upstream patch, which adds locks to the processes and prevents the deadlock.
- BZ#696373
- The JMS client did not contain information on prerequisites and basic run instructions for the delivered examples. This update adds a README file with the information.
- BZ#696637
- Previously, the message queue was inconsistent within nodes of a cluster if the message exchange used the
iveoption. This occurred because the message exchange holds the last message and the message cannot be transferred to new members. With this update, theiveoption has been disabled for clusters, thus fixing this bug. - BZ#696655
- Prior to this update, a cluster node shut down with an
inconsistenterror after an exchange was deleted. This happened because the messages from the deleted exchange were still stored in the exchange cache and the broker attempted to deliver them. With this update, such messages are marked as invalid and are ignored by the broker. - BZ#696698
- Prior to this update, if multiple queues were bound to the same routing key and the exchange failed to deliver messages to one of the several defined queues, the delivery to the other queues failed as well. With this update, the exchange delivers messages to every queue, which has not recorded an error since the delivery failure.
- BZ#696974
- Prior to this update, the qpid-config help contained an incorrect description of the
-a(or--broker-addr) command line option and incorrect name of the--force-if-usedoption. With this update, the mistakes have been fixed. - BZ#697913
- Prior to this update, the application occasionally became unresponsive when using a synchronous operation. This happened due to a deadlock of two reporting threads during exception reporting. With this update, the underlying code has been modified and the deadlock no longer occurs.
- BZ#698721
- When sending durable messages, the python spout example sometimes experienced performance issues. This happened because the spout enforced synchronous transmission. However, this solution is not appropriate. With this update, the send options have been changed to use asynchronous and durable transmission.
- BZ#701777
- If a broker withholds the acknowledgments of a QMF (Qpid Management Framework) agent, the agent gets into the flow-stopped mode. Prior to this update, the agent became locked in the mode if it had received a method-call message while in the mode. With this update, the agent's connection is shut down and re-established in such circumstances and the agent continues to work as expected.
Enhancements
- BZ#547743
- Prior to this update, the Messaging API was unable to dynamically create and delete broker entities such as queues, exchanges and their bindings. Due to this, the application had to use older APIs and workarounds to process these entities. With this update, management methods are exposed to create and delete these entities on the broker via QMF. With QMFv2 this can be achieved by sending a message of a defined format to a specified address.
- BZ#569918
- With this update, the SSL client options are supported for Qpid messaging.
- BZ#585844
- Prior to this update, queues marked for automatic deletion were deleted immediately after being released from a session. If a failover occurred the queues were permanently lost. This update introduces a delay between the time a queue is eligible for automatic deletion and the time it is actually deleted. Additionally, if this delay period is longer than the failover time, then the queue survives the failover and if it is not required it is automatically deleted.
- BZ#606357
- Prior to this update, the oldest messages were removed first when the queue removed messages to remain within the predefined size. Due to this behavior, in certain circumstances messages that were more valuable would be removed while less valuable messages remained. This update makes such queues configurable to recognize message priority. Now, messages can be removed according to their priority.
- BZ#621467
- This update adds the option to acknowledge only specific messages. Now, applications can tie acknowledgments to asynchronous confirmations they receive on successful processing of the messages.
- BZ#624793
- With this update, the Advanced Message Queuing Protocol (AMQP) allows the user to inspect exclusive queues that previously could not be browsed.
- BZ#632000
- When a user subscribed to an LVQ queue, already browsed messages were not replaced, and each subsequent update caused the queue to grow. This prevented subscribers from using the LVQ queues as a read-only cache of the latest values. This update introduces the
qpid.last_value_queue_keyconfiguration options, allowing a user to specify the key used by the application to determine the message equivalence. Now, subscribers always receive latest updates, and updated messages replace older messages with a matching key. - BZ#632396
- Previously, when a user attempted to use a
java.util.UUIDobject as a value of a Map message or in a List that itself is used within a Map message, the Java Message Service (JMS) client failed with an exception. With this update, the JMS client has been adapted to allow the use of thejava.util.UUIDobjects in Map messages. Also, the JMS client is now able to decodejava.util.UUIDfrom an incoming Map message. - BZ#667463
- With this update, MRG now includes a
tcp_nodelayoption for the Python API. - BZ#660291
- This update adds a flow control mechanism, allowing the broker to measure the current level of data in each queue via the
high_watermarkandlow_watermarkflags. This flow control mechanism allows the use of credit to prevent a queue overflow event and to provide information to a client about the data levels in a queue. - BZ#660289
- Prior to this update, messaging was unable to monitor the growing queue depth aside from polling constantly or waiting until the maximum level was reached to issue a warning. The broker now allows QMF events to be generated when the queue depth reaches a previously configured threshold, providing an early warning for elongated messaging queues.
- BZ#657398
- Previously, logging level at runtime could not be altered without restarting the broker. A management method now used allows the user to change the level of logging while the program is running, without having to restart the broker. This allows users to get detailed logs during troubleshooting then return to normal logging settings to prevent excessive logs.
- BZ#667970
- A new feature is added to track statistics about the number of messages transferred over the connection instead of tracking only the numbers of frames and bytes transferred across the connection.
- BZ#681166
- With this update, the implementation of map encoding into AMQP in Python has been improved; the
forloop to traverse the dictionary was replaced with themap()function. The map encoding is significantly faster. - BZ#681279
- With this update, the number of buffers allocated on a client by the
TCPConnectorinterface was reduced from 32 to 4, saving considerable amount of memory allocated per connection. - BZ#693895
- With this update, the logic that stores callbacks in the
CyrusSaslconstructor has been enhanced to make C++ clients behave similarly to other language clients as follows: if there is no username, then do nothing with eitherNAMEorPASSWDcallbacks; else, if there is a username but no password, then explicitly store an emptyPASSWDcallback. - BZ#700822
- Producer flow control for clusters is now supported.
Chapter 3. MRG Realtime on Red Hat Enterprise Linux 6
- 3.1. RHSA-2014:19491 – Important: Red Hat Enterprise MRG Realtime 2.5 kernel update
- 3.2. RHEA-2014:19227 – Red Hat Enterprise MRG – Realtime 2.5 Enhancement
- 3.3. RHSA-2014:16964 – Important: kernel-rt security, bug fix, and enhancement update
- 3.4. RHSA-2014:0100 – Important: kernel-rt security and bug fix update
- 3.5. RHBA-2013:1293 – Red Hat Enterprise MRG Realtime 2.4 bug fix update
- 3.6. RHSA-2013:0829 – Important: kernel-rt security and bug fix update
- 3.7. RHBA-2013:0828 – Red Hat Enterprise MRG Realtime 2.3 bug fix update
- 3.8. RHSA-2013:0622 – Important: kernel-rt security and bug fix update
- 3.9. RHSA-2013:0566 – Important: kernel-rt security and bug fix update
- 3.10. RHBA-2013:0563 – Red Hat Enterprise MRG Realtime 2.3 bug fix update
- 3.11. RHBA-2012:1492 – Red Hat Enterprise MRG Realtime 2.2 bug fix update
- 3.12. RHSA-2012:1491 – Important: kernel-rt security and bug fix update
- 3.13. RHSA-2012:1282 – Moderate: kernel-rt security, bug fix, and enhancement update
- 3.14. RHEA-2012:1280 – Red Hat Enterprise MRG Realtime 2.2 enhancement update
- 3.15. RHSA-2012:0670 – Important: kernel-rt security and bug fix update
- 3.16. RHBA-2012:0496 – Red Hat Enterprise MRG Realtime 2.1 kernel bug fix update
- 3.17. RHBA-2012:0495 – Red Hat Enterprise MRG Realtime 2.1 bug fix update
- 3.18. RHSA-2012:0333 – Important: kernel-rt security and bug fix update
- 3.19. RHBA-2012:0044 – kernel-rt bug fix update
- 3.20. RHSA-2011:1253 – Important: kernel-rt security and bug fix update
- 3.21. RHEA-2011:0895 – Red Hat Enterprise MRG – Realtime 2.0 Release
- 3.22. RHEA-2011:0894 – Red Hat Enterprise MRG – Realtime 2.0 Release
3.1. RHSA-2014:19491 – Important: Red Hat Enterprise MRG Realtime 2.5 kernel update
realtime-kernel
- BZ#1159803
The MRG Realtime kernel did not contain the appropriate code to resume after a device failed causing the volume status after a repair to not be properly updated. A "refresh needed" was still listed in the "lvs" output after executing the "lvchange --refresh" command. A kernel patch was added to that adds the ability to correctly restore a transiently failed device upon resume.
- BZ#1157949
The MRG kernel scheduler code was missing checks for the PREEMPT_LAZY flag allowing tasks to be preempted more times than necessary causing latency spikes on the system. Additional checks for the PREEMPT_LAZY flag were added to the check_preempt_wakeup() and check_preempt_curr() functions in the scheduler code so that preempt wakeups were reduced and these latency spikes were removed.
- BZ#1182246
With the older Mellanox Connect-IB (mlx4) driver present in the MRG Realtime kernel, a race condition could occur that would cause a loss of connection. This mlx4 driver was updated resolving the race condition and allowing proper connectivity using the mlx4 driver.
- BZ#1191767
The MRG Realtime version of the e1000e driver did not provide support for the Intel I218-LM network adaptor. The Realtime 3.10 kernel was updated to provide support for this additional hardware.
- BZ#1172844
The MRG-Realtime kernel-rt sources will be modified to take advantage of the updated 3.10 kernel sources that are available with the Red Hat Enterprise Linux 7 releases. This will help ensure our MRG-Realtime drivers stay more up to date with other Red Hat releases.
- BZ#1169900
The sosreport executable would hang reading /proc/net/rpc/use-gss-proxy because of the wait_queue logic in the proc handler that had no chance of waking up from reading the file. This wait_queue logic was removed from the proc handler allowing the reads to correctly return the current state.
- BZ#1171363
The MRG Realtime 3.10 kernel was updated to provide support for the Mellanox Connect-IB (mlx5).
- BZ#1169755
An upstream patch removing the xt_connlimit revision zero ABI was not reverted on the kernel-rt package, which caused problems because the iptables package present in Red Hat releases requires this revision. This upstream patch to remove the xt_connlimit revision 0 was reverted from the kernel-rt sources to allow the iptables command to execute correctly on Red Hat Enterprise MRG.
vulnerability
- BZ#1195248
A flaw was found in the way the Linux kernel's XFS file system handled replacing of remote attributes under certain conditions. A local user with access to XFS file system mount could potentially use this flaw to escalate their privileges on the system.
- BZ#1198457
A NULL pointer dereference flaw was found in the way the Linux kernel's madvise MADV_WILLNEED functionality handled page table locking. A local, unprivileged user could use this flaw to crash the system.
- BZ#1163792
A flaw was found in the way the Linux kernel's splice() system call validated its parameters. On certain file systems, a local, unprivileged user could use this flaw to write past the maximum file size, and thus crash the system.
- BZ#1198503
It was found that due to excessive files_lock locking, a soft lockup could be triggered in the Linux kernel when performing asynchronous I/O operations. A local, unprivileged user could use this flaw to crash the system.
- BZ#1151353
A race condition flaw was found in the Linux kernel's ext4 file system implementation that allowed a local, unprivileged user to crash the system by simultaneously writing to a file and toggling the O_DIRECT flag using fcntl(F_SETFL) on that file.
3.2. RHEA-2014:19227 – Red Hat Enterprise MRG – Realtime 2.5 Enhancement
realtime-configuration
- BZ#1155739
- MRG-Realtime stores release information in a database that is used to create the contents of the /etc/mrg-rt-release file upon boot. The rt-setup package, which manages this database, has been updated with new information about the MRG-Realtime releases.
3.3. RHSA-2014:16964 – Important: kernel-rt security, bug fix, and enhancement update
realtime-kernel
- BZ#1061893
Previously, calling
cpu_chill()from a softirq could block the timer softirq. This could result in a deadlock, ascpu_chill()callsmsleep()which requires the local timer softirq to wake it up. But if it blocked the timer softirq, it would never wake up. A change was made tocpu_chill()to callhrtimer_nanosleep()instead. Thehrtimer_nanosleep()is woken by the hardware timer interrupt. Now the softirq that calls cpu_chill() will sleep for one microsecond and be woken by the hardware interrupt, which does not get blocked by any locks held by the caller ofcpu_chill(), and removes the deadlock.- BZ#1067880
The MRG Realtime kernel was updated to match the version of hpsa to that available in Red Hat Enterprise Linux 6.
- BZ#1058848
It was discovered that a corruption of net list occured when the sit module is loaded and unloaded. This occurs while
/proc/<pid>/net/devis being read. This is because the sit modules callslist_netdevice()which adds itself to the network list that is read by that proc file. The unloading of the module does not remove itself and leaves a freed item on the list. When reading the proc file that freed item is accessed and undefined results occur.- BZ#1016735
A series of performance patches were applied to the 3.10 MRG Realtime kernel.
- BZ#1032245
MRG Realtime 2.5 has rebased the current version of the kernel to a 3.10 version.
vulnerability
- BZ#1077343
Some occurences in the netfilter tree use
skb_header_pointer()wheredhitself is a pointer that is being passed as the copy buffer. A remote attacker could use this flaw to crash the system or potentially to escalate their privileges on the system. While not affecting the versions of Linux kernel as shipped with Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6, this issue does affect the version of the kernel package as shipped with Red Hat Enterprise MRG 2.- BZ#1064253
A flaw was found in the way cifs handled iovecs with bogus pointers userland passed down via writev() during uncached writes. An unprivileged local user with access to cifs share could use this flaw to crash the system or leak kernel memory. The default cache settings for cifs mounts on Red Hat Enterprise Linux prohibit successful exploitation of this issue.
- BZ#1052914
A Linux kernel is found to be vulnerable to a NULL pointer dereference flaw caused due to inappropriate handling of Floating Point Unit (FPU) exceptions during task switch at
emmsinstruction. This only affects AMD CPU family in both i386 and AMD64 modes. A user/program could use this flaw to kill tasks at random resulting in DoS or potentially gain root privileges if allowed to map NULL(mmap_min_addr=0)page.- BZ#1074471
A Linux kernel built with the IPv6 protocol support
(CONFIG_IPv6)is vulnerable to a kernel crash caused by a flood of IPv6 router advertisement(RA) packets. It occurs while processing the IPv6 router advertisement packets. A remote attacker in the same layer 2 segment can use this flaw to crash the kernel on a target system, resulting in DoS. While not affecting the versions of Linux kernel as shipped with Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6, this issue does affect the version of the kernel package as shipped with Red Hat Enterprise MRG 2.- BZ#1062356
A Linux kernel built with the NSA SELinux Support
(CONFIG_SECURITY_SELINUX)is vulnerable to a crash caused by an empty SELinux security context value. If a file has an empty security context, listing it vials(1)could trigger this crash. Only user/processes withCAP_MAC_ADMINprivileges are allowed to set the SELinux security context of a file. A user/process withCAP_MAC_ADMINprivileges could use this flaw to crash the kernel, resulting in a DoS. While not affecting the versions of Linux kernel as shipped with Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6, this issue does affect the version of the kernel package as shipped with Red Hat Enterprise MRG 2.- BZ#1058748
A Linux kernel built with the NetFilter Connection Tracking
(NF_CONNTRACK)support for IRC protocol(NF_NAT_IRC), is vulnerable to an information leakage flaw. It could occur when communicating over direct client-to-client IRC connection via a network using NAT. A user/program could use this flaw to leak kernel memory bytes. While not affecting the versions of Linux kernel as shipped with Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6, this issue does affect the version of the kernel package as shipped with Red Hat Enterprise MRG 2.- BZ#1035875
A Linux kernel built with the networking support
(CONFIG_NET)is vulnerable to a memory leakage flaw. It occurs while doing therecvmsg(2),recvfrom(2), orrecvmmsg(2)socket calls. A user or program could use this flaw to leak kernel memory bytes.
3.4. RHSA-2014:0100 – Important: kernel-rt security and bug fix update
realtime-kernel
- BZ#1045312
The Nagle latency tuning patch missing, so the
tcp_delack_mintunable was not available. This update ports theato minanddelack mintunable to v3.8-rt, so the tunable as documented in the Red Hat Enterprise MRG Realtime Tuning Guide is available.- BZ#1020491
New hardware and invalid DMA length registers caused link stability problems with tg3 adapter, consequently tg3 continually reported a blocked status until the system locked up. The tg3 driver has been modified to handle new hardware and implement a DMA workaround. As a result, the tg3 adapter works and the system does not crash.
- BZ#1037770
Previously the realtime kernels were compiled without the
CONFIG_NETFILTER_XT_MATCH_ADDRTYPEoption. Consequently, a necessary module in order to add an iptables rule matching on an addrtype was not available. Now,CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=mhas been added to config files. Theipt_addrtypemodule is available and it is possible to create an iptables rule matching on an addrtype.- BZ#1020382
Power-on-reset could leave tg3 DMA length registers with uninitialized values, causing the link to bounce and then fail completely. This update ensures that the length registers are initialized properly, resulting in a stable link after power-on reset.
- BZ#1002918
Under heavy I/O load on an ext4 filesystem the system became increasingly unresponsive, with a growing amount of tasks in the "uninterruptible" state ("D"), until it becomes unusable. A fix to a related issue (0ef5418 jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space) was backported from upstream, and now the test kernel runs for a longer period without incidents. Note: This issue is extremely difficult to reproduce, only one customer has reported it and it takes about month for the issue to appear. The exact procedure to reproduce this issue is, as of now, unknown.- BZ#1034361
The installation process of a new MRG Realtime production kernel was always setting the production kernel as the default boot kernel. That was done intentionally by adding the
--make-defaultcommand line argument when callingnew-kernel-pkg. Attempts to set a different kernel as the default (by using the "UPDATEDEFAULT=no" parameter to the configuration file) was overriden by the--make-defaultargument in the command line. The installation process has been modified to observe the presence of "UPDATEDEFAULT=no" in the configuration files and to persist it. Otherwise, the new MRG Realtime production kernel becomes the new default kernel.- BZ#1039871
The caller of recvmsg handlers responsible for passing the correct msg->msg_namelen caused numerous uninitialized memory leaks in recvmsg handlers. Now, the caller passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. As a result, there are no more uninitialized memory leaks in recvmsg handlers and it is harder for new code to accidentally leak uninitialized memory.
- BZ#1043043
edac_mce_amdis a module on Red Hat Enterprise Linux but built-in on Red Hat Enterprise MRG. Customers were confused by a Knowledge Base article that recommends loading theedac_mce_amdmodule.CONFIG_EDAC_DECODE_MCE=mhas been added to the configuration, so the module is available for loading as instructed in the Knowledge Base article.- BZ#1039743
Use of pointer in FB tunnel after it has been removed caused panic in the module load test when unloading the
ip6_tunnelmodule. Now, a handler is added for dellink that never removes the FB tunnel. The pointer is still available since the FB tunnel is not destroyed when the module is unloaded, thus no panic occurs.- BZ#1016729
Various aspects of cache on Infiband resulted in potentially bad performance. This update introduces the following fixes: 1. Only cache valid GIDs and not empty ones to reduce search time. 2. When searching GIDs on a list of devices, first start with the device that received the request. 3. Change cma_acquire_dev function to search for cached GIDs. These fixes result in improved Infiniband performance, and remove potentially pathological (for throughput performance) code paths.
3.5. RHBA-2013:1293 – Red Hat Enterprise MRG Realtime 2.4 bug fix update
realtime-kernel
- BZ#998920
Previously, dracut was only looking at the default location for firmware files while creating the
initramfsimage, ignoring the firmware files provided byrt-firmwareand bykernel-rt-firmware. As a consequence, if the firmware for a device required to boot the system was not at the default location, the firmware would be missing in the initramfs image, causing the device driver to fail at boot time. In this release, bothrt-firmwareandrt-setupcarry minimal dracut config files to make dracut aware of the alternative places for firmware files. This allows the required drivers to boot the system successfully.
3.6. RHSA-2013:0829 – Important: kernel-rt security and bug fix update
Bug Fixes
- BZ#918098
Portions of buildID were missing from vmlinux and vDSO images, consequently it was not possible to use systemtap or make annotations in perf. This update sets the
AFTER_LINKvariable which gets the buildID components into the final emdedded images. Systemtap and perf now work as expected.- BZ#923222
A race condition in the bonding driver caused the system to be unresponsive. This update adds a
synchronize_net()innetdev_rx_handler_unregister(), ensuring that a rcuprotected reader is guaranteed to see a non NULLrx_handler_data, thus preventing the race condition in the bonding driver and restoring system responsiveness.- BZ#921554
Cgroup configurations have been enabled in the Realtime kernel version 3.6, so its services and capabilities match Red Hat Enterprise Linux more closely.
- BZ#914765
The
DEFAULTKERNELparameter was not modified in/etc/sysconfig/kernelfor the kernel-rt package. Consequently, upgrading any non-Realtime kernels reset the default kernel to a non-Realtime kernel. All MRG Realtime kernel variants are now classified as 'kernel-rt' when installing or upgrading, so upgrades of non-Realtime kernels do not reset the default kernel, and the Realtime kernel remains the default as expected.- BZ#927026
Previously the NO_HZ option was enabled by default, which could have adverse effects on system performance. The NO_HZ option is now disabled by default to improve performance, but users can still enable it if necessary using the
CONFIG_NO_HZboot option.
3.7. RHBA-2013:0828 – Red Hat Enterprise MRG Realtime 2.3 bug fix update
This update provides bug fixes for the following packages:
- rt-setup
- The rt-setup package configures settings required by the Red Hat Enterprise Linux real-time environment, such as creating the realtime group, adding realtime user privileges to PAM (Pluggable Authentication Models), enabling the configuration of kdump in real time, and disabling irqbalance by default.
- rtctl
- The rtctl package contains a set of scripts, which are used to manipulate the scheduling priorities of groups of system threads.
Bug Fixes
- BZ#964190
- The latest MRG 2.3 kernels have a different version name that contains a stable sub-release number. This versioning prevented rtctl from matching the correct configuration file, so rtctl resorted to using the default configuration file, which could cause some threads to have unusual priorities. The routine used to match configuration file names and kernel versions was changed to address the special case present on MRG 2.3. The thread priorities are now modified according to the correct configuration file or are left untouched in cases where no suitable configuration file is found.
- BZ#957802
- The rt-setup package mounted the
cpusetcgroup because the Realtime kernel only supported that cgroup. Since the most recent Realtime kernel now supports all the cgroups except the RT Group Scheduler, this custom cgroup mounting logic was removed from rt-setup, and a package dependency for libcgroup was added. All cgroups supported by the Realtime kernel are now mounted on boot by the libcgroup startup code. - BZ#956776
- The SLUB allocator in the kernel uses the deferred
kfree()processing by default. This caused latency spikes when thecpu_partiallogic was executed to reclaim the deferredkfree()blocks. Now, aslub_cpu_partial_offscript is executed at boot time. This script will traverse/sys/kernel/slaband write zeros into any 'cpu_partial' entry found, effectively turning off the SLUBcpu_partiallogic and thus remove the noted latency spikes.
3.8. RHSA-2013:0622 – Important: kernel-rt security and bug fix update
Security Fixes
- CVE-2013-0268, Important
- A flaw was found in the way file permission checks for the
/dev/cpu/[x]/msrfiles were performed in restricted root environments (for example, when using a capability-based security model). A local user with the ability to write to these files could use this flaw to escalate their privileges to kernel level, for example, by writing to theSYSENTER_EIP_MSRregister. - BZ#CVE-2013-0871, Important
- A race condition was found in the way the Linux kernel's ptrace implementation handled
PTRACE_SETREGSrequests when the debuggee was woken due to a SIGKILL signal instead of being stopped. A local, unprivileged user could use this flaw to escalate their privileges. - BZ#CVE-2013-1763, Important
- An out-of-bounds access flaw was found in the way
SOCK_DIAG_BY_FAMILYNetlink messages were processed in the Linux kernel. A local, unprivileged user could use this flaw to escalate their privileges. - BZ#CVE-2012-4542, Moderate
- It was found that the default SCSI command filter does not accommodate commands that overlap across device classes. A privileged guest user could potentially use this flaw to write arbitrary data to a LUN that is passed-through as read-only.
- BZ#CVE-2013-0290, Moderate
- A flaw was found in the way the __skb_recv_datagram() function in the Linux kernel processed payload-less socket buffers (skb) when the
MSG_PEEKoption was requested. A local, unprivileged user could use this flaw to cause a denial of service (infinite loop).
Bug Fixes
- BZ#858396
- There was high contention on run-queue lock when load balancing before idling, causing latency spikes on high CPU core count systems. With this update, IPI is used to send notification to cores with pending work, and the cores push the work rather than trying to pull it, resolving this issue.
- BZ#909965
- Previously, ACPI lock was converted to an rt_mutex, leading to a traceback when scheduling while atomic. With this update, ACPI lock has been converted back to a raw spinlock.
- BZ#912942
- Fibre Channel (FC)/iSCSI device state was set to off-line and after a timeout, not set back to running. Such a device would not come back online after a fast_io_fail or timeout. With this update, an explicit check for the device being offline has been added, and the device is set back to running when re-initializing, allowing devices to recover after a failure or timeout.
3.9. RHSA-2013:0566 – Important: kernel-rt security and bug fix update
Upgrade to latest stable Realtime kernel
Bug Fixes
- BZ#707262
The MRG Hewlett Packard Smart Array (HPSA) driver has been rebased to the latest upstream version (3.7) and updated with selected Red Hat Enterprise Linux 6 fixes and enhancements. The current MRG HPSA driver version is 2.0.2-4-RH1+MRG-RT.
- BZ#773017
Previously, the MRG Realtime *-devel packages contained two executable files:
scripts/conmakehashandscripts/pnmtologothat were never changed between kernel variants, and generate the same hash value for all variants. Consequently, debuginfo conflicts occurred on these files. Now, these files have been removed from the *-devel packages since they are not used to generate kernel modules, so there are no debuginfo conflicts.- BZ#880749
An upstream patch which introduced the rt_mutex construct had exported previous inline functions as
EXPORT_SYMBOL_GPL, which prevented third-party kernel modules from building. TheEXPORT_SYMBOL_GPLfunctions are now changed toEXPORT_SYMBOL, so third party modules can build successfully.- BZ#887767
The MRG Realtime kernel, which is newer than the Red Hat Enterprise Linux 6 kernel, has a "filesystem error nag" feature that gets cleared by newer user-space tools so that it is only printed once. This feature is unknown to Red Hat Enterprise Linux 6 filesystem or logging tools, so it never gets cleared. Consequently, the filesystem status is sent to the console and system log each day. The kernel function
print_daily_error_info()has been patched, and now returns without printing daily error message.- BZ#866600
Support has been added for IEEE 1588 Precision Time Protocol (PTP) to the MRG 2.3 kernel. PTP provides greater time synchronization than is offered by Network Time Protocol (NTP).
3.10. RHBA-2013:0563 – Red Hat Enterprise MRG Realtime 2.3 bug fix update
This update provides bug fixes for the following packages:
- libibverbs-rocee
- The libibverbs-rocee packages provide a library to enable userspace processes to use RDMA over Converged Ethernet (RoCE) "verbs" acording to the InfiniBand Architecture Specification and the RoCE Protocol Verbs Specification.
- libmlx4-rocee
- The libmlx4-rocee packages provide a device-specific driver for Mellanox ConnectX InfiniBand host channel adapters (HCAs) for the libibverbs-rocee library.
- rteval
- The rteval package contains a utility for measuring various aspects of real-time behavior on systems under load. The script unpacks the hackbench stress test and benchmark utility, and the kernel source code from the rteval-loads package. It then builds hackbench, and goes into a loop, running hackbench and compiling a kernel tree. During that loop, the cyclictest program is run to measure event response times. After the run time completes, a statistical analysis of the event response times is performed and printed to the screen.
- rt-firmware
- The rt-firmware package contains the contents of the latest linux-firmware tree from the kernel.org project for firmware blobs.
- tuna
- The tuna package provides graphical and command-line interfaces for changing scheduler and interrupt request (IRQ) settings. Changes can be made to CPUs, by thread or at the IRQ level, taking into account the topology of multi-socket and multi-core systems. Tuna provides the ability to isolate CPU cores and sockets for use by a specific application or hardware device.
Bug Fixes
- BZ#893133
- With this update, MRG Realtime provides updates to the latest rt-firmware package that includes new
bnx2xandcnicfirmware. - BZ#871117
- With this update, MRG Realtime provides updates to the latest versions of the libibverbs-rocee and libmlx4-rocee packages that provide high performance networking.
- BZ#601234
- The tuna graphical user interface now allows changing individual thread scheduling policy inside a thread group. Users can now modify the priority or affinity of individual threads.
- BZ#871598
- Tuna uses python-schedutils, which used to incorrectly raise a SystemError exception instead of a OSError exception when a specified PID did not exist. python-schedutils has since been corrected, and tuna was also modified to expect the new OSError exception.
3.11. RHBA-2012:1492 – Red Hat Enterprise MRG Realtime 2.2 bug fix update
This update provides bug fixes for the following packages:
- rt-setup
- The rt-setup package configures settings required by the Red Hat Enterprise Linux realtime environment, such as creating the realtime group, adding realtime user privileges to PAM (Pluggable Authentication Models), enabling the configuration of
kdumpin real time, and disablingirqbalanceby default. - rtctl
- The rtctl package contains a set of scripts, which are used to manipulate the scheduling priorities of groups of system threads.
Bug Fixes
- BZ#868329, BZ#868442, BZ#868446
- MRG Realtime was using the Red Hat Enterprise Linux kernel as the kdump/kexec kernel, but this caused problems due to version differences. With the MRG Realtime 2.x kernels, the
kexec-toolsfailed to create the initrd image for the Red Hat Enterprise Linux 6 kdump kernel. MRG Realtime now uses the Realtime kernel as the kdump/kexec kernel to eliminate the configuration issues between kernel versions. - BZ#878536
- Due to the way the
rtctlconfiguration file selection works, if there was not artgroupsconfiguration file for a specific kernel or for the general kernel version (for example a 3.2 kernel), the default configuration file was used. Using the default configuration file could send chrt error messages to the console and some threads could have unusual priorities. This update provides a per-kernelrtgroupsconfiguration file, which eliminates the chrt error messages and defines defaultkthreadpriorities for each kernel.
3.12. RHSA-2012:1491 – Important: kernel-rt security and bug fix update
Upgrade to latest stable Realtime kernel
Security Fixes
- CVE-2012-3520, Important
- A flaw was found in the way Netlink messages without
SCM_CREDENTIALS(used for authentication) data set were handled. When not explicitly set, the data was sent but with all values set to 0, including the process ID and user ID, causing the Netlink message to appear as if it were sent with root privileges. A local, unprivileged user could use this flaw to send spoofed Netlink messages to an application, potentially resulting in the application performing privileged operations if it relied onSCM_CREDENTIALSdata for the authentication of Netlink messages. - CVE-2012-4508, Important
- A race condition was found in the way asynchronous I/O and
fallocate()interacted when using the ext4 file system. A local, unprivileged user could use this flaw to expose random data from an extent whose data blocks have not yet been written, and thus contain data from a deleted file. - CVE-2012-2133, Moderate
- A use-after-free flaw was found in the Linux kernel's memory management subsystem in the way quota handling for huge pages was performed. A local, unprivileged user could use this flaw to cause a denial of service or escalate their privileges.
- CVE-2012-3511, Moderate
- A use-after-free flaw was found in the
madvise()system call implementation in the Linux kernel. A local, unprivileged user could use this flaw to cause a denial of service or escalate their privileges. - CVE-2012-4565, Moderate
- A divide-by-zero flaw was found in the TCP Illinois congestion control algorithm implementation in the Linux kernel. If the TCP Illinois congestion control algorithm was in use (the sysctl
net.ipv4.tcp_congestion_controlvariable was set to "illinois"), a local, unprivileged user could trigger this flaw and cause a denial of service. - CVE-2012-0957, Low
- An information leak flaw was found in the
uname()system call implementation in the Linux kernel. A local, unprivileged user could use this flaw to leak kernel stack memory to user-space by setting theUNAME26personality and then calling theuname()system call. - CVE-2012-3400, Low
- Buffer overflow flaws were found in the
udf_load_logicalvol()function in the Universal Disk Format (UDF) file system implementation in the Linux kernel. An attacker with physical access to a system could use these flaws to cause a denial of service or escalate their privileges. - CVE-2012-3430, Low
- A flaw was found in the way the
msg_namelenvariable in therds_recvmsg()function of the Linux kernel's Reliable Datagram Sockets (RDS) protocol implementation was initialized. A local, unprivileged user could use this flaw to leak kernel stack memory to user-space.
Bug Fixes
- BZ#843130
- The TCP stack for the MRG Realtime kernel was vulnerable to a blind-reset attack. Consequently, an off-path attacker could inject packets into a TCP stream. This update implements the RST and SYN components of RFC 5961, which prevents blind-reset and blind SYN attacks on the MRG Realtime kernel TCP stack.
- BZ#856243
- Nesting local and remote CPU locks caused a lockdep warning about potentially recursive locking dependencies. Now, only the remote cpu lock is acquired, which provides sufficient protection. The lockdep warning no longer occurs.
- BZ#853495
- An attempt to load the microcode module on an unsupported AMD CPU would silently fail, without quitting the microcode initialization. Consequently, multiple backtraces, as many as the amount of CPUs on that given system, would be printed to the console and boot logs. This update backports the solution available on newer kernels, which avoids loading the microcode module on unsupported AMD CPU families. The microcode module now loads cleanly without backtraces from the microcode module.
- BZ#859226
- A subtlety in the relation between preemption depth (
preempt_count) and the raw spinlocks in the PREEMPT_RT based kernels led to a context where printk buffers were never flushed to the console. Some time after boot, after a certain condition was triggered, the kernel-generated log could not reach the console and system log (rsyslog). That specific verification of thepreempt_countvariable was adapted to the reality of the Realtime kernel. Logs generated by the kernel are now successfully sent to both the console and the system log, according to the system configuration.
3.13. RHSA-2012:1282 – Moderate: kernel-rt security, bug fix, and enhancement update
Upgrade to an upstream version
Security Fix
- CVE-2012-4398, Moderate
- It was found that a deadlock could occur in the Out of Memory (OOM) killer. A process could trigger this deadlock by consuming a large amount of memory, and then causing request_module() to be called. A local, unprivileged user could use this flaw to cause a denial of service (excessive memory consumption).
Bug Fixes
- BZ#786083
- In the MRG 2.1 Realtime kernel, messages from iptables were not being successfully logged into syslog (rsyslog). With this update, this bug has been fixed and the iptables messages are now properly recorded in syslog.
- BZ#799385
- Previously, the
spin_trylock()function in the migrate timers code disabled preemption, which could cause a deadlock. The code has been changed to allow the lock to sleep in the described scenario and the data is now protected by disabling CPU migration, thus preventing this bug. - BZ#799386
- Previously, the futex proxy handler could take the
pi_lockfrom a task without disabling interrupts. If an interrupt came in while the lock was held, and that interrupt took the same lock, a deadlock occurred. With this update, interrupts has been disabled in the described scenario and the deadlock no longer occurs whenpi_lockis taken. - BZ#799389
- Prior to this update, locks for per-CPU data on MRG Realtime kernels could be reactived after a CPU comes online, but with an owner that had not released them. This caused various problems such as blocking the original owner of the lock. Locking in MRG Realtime kernels has been changed to use specific CPU masks and a lock is now released when the CPU is still offline, thus fixing this bug.
- BZ#799391
- When a task called the
wait_task_interactive()function, it waited on the task to change its state and could get the wrong result on MRG Realtime kernels. Thespin_locks()in the MTG Realtime kernel changes the real state of the task but keeps a mirror copy of the state that it resets to when the task unlocks the spin lock. Thewait_task_interactive()function has been modified to be now aware of the mirror copy used by MTG Realtime kernels as well, thus fixing this bug. - BZ#799399
- On MRG Realtime kernels, the
ftrace_dump()debug function expects a lock, but does not check if interrupts are disabled. If interrupts were disabled, the MRG Realtime kernel could still take the lock. Consequently theschedule()function could be called with the kernel in theatomiccontext, indicating a bug had occurred. With this update, the MRG Realtime kernel has been modified to check if interrupts are disabled before giving the lock, thus fixing this bug. - BZ#814689
- Previously, the
CONFIG_FIPSconfiguration flag was not enabled, causing thesshdaemon to report errors on boot. This update enables theCONFIG_FIPSflag and thesshdaemon now starts up without errors. - BZ#815937
- Previously, the firmware files for the RealTek RTL-8169 Gigabit Ethernet network card were moved from the main kernel upstream repository to the linux-firmware Git repository. This update adds the firmware files back to the MRG Realtime kernel and the network cards can now properly load them.
- BZ#822298
- Previously, the
CONFIG_COMPAT_VDSOconfiguration flag caused kernel VDSOs (Virtual Dynamically-linked Shared Objects) to be mapped at predictable addresses. To reduce the security risk,CONFIG_COMPAT_VDSOhas been disabled, which ensures kernel VDSOs are now mapped to random addresses.
Enhancements
- BZ#725799
- With this update, the MRG Realtime kernel is able to work as a diskless client.
- BZ#798423
- With this update, the InfiniBand core and InfiniBand drivers have been updated to more closely reflect updates available in Red Hat Enterprise Linux 6.3.
- BZ#825344
- With this update, MRG Realtime kernel provides the mrg-rt-release package, which maintains the
/etc/mrg-realtime-releasefile. This file contains the MRG release string, which indicates the major, minor, and errata releases of the MRG Realtime kernel.
3.14. RHEA-2012:1280 – Red Hat Enterprise MRG Realtime 2.2 enhancement update
Enhancements
- BZ#798426, BZ#798427
- With this update, MRG Realtime introduces the libibverbs-rocee and libmlx4-rocee packages to provide high performance networking. The libibverbs-rocee package provides a library to enable user space processes to use the RoCE (RDMA over Converged Ethernet) "verbs" API according to the InfiniBand Architecture Specification and the RoCE Protocol Verbs Specification. The libmlx4-rocee package provides a device-specific driver for Mellanox ConnectX InfiniBand host channel adapters (HCAs) for the
libibverbs-roceelibrary.
3.15. RHSA-2012:0670 – Important: kernel-rt security and bug fix update
Security Fixes
- CVE-2012-2123, Important
- When a set user ID (setuid) application is executed, certain personality flags for controlling the application's behavior are cleared (that is, a privileged application will not be affected by those flags). It was found that those flags were not cleared if the application was made privileged via file system capabilities. A local, unprivileged user could use this flaw to change the behavior of such applications, allowing them to bypass intended restrictions. Note that for default installations, no application shipped by Red Hat for Red Hat Enterprise MRG is made privileged via file system capabilities.
- CVE-2011-4086, Moderate
- A flaw was found in the way the Linux kernel's journal_unmap_buffer() function handled buffer head states. On systems that have an ext4 file system with a journal mounted, a local, unprivileged user could use this flaw to cause a denial of service.
Bug Fixes
- BZ#808271
- The CAP_SYS_ADMIN check was missing from the dmesg_restrict feature. Consequently, an unprivileged and jailed root user could bypass the dmesg_restrict protection. This update adds CAP_SYS_ADMIN to both dmesg_restrict and kptr_restrict, which only allows writing to dmesg_restrict when root has CAP_SYS_ADMIN.
- BZ#753230
- Previously, the _copy_from_pages() function, which is used to copy data from the temporary buffer to the user-passed buffer, was passed the wrong size parameter when copying data. Consequently, if the user provided a buffer greater than PAGE_SIZE, the getxattr() syscalls were handled incorrectly. This update fixes _copy_from_pages() to use the ACL length, which uses a correctly-sized buffer.
- BZ#813892
- Some older versions of hardware or their software could not recognize certain commands and would log messages for illegal or unsupported errors the driver could not properly handle. This bug has been fixed and no bogus error messages are now returned in the described scenario.
- BZ#818220
- Previously, the qla2x00_poll() function did the local_irq_save() call before calling qla24xx_intr_handler(), which had a spinlock. Since spinlocks are sleepable in the real-time kernel, it is not allowed to call them with interrupts disabled. This scenario produced error messages and could cause a system deadlock. With this update, the local_irq_save_nort(flags) function is used to save flags without disabling interrupts, which prevents potential deadlocks and removes the error messages.
3.16. RHBA-2012:0496 – Red Hat Enterprise MRG Realtime 2.1 kernel bug fix update
Bug Fixes
- BZ#675885
- Previously, while working with an open file, the
epollkernel code was calling poll operations of another open file without holding the file's lock. Consequently, a deadlock in the code could occur when using the system console. Now, an upstream patch to use theep_poll_nested()function has been provided and the system console now works correctly in the described scenario. - BZ#773746
- The
i7core_edaccode did multiple probes for the same hardware component. As a consequence, boot time error message about the code being unable to create a duplicate file in the/sys/devices/directory was returned. With this update, duplicate probe code has been removed and the error messages are no longer returned in the described scenario. - BZ#787331
- A previous Realtime work queue patch introduced a race conditions in the CPU hotplug code. Consequently, the system could become unresponsive in this code. This update reverts a few past updates to the work queue, which turned out to be unnecessary, and the hangs no longer occur in the described scenario.
- BZ#796297
- The
CONFIG_CC_OPTIMIZE_FOR_SIZEconfiguration parameter, which was previously introduced to MRG Realtime, caused thegcccompiler to produce code prone to cache line bouncing. As a result, the kernel performance was decreased. This update turns off this parameter, thus improving cache utilization and performance. - BZ#804119, BZ#756631
- Previously, configuration files in MRG Realtime and Red Hat Enterprise Linux 6 were not synchronized regarding modules and built-ins. Consequently, the
mkinitrddaemon failed to find code that was part of a module in Red Hat Enterprise Linux 6 but part of a built-in in MRG Realtime. Now, the configuration files have been synchronized between these two products andmkinitrdcorrectly finds and sets up kernel components such askdump.
3.17. RHBA-2012:0495 – Red Hat Enterprise MRG Realtime 2.1 bug fix update
This update provides bug fixes for the following packages:
- ibm-prtm
- The ibm-prtm package contains a utility and a start-up script for IBM BladeCenter systems. It enables the EDAC (Error Detection and Correction) driver and also turns off the System Management Interrupts (SMI) generation, which improves system response time to events.
- rteval
- The rteval package contains a utility for measuring various aspects of real-time behavior on systems under load. The script unpacks the
hackbenchstress test and benchmark utility, and the kernel source code from the rteval-loads package, buildshackbench, and then goes into a loop, runninghackbenchand compiling a kernel tree. During that loop, thecyclictestprogram is run to measure event response times. After the run time completes, a statistical analysis of the event response times is performed and printed to the screen. - rt-setup
- The rt-setup package configures settings required by the Red Hat Enterprise Linux real-time environment, such as creating the realtime group, adding realtime user privileges to PAM (Pluggable Authentication Models), enabling the configuration of
kdumpin real time, and disablingirqbalanceby default. - tuna
- The tuna package provides graphical and command-line interfaces for changing scheduler and interrupt request (IRQ) settings. Changes can be made to CPUs, by thread or at the IRQ level, taking into account the topology of multi-socket and multi-core systems. Tuna provides the ability to isolate CPU cores and sockets for use by a specific application or hardware device.
Bug Fixes
- BZ#574166
- Signal handlers were commented out during debugging session and not restored for release, which caused the early termination logic to fail. Now, the signal handling logic has been restored and the early termination logic works appropriately.
- BZ#773075
- The
cyclictesttest always ran with the--smpoption rather than--numaon appropriate systems because thenumanodeparameter was not correctly passed to the child process. This bug has been fixed andcyclictestnow uses the correct option when starting measurement threads. - BZ#767605
- SMI remediation was not working correctly, resulting in error messages at boot due to an incorrect patch applied to the base
ibm-prtmscript. This update provide an appropriate patch that fixes SMI remediation on IBM non-EFI systems and the error messages are no longer returned. - BZ#545539
- Tuna now contains a man page that is correctly displayed with the command
man tuna. - BZ#698481
- Some udev rules added by MRG Realtime were being called at a much higher frequency than anticipated, which could cause inordinately long boot times on systems with more than 15 cores. Now, child process creation has been removed from the Realtime udev rules and the boot times are shorter on such systems.
- BZ#791371
- Previously, the Realtime bandwidth limiting feature for
SCHED_FIFO/SCHED_RRthreads was disabled. Consequently, various issues occurred in some CPU-intensive applications that depend on this feature. Now, this feature has been enabled andSCHED_OTHERthreads get 5% of the CPU time on cores that are monopolized bySCHED_FIFO/SCHED_RRthreads.Note
Note that when upgrading from a previous installation of rt-setup,,/etc/sysctl.confmust be manually edited to remove thekernel.sched_rt_realtime*lines, which disable bandwidth limiting. - BZ#710503
- The ibm-prtm package installs a configuration file in the
/etc/modprobe.d/directory. Prior to this update, themodprobeutility displayed the following warning message:WARNING: All config files need .conf: /etc/modprobe.d/ibm-amd, it will be ignored in a future release.
With this update, the configuration file has been renamed toibm-amd.config, thus fixing this bug.
3.18. RHSA-2012:0333 – Important: kernel-rt security and bug fix update
Security Fixes
- CVE-2011-4127, Important
- SG_IO ioctl SCSI requests on partitions or LVM volumes could be passed to the underlying block device, allowing a privileged user to bypass restrictions and gain read and write access (and be able to issue other SCSI commands) to the entire block device.
- CVE-2012-0044, Important
- A local, unprivileged user could use an integer overflow flaw in
drm_mode_dirtyfb_ioctl()to cause a denial of service or escalate their privileges. - CVE-2011-2918, Moderate
- A local, unprivileged user could use a flaw in the Performance Events implementation to cause a denial of service.
- CVE-2012-0038, CVE-2011-4077, Moderate
- A local, unprivileged user could use flaws in the XFS file system implementation to cause a denial of service or escalate their privileges by mounting a specially-crafted disk.
- CVE-2011-4097, Moderate
- A local, unprivileged user could use a flaw in the Out of Memory (OOM) killer to monopolize memory, have their process skipped by the OOM killer, or cause other tasks to be terminated.
- CVE-2011-4110, Moderate
- A local, unprivileged user could use a flaw in the key management facility to cause a denial of service.
- CVE-2011-4131, Moderate
- A malicious Network File System version 4 (NFSv4) server could return a crafted reply to a GETACL request, causing a denial of service on the client.
- CVE-2011-4132, Moderate
- A local attacker could use a flaw in the Journaling Block Device (JBD) to crash the system by mounting a specially-crafted ext3 or ext4 disk.
- CVE-2012-0207, Moderate
- A flaw in
igmp_heard_query()could allow an attacker, who is able to send certain IGMP (Internet Group Management Protocol) packets to a target system, to cause a denial of service. - CVE-2012-0810, Moderate
- If lock contention during signal sending occurred when in a software interrupt handler that is using the per-CPU debug stack, the task could be scheduled out on the realtime kernel, possibly leading to debug stack corruption. A local, unprivileged user could use this flaw to cause a denial of service.
Bug Fixes
- BZ#784733
- When a sleeping task, waiting on a futex (fast userspace mutex), tried to get the
spin_lock(hb->lock)RT-mutex, if the owner of the futex released the lock, the sleeping task was put on a futex proxy lock. Consequently, the sleeping task was blocked on two locks and eventually terminated in theBUG_ON()function. With this update, theWAKEUP_INPROGRESSpseudo-lock has been added to be used as a proxy lock. This pseudo-lock tells the sleeping task that it is being woken up so that the task no longer tries to get the second lock. Now, the futex code works as expected and sleeping tasks no longer crash in the described scenario. - BZ#786145
- When the
CONFIG_CRYPTO_FIPSconfiguration option was disabled, some services such as sshd and ipsec, while working properly, returned warning messages regarding this missing option during start up. With this update,CONFIG_CRYPTO_FIPShas been enabled and no warning messages are now returned in the described scenario. - BZ#761420
- Previously, when a read operation on a loop device failed, the data successfully read from the device was not cleared and could eventually leak. This bug has been fixed and all data are now properly cleared in the described scenario.
- BZ#783570
- Due to an assembler-sourced object, the
perfutility (from the perf-rt package) for AMD64 and Intel 64 architectures contained an executable stack. This update adds the.note.GNU-stacksection definition to thebench/mem-memcpy-x86-64-asm.Scomponent ofperf, with all flags disabled, and perf no longer contains an executable stack, thus fixing this bug.
3.19. RHBA-2012:0044 – kernel-rt bug fix update
Bug Fixes
- BZ#725485
- The kernel-rt package has been upgraded to upstream version 3.0, which provides a number of bug fixes and enhancements over the previous version.
- BZ#749575
- Some applications use flawed versioning logic that cannot recognize new Linux kernel versions in the format of
3.x.y. As a workaround to this bug in external applications, the newuname26utility has been added to MRG Realtime 2.1. This utility activates the 2.6 personality kernel patch to transform data returned by theuname(2)system call to the format of2.6.40+[minor_release_number], and then executes the actual application. - BZ#708407
- The
recvmmsg()andsendmmsg()system calls were missing from the code and were previously unavailable. This update restores the code with the system calls. - BZ#663865
- Previously, the
/proc/kcorevirtual file could be read beyond the ELF (Executable and Linkable Format) header file information and a malicious root user could access these additional parts of the file. With this update,kcorecannot be read beyond its ELF header, thus fixing this bug. - BZ#679263
- The
%pKprintk format specifier was not added when printing the data from the/proc/kallsymsand/proc/ modulesinterfaces. This could cause kernel address leaks. With this update,%pKis properly used when returning data from the interfaces. - BZ#711488
- Prior to this update, the kernel and kernel-rt packages delivered the same set of kernel manual pages. Consequently, file conflicts occurred when both kernel-doc and kernel-rt-doc were being installed. This update adds the
rtsuffix to the files with kernel-rt-doc manual pages and the file conflicts no longer occur. - BZ#725028
- Previously, both the Red Hat Enterprise Linux kernel and the Red Hat Enterprise MRG Realtime kernel delivered the
/lib/firmware/WHENCEfile, which caused an installation conflict. With this update, this file has been moved to a versioned directory in the Realtime kernel, thus fixing this bug. - BZ#717905
- The
cred_alloc_blank()function called theabort_creds(new)function withnew->security == NULLandnew->magic == 0if thesecurity_cred_alloc_blank()function returned an error. As a result, theBUG()function was triggered if SELinux was enabled or if theCONFIG_DEBUG_CREDENTIALSproperty was active. With this update,new->magicis set before thesecurity_cred_alloc_blank()function is called andcred->securitywith the NULL value increds_are_invalid()andselinux_cred_free()functions are now handled gracefully. - BZ#679272
- Certain kernel static data areas and kernel modules have writable or executable memory areas. Prior to this update, malicious software could overwrite the data and potentially execute code in these areas. With this update, the RO (Read-Only) and NX (No eXecute) bits have been added to the memory areas to prevent such actions.
3.20. RHSA-2011:1253 – Important: kernel-rt security and bug fix update
Security Fixes
- CVE-2010-4526, CVE-2011-1770, Important
- Flaw in the SCTP and DCCP implementations could allow a remote attacker to cause a denial of service.
- CVE-2011-1494, CVE-2011-1495, Important
- Flaws in Management Module Support for Message Passing Technology (MPT) based controllers could allow a local, unprivileged user to cause a denial of service, an information leak, or escalate their privileges.
- CVE-2011-1745, CVE-2011-2022, CVE-2011-1746, Important
- Flaws in the
AGPGARTdriver and inagp_allocate_memory()could allow a local user to cause a denial of service or escalate their privileges. - CVE-2011-2491, CVE-2011-2695, Important
- Flaw in the client-side NLM implementation, and in the way the maximum file offset was handled for ext4 file systems, could allow a local, unprivileged user to cause a denial of service.
- CVE-2011-2497, Important
- Flaw in the Bluetooth implementation could allow a remote attacker to cause a denial of service or escalate their privileges.
- CVE-2011-2517, Important
- Flaws in the netlink-based wireless configuration interface could allow a local user, who has the
CAP_NET_ADMINcapability, to cause a denial of service or escalate their privileges. - CVE-2010-4243, CVE-2011-1593, CVE-2011-1598, CVE-2011-1748, CVE-2011-2213, CVE-2011-1090, CVE-2011-2183, CVE-2011-2484, CVE-2011-2496, Moderate
- Flaw in the
execve(),next_pidmap(),bcm_release(),raw_release(), andinet_diag_bc_audit()functions; in the interaction between the Linux kernel's method for allocating NFSv4 ACL data and the method by which it was freed; in the memory merging support (KSM); in the taskstats subsystem; and the way mapping expansions were handled, could allow a local, unprivileged user to cause a denial of service. - CVE-2011-1020, Moderate
- The proc file system implementation could allow a local, unprivileged user to obtain sensitive information, or possibly cause integrity issues.
- CVE-2011-1021, Moderate
- Local, privileged user could write arbitrary kernel memory via
/sys/kernel/debug/acpi/custom_method. This update disables this file. The debugfs file system must be mounted to exploit this issue. It is not mounted by default. - CVE-2011-1478, CVE-2011-1767, CVE-2011-1768, Moderate
- Flaws could possibly result in a denial of service when a packet is received.
- CVE-2011-1479, Moderate
- CVE-2010-4250 fix caused a regression, allowing a local, unprivileged user to cause a denial of service.
- CVE-2011-1576, Moderate
napi_reuse_skb()could be called on VLAN packets, allowing attacker on the local network to possibly trigger a denial of service.- CVE-2011-1160, CVE-2011-2492, Low
- Information leaks.
- CVE-2011-1577, CVE-2011-1776, Low
- Flaws in the EFI GUID Partition Table implementation could allow a local attacker to cause a denial of service.
- CVE-2011-1585, Low
- Local, unprivileged user could mount an existing CIFS share that requires authentication without knowing the password. This fix is a preventative measure: to exploit this flaw,
mount.cifsmust be setuid root, which it is not by default. - CVE-2011-2495, Low
/proc/<PID>/iocould be read by local, unprivileged users without further restrictions, allowing them to gather confidential information.
Known Issue
- BZ#736683
- Due to a typographical mistake in the code, the kernel-rt installation attempts to call "tardlink" instead of "hardlink" and the installation returns multiple "tardlink: command not found" errors if the hardlink package is installed. However, installation still completes successfully.
Bug Fixes
- BZ#710158
- The user-space iptables tools required the CONFIG_NF_CT_ACCT option. This update decouples CONFIG_NF_CT_ACCT from other Netfilter options, disables this option in the configuration files, and therefore the option is no longer available.
- BZ#728310
- Firmware files for the cxgb3 driver were older than the driver version. This update adds the appropriate firmware files and prevents the warning in the logs.
- BZ#728551
- Previously, it was necessary to issue the
uname -vcommand to acquire information about whether the running kernel is a Red Hat MRG Realtime kernel. This process could become costly and present a source of potential problems. With this update, the system creates a file in the /sys/ directory that is an in-memory file system when the kernel is Realtime and makes it possible to verify if the kernel has Realtime capabilities efficiently. - BZ#730834
- If you run the command
echo 0 > /sys/kernel/kexec_crash_sizeand kdump was not enabled, a kernel crash occurred and the system printed an oops message. This happened because the calledcrash_shrink_memory()function attempted to release thecrashk_resvariable. However, the variable is reserved only if kdump is enabled. With this update, thecrashk_resvariable is released only when kdump is enabled and the problem no longer occurs. - BZ#727699
- The push/pull algorithm for migrating Realtime tasks could have resulted in high lock contention and incorrect results. This update improves the algorithm with a faster sequential scan, which results in lower lock contention and better behavior under pressure.
- BZ#727549
- Non-Realtime kernels set the timer's current base to NULL when switching to a new base. Since this code is pre-emptible on Realtime, this code could loop forever in the
lock_timer_base()function. This update avoids setting the current base to NULL on Realtime kernels. - BZ#719742
- Some older be2net cards and firmware did not recognize certain commands and returned illegal or unsupported errors. Since newer drivers can handle this gracefully, this update removes the error messages.
- BZ#681987
- Previously the Realtime kernel could not use the crash tool because it had CONFIG_STRICT_DEVMEM set. This update enables the crash kernel module so that the crash tool works on Realtime kernels as expected.
3.21. RHEA-2011:0895 – Red Hat Enterprise MRG – Realtime 2.0 Release
Enhancements
- BZ#666951
- Due to changes in the
/sysvirtual file system layout, the Red Hat Enterprise Linux 6 user space assumes different kernel behavior. To allow various scripts and udev for Red Hat Enterprise Linux 6 to work properly with the Red Hat Enterprise MRG Realtime kernel, theCONFIG_SYSFS_DEPRECATED*configuration options in the Red Hat Enterprise MRG Realtime kernel have been turned off. - BZ#667440
- The startup scripts and services on Red Hat Enterprise Linux 6 assume new kernel behavior, which required several configuration setting changes in the Red Hat Enterprise MRG Realtime kernel. With this update, the
CGROUPandKSMconfiguration options were turned on, and theSYSFS_DEPRECATED_V2option was turned off. This allows the Red Hat Enterprise MRG Realtime kernel to boot properly in the Red Hat Enterprise Linux 6 user space. - BZ#705582
- When using the Red Hat Enterprise MRG Realtime kernel, application scaling was limited, because a single spinlock (
idr_lock) was used to access the POSIX timer structures. This update converts the spinlock to the read-copy-update (RCU) protocol, which allows parallel access to the POSIX timer structures and enhances scalability.
3.22. RHEA-2011:0894 – Red Hat Enterprise MRG – Realtime 2.0 Release
- The ibm-prtm package contains a utility and startup script for IBM BladeCenter systems. It enables an Error Detection and Correction (EDAC) driver, and also turns off the System Management Interrupts (SMI) generation, which improves system response time to events.
- The python-linux-procfs package enables the extraction of information from the
/procfile system. - The python-schedutils package provides an interface to the scheduler.
- The rtcheck package provides an application that tests the running system for Realtime capabilities. This program can be used by Realtime-enabled programs to determine if the system environment is suitable for them to run correctly.
- The rtctl package comprises a set of scripts, used to manipulate the scheduling priorities of groups of system threads.
- The rteval package contains a utility for measuring various aspects of Realtime behavior on systems under load. The script unpacks the
hackbenchand kernel source code from the rteval-loads package, buildshackbench, and then goes into a loop, runninghackbenchand compiling a kernel tree. During that loop, thecyclictestprogram is run to measure event response time. After the run time completes, a statistical analysis of the event response times is done and printed to the screen. - The rteval-loads package provides source code for system loads used by the rteval package.
- The rt-setup package configures settings required by the Red Hat Enterprise Linux Realtime environment, such as creating the Realtime group, adding Realtime user privileges to PAM (Pluggable Authentication Models), enabling the configuration of kdump in Realtime, and disabling
irqbalanceby default. - The rt-tests package includes a set of programs that test and measure various components of Realtime kernel behavior. This package measures timer, signal, and hardware latency then tests the functioning of priority-inheritance mutexes.
- The tuna packages include graphical and command line interfaces for changing scheduler and interrupt request (IRQ) settings. Changes can be made to CPUs, by thread or at the IRQ level, taking into account the topology of multi-socket and multi-core systems. Tuna gives the ability to isolate CPU cores and sockets for use by a specific application or hardware device.
- The python-numeric package is a Python module that provides support for numerical operations for Tuna.
Enhancements
- BZ#704325
- This update provides a build of these packages for Red Hat Enterprise MRG 2.0, layered on Red Hat Enterprise Linux 6.
- BZ#661896
- The description of the rtcheck package has been updated to provide a clearer overview of the package.
- BZ#666955
- On Red Hat Enterprise Linux 6, the kernel-rt-firmware package is compliant with the default firmware loading script. To retain compatibility with this system, this update removes both the firmware loading script and the related udev rule from the rt-setup package.
- BZ#676927
- Previously, the Linux scheduler bindings provided by the python-schedutils package were unable to set the CPU affinity on systems with more than 64 cores. With this update, the package has been modified to dynamically allocate the required space, allowing the CPU affinity to be successfully set on large core systems.
Chapter 4. MRG Realtime on Red Hat Enterprise Linux 5
4.1. RHBA-2011:1370 – Red Hat Enterprise MRG Realtime 1.3 bug fix update
Bug Fixes
- BZ#738787
- When a socket buffer was reused, the
skb->skb_iifoperation resulted in an invalid value and could cause the kernel to terminate unexpectedly after two or three executions of theifuporifdowncommands in a VLAN and bonding network environment. With this update,skb->skb_iifis now reset on reuse, thus allowing theifupandifdowncommands to work correctly. - BZ#733976
- During the
tx/mccpolling, thenapi_complete()function was incorrectly called before reaping thetxcompletions. Consequently, thetxcompletion processing was scheduled on another CPU concurrently, which sometimes resulted in a kernel panic. With this update, under load,napi_complete()is called after thetx/mcccompletion processing but before re-enabling interrupts, thus fixing this bug. - BZ#711198
- Previously, negative
si_codevalues were not allowed. As a consequence, theglibc aiokernel API issued EPERM errors and kernel warning messages were returned. Now, only the problematic SI_TKILL code is disallowed, and it allows theglibc aioimplementation to queue a signal with the SI_ASYNCIO si_code, thus fixing this bug. - BZ#743320
- The Red Hat Enterprise MRG Realtime kernel did not provide the tcp_delack_min tunable to reduce network latency as described in the Realtime Tuning Guide. With this update, this tunable is now available in the Realtime kernel for customers to use as recommended in the documentation.
- BZ#743639
- Prior to this update, the unsupported kernel configuration options, KVM and VIRTIO, were sometimes used accidentally and caused unexplainable non-deterministic behavior. This bug has been fixed, and these options have been explicitly turned off in the Red Hat Enterprise MRG Realtime kernel.
Chapter 5. MRG Grid on Red Hat Enterprise Linux 6
- 5.1. RHSA-2014:16649 – Moderate: Red Hat Enterprise MRG Grid 2.5 security, bug fix and enhancement update
- 5.2. RHSA-2013:1294 – Moderate: Red Hat Enterprise MRG Grid 2.4 security update
- 5.3. RHSA-2013:0565 – Red Hat Enterprise MRG Grid 2.3 security, bug fix and enhancement update
- 5.4. RHSA-2012:1281 – Moderate: Red Hat Enterprise MRG Grid 2.2 security, bug fix, and enhancement update
- 5.5. RHSA-2012:0099 – Moderate: MRG Grid security, bug fix and enhancement update
- 5.6. RHBA-2012:0046 – Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update
- 5.7. RHSA-2011:1250 – Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
- 5.8. RHEA-2011:0891 – Red Hat Enterprise MRG – Grid 2.0 Release
5.1. RHSA-2014:16649 – Moderate: Red Hat Enterprise MRG Grid 2.5 security, bug fix and enhancement update
condor
- BZ#1078777
A Logic error in the de-construction of a map structure attempted to increment container iterators after they were erased. As a result,
condor_triggerdcould experience a seg-fault due to operating on invalidated iterators. With this update, the deconstruction logic was re-ordered to eliminate operations on invalidated container iterators. As a result, the condor_triggerd no longer experiences seg-faulting.- BZ#721137
Enhancements were made to
condor_config_valto optionally obtain the value of a configuration macro, as evaluated in the context of a daemon's classad. This capability allows the administrator or user to verify the exact configuration value seen by a daemon. As a result,condor_config_valnow accepts an optional flag-evaluate, that causes it to return the value of a configuration macro evaluated with respect to a daemon's classad.- BZ#724834
Previously the default grid pool configuration for the use of
condor_procddid not work properly with nodes running multiple schedulers and local universe jobs, due to a collision of address files incondor_procdinstances created by local universe starters. This made proper configuration for multi-scheduler environments difficult. To address this issue, the default value forUSE_PROCDwas set to false, except forMASTER.USE_PROCDwhich is defaulted to true. As a result, only the master daemon runs a procd, which is used by all other daemons on a node. This avoids problems with address-file collisions from multiple procd instances.- BZ#823611
The Grid UserLog writing code was optimized to improve its speed. This resolves any potential bottlenecks on the time to write user logs for large-scale operations on job queues in the scheduler. As a result, the new optimizations make large-scale job queue operations 30-50% faster, when user logs are being written across NFS.
- BZ#1019196
Previously the Grid build code did not apply RELRO or PIE to compilation and linking of binaries. The resulting executable and object code did not take advantage of these security features. To resolve these issues, the build code for Grid was updated to enable RELRO and PIE.
- BZ#616451
A new feature was added to advertise job I/O statistics on job ads in the scheduler. This is a useful input to various Grid pool policies, such as placing jobs on hold which are over-using I/O. Additionally, Grid administrators and users can query jobs residing in the scheduler queue and identify which jobs are using large amounts of I/O to identify performance bottlenecks. As a result, the following I/O statistics are now collected from the cgroups blkio controller, and advertised on a per-job basis to job classads in the scheduler: BlockReadKbytes - Kilobytes read BlockWriteKbytes - Kilobytes written BlockReads - Number of read operations BlockWrites - Number of write operations Additional versions of the above statistics are published with respect to a recent time window, where the duration of that window is configurable.
- BZ#1072932
Previously, the package build environment for Grid was configured to use Red Hat Enterprise Linux 6.3 for its build environment. This caused the Grid builds to fail to build on Red Hat Enterprise Linux 6.5. To address these issues, the build environment was reconfigured, so now Grid builds occur against Red Hat Enterprise Linux 6.5 as required.
- BZ#1055903
Previously some windows-related
.exeand.dllfiles were present in the lookaside source cache and being carried into the srpm file generated by the condor package spec. This violated standards that no pre-build executables should be present in an srpm. Modifications were made to the script that prepares the lookaside source file to filter the offending files out. As a result, the srpm generated from the spec file no longer contains any prebuilt executables and now conforms to the standard.- BZ#975989
Previously the accounting group allocations were not explicitly constrained to sum to
<=the configured quota of the parent group. Consequently, if an accounting group was allocated surplus quota on a previous negotiation cycle, its sibling groups could acquire their full configured quota. Therefore the total resources used by the groups could exceed the configured quota of their parent group. To address this issue, additional logic was added to constrain resource allocations so that child allocations cannot sum to the configured parent quota. As a result, resource allocations to child accounting groups will no longer sum to greater than the configured quota of the parent.- BZ#1019873
A bug existed in the cgroups freezer controller that could cause a kernel crash if it was used to freeze an already-suspended job. Certain sequences of operations on condor jobs, for example suspending a job and then putting a hold on that job, could cause a kernel crash. To address this issue, defensive logic was added to by-pass the application of the freezer controller if a job had previously received a SIGSTOP. The conditions that can cause a kernel crash are now avoided.
- BZ#1084016
Previously HTCondor on Windows could not be updated without removing the previous condor version. Trying to open the HTCondor installation wizard, while the previous version in currently installed, failed with the error message: "Another version of this product is already installed...". This issue was addressed so that HTCondor can be updated without uninstallation of the previous version.
condor-aviary
- BZ#835582
In a previous version, the QMF and Aviary scheduler plug-ins raised errors on detecting malformed jobs missing JobStatus attributes. Because these jobs caused plug-in errors, it could block the scheduler from starting. To address this issue, the QMF and Aviary plug-ins were altered to log an error but pass malformed jobs through to the scheduler. This enables the scheduler to start up properly.
condor-plumage
- BZ#903279
Previously the plumage view-server plugin did not correctly detect a null socket, caused by failure to contact the negotiator. Consequently, attempts to de-reference null socket caused a crash in the plugin. With this update, the plugin logic was fixed so that null sockets are detected and handled gracefully. Now plumage plugins no longer crash when the negotiator cannot be contacted.
cumin
- BZ#1058871
Previously cumin did not record actual traceback information when it experienced an internal failure. This made debugging and assessing causes of failure more difficult without traceback information. In this version, traceback was added to cumin's log output. As a result, future internal errors will leave traceback information to improve debugging and assessment.
- BZ#1044318
Password hashing was insufficiently strong in a previous version. this left passwords potentially vulnerable to theft. As a fix, improved salting was added, and password hash was changed to SHA512. This now improves the standards for strength in password hashing.
- BZ#959098
Grid performance page elements occupied too much real estate to fit on a 1024x768 screen. This resulted in a page not formatting correctly at that resolution. To address this issue, the size of page elements was reduced so that the page now correctly formats at 1024x768 resolution.
- BZ#911503
Grid performance page elements occupied too much real estate to fit on a 1024x768 screen. This resulted in a page not formatting correctly at that resolution. To address this issue, the width of negotiator throughput plot was reduced so that the page now correctly formats at 1024x768 resolution.
- BZ#1020191
In a previous version, Cumin used string-lexicographic sorting on job ids, instead of sorting by numeric values. This resulted in incorrect sorts, such as "10" < "2". The resolve this behaviour, the sorting was fixed to order by numeric values. Sorting now operates as expected, such as "2" < "10".
distribution
- BZ#1050762
Mongodb does not do object length checking by default. As a consequence a malicious user could submit malformed BSON objects and gain access to privileged memory. To resolve this issue, the mongodb package was updated so that 'objcheck = true' is configured by default. As a result, the default out-of-box configuration of mongodb is now safe from this attack.
Note
Because this is a change to the installed configuration file, if a user updates the package but has also altered/etc/mongodb.conf, then they will see the following: Warning:/etc/mongodb.confcreated as/etc/mongodb.conf.rpmnew.
vulnerability
- BZ#1049748
MongoDB was found to be affected by a memory over-read bug that can be used by an authenticated user (if applicable) to obtain raw MongoDB server process memory contents via incorrect BSON object length. This issue does not seem to cross a security boundary under most deployments, but for some it could, like differently-privileged MongoDB users, data already deleted from the DB yet staying in process memory, or/and metadata that is not normally retrievable.
- BZ#1044315
It was found that MRG Management Console (cumin) used the crypt(3) 12:50 DES-based hash function to hash passwords. DES-based hashing has known weaknesses that allow an attacker to more easily recover plain text passwords from hashes via brute-force guessing. An attacker able to compromise a cumin user database could potentially use this flaw to recover plain text passwords from the password hashes stored in that database. (CVE-2013-6445)
Note
In deployments where user account information is stored in a database managed by cumin, it is recommended that users change their passwords after this update is applied.
5.2. RHSA-2013:1294 – Moderate: Red Hat Enterprise MRG Grid 2.4 security update
condor
- BZ#610809
This update makes membership on "sudoer" style lists for hosts and users configurable via standard operating system user-group and netgroup concepts. This occurs in a way that does not require reconfiguration of HTCondor daemons to update, and also facilitates security auditing. It introduces support for netgroups on HTCondor
ALLOW_*andDENY_*access lists, as a method of defining groups of users and/or machines that are members of those lists. It also supports user-groups on the HTCondor schedulerQUEUE_SUPER_USERSlist as a method of defining groups of users which are members of that list.- BZ#794818
This release introduces support for configurable consumption policies on partitionable slots, such that the quantity of each resource asset (e.g. cpus, memory, disk) that is consumed by a match is determined by evaluating a configurable expression on the slot. Usually some function of the amount requested by the matching job. These consumption policies allow a partitionable slot to emulate different resource allocation behaviors depending on the use cases of the customer. They also enable the negotiator to make multiple matches against each partitionable slot per negotiation cycle, providing improved performance and the better use of resources. This results in possible behaviors such as the emulation of static slots, the support for sub-core job loads, or memory-centric allocation policies rather than legacy cpu-centric ones. Each execute node on an HTCondor pool can be configured with one or more consumption policies, allowing heterogeneous resource allocation strategies on a single pool.
- BZ#974605
Previously, an increase in symbols and code linked against the HTCondor shadow subsequently increased the memory requirements per shadow. This could limit the number of running jobs on very large HTCondor pools. With this fix, memory requirements for shadows were reduced due to changes to the shadow linking, as well as the pruning of unused symbols and data structures.
- BZ#972851
Previously, a change in the return value of system hostname lookup functions affected HTCondor's logic for expanding short hostnames. With this fix, HTCondor can now properly expand short into fully qualified hostnames.
- BZ#857974
This release introduces support for JobPrio scope across multiple schedulers. JobPrio was previously used to sort jobs only within an individual scheduler, preventing it from being used to support a job priority queue in pools where a single submitter may have jobs residing in multiple schedulers. With this feature, a job priority queue using JobPrio can be maintained even when a single submitter has jobs residing in multiple schedulers.
- BZ#876585
Previously, multiple attempts to retrieve information on child process subtrees would either succeed on the first try or fail repeatedly. This resulted in additional time wasted on unsuccessful attempts to retrieve information. With this fix, a configuration parameter allows the administrator to limit the number of retrieval attempts if it is determined that multiple attempts are not helping.
- BZ#903409
Previously, floating point precision errors caused dynamic quota values to sum to a value that is strictly greater than one, when the true mathematical sum would be exactly equal to 1. As a result, warning logic designed to inform the administrator of misconfiguration was generating spurious warnings. With this fix, the warning logic has been altered to only generate warnings for dynamic quota sums that exceed 1 by a significant amount, and where that amount is not triggered by numeric precision errors.
- BZ#904118
This release introduces the ability to output additional information on internal maxdelta and round robin iterations from the negotiator log. Making this information available, and particularly without having to enable debug-level logging, improves the ability of administrators to tune the behavior of the negotiator.
condor-aviary
- BZ#878141
Previously, pointers returned from
wsdl2cppfunctions, which were missing tests for handling NULL values. As a result, XML inputs causing NULL values to be returned would crash the HTCondor aviary server. With this fix, checks for NULL return values were inserted to prevent crashes from XML inputs.
cumin
- BZ#983134
Previously, Cumin did not limit the length of URLs that it processed. This caused Cumin to use extra resources when processing malformed and long URLs. With this fix, Cumin limits the length of URLs to a maximum of 2048 characters.
- BZ#980366
Previously, Cumin did not check for duplicate widget IDs in update requests. This resulted in malformed update requests listing the same widget multiple times, causing Cumin to use extra resources. With this fix, Cumin now ignores duplicate widget IDs in update requests.
- BZ#840521
A previous internal API change resulted in errors when moving messages between queues, failing with an exception trace. The corresponding Cumin code has been updated to match the modified API and restore message movement between queues.
- BZ#964237
This fix removes an unused graphical radio button element from the 'Slots' display.
- BZ#966548
Previously, tag creation in Cumin allowed blank tag names, causing
Create Tag: Failederrors. With this update, tag names are required for all new tag creation in Cumin.- BZ#964217
The Negotiator tab has been renamed from "Negotiators" to "Negotiator" to better reflect the design of Cumin to work with a single pool that contains a single Negotiator.
5.3. RHSA-2013:0565 – Red Hat Enterprise MRG Grid 2.3 security, bug fix and enhancement update
5.4. RHSA-2012:1281 – Moderate: Red Hat Enterprise MRG Grid 2.2 security, bug fix, and enhancement update
5.5. RHSA-2012:0099 – Moderate: MRG Grid security, bug fix and enhancement update
Security Fix
- CVE-2011-4930
- Multiple format string flaws were found in Condor. An authenticated Condor service user could use these flaws to prevent other jobs from being scheduled and executed or crash the
condor_schedddaemon.
Bug Fixes
- BZ#765846
- Previously, the job classad (classified advertisement) produced by the
cuminmanagement console was outdated and was still using theVMPARAM_Kvm_Diskparameter instead of theVMPARAM_vm_Diskparameter. Consequently, submitting a VM job from withincuminappeared to succeed incuminbut the actual job failed to start. Now,cuminhas been fixed to useVMPARAM_vm_Diskwhen building the job classad and VM jobs submitted from withincuminstart as expected. - BZ#751779
- The Scheduler object uses a custom view which in turn uses a customized stat set that was previously not subject to the new display format. As a consequence, the value displayed for up-time on the Scheduler object was using the old format which was not consistent with the
dd:hh:mmlabel that is next to the stat name. Now, the rendering logic of the Scheduler view object has been updated and the value displayed for up-time is now consistent with thedd:hh:mmformat indicated in the stat label. - BZ#782902
- When
cuminfetched more than 25 data entries from thesesamemessaging management tool to merge them with the data of thewallabyconfiguration service, some of thesesamedata disappeared for a given refresh interval. This happened when viewing the Inventory page with more than 25 systems that reportsesamedata. With this update,cumingets allsesamerecords before merging them with thewallabydata. Now,sesamedata no longer disappear from the Inventory page in the described scenario. - BZ#782485
- Due to an unhandled exception on the Inventory page, when
wallabydata was expected but not present, the exception dump was returned to the Inventory page. This bug has been fixed, the exception is now handled properly, and the Inventory page now displays without errors. - BZ#771642
- Following a host link from the Inventory page opens a summary page for the host. The content of the Configuration tab depends on data encoded in the URL which could become stale. Consequently, selecting the Configuration tab sometimes caused an exception when the host data in the URL was stale. Now, data in the URL is checked for validity before the Configuration tab is generated. Also, the user is notified and redirected back to the Inventory page if stale data is detected, and the exception no longer occurs in the described scenario.
- BZ#748735
- Following a host link from the Inventory page opens a summary page for the host with the Overview tab selected. A host could be visible in the Inventory page even when no
sesamedata was available for that host, but the Overview tab requiressesamedata. Consequently, an exception occurred on the Overview tab if the data fromsesamewas missing. Now, missing data fromsesameis checked when generating the Overview tab, the user is notified with a yellow banner when no data is available, and the exception no longer occurs in the described scenario. - BZ#765713
- Previously, the
#000,.1RGBA value was used for a transparent black background, which appeared gray when the data was plotted. Consequently, the 1-day view of the percent memory chart under the Inventory tab showed a gray artifact for allxvalues that contained some data. With this update, the#111,.1RGBA value (transparent white) is used in the described scenario and the graph no longer shows the gray artifact. - BZ#761165
- When the
condor_q -longcommand was run on a EC2 (Amazon Elastic Compute Cloud) job, theEC2AvailabilityZoneparameter was misspelled. This update fixes the spelling ofEC2AvailabilityZoneand this bug no longer occurs. - BZ#761671
- When a series of queries was run against the collector while the collector was already under heavy load, the collector blocked waiting on the query operation. With this update, queries have been set to be non-blocking and no longer halt the collector in the described scenario.
- BZ#757772
- Previously, the scheduler did not properly restore submitter names with accounting group information on restart. Consequently, incorrect submitter names were associated with the wrong accounting group by the
Negotiatordaemon. With this update, additional logic to restore the submitter names with accounting group information, when accounting group is detected, has been added. Now, submitter classads are properly reconstituted on the scheduler restart. - BZ#765902
- Previously, no binaries in Condor RPM packages were compiled with the
FORTIFY_SOURCEflag. As a consequence, stack smashing protections were not enabled by the compiler. With this update, Condor has been recompiled with the-D_FORTIFY_SOURCE=2option, thus fixing this bug. - BZ#758212
- When the
condor_holdutility was run with a%sign in the hold message when writing an XML-formatted user log, theschedddaemon terminated unexpectedly. With this update, thesprintf()calls have been fixed to not use direct input strings andscheddno longer crashes in the described scenario. - BZ#613931
- Addition of accounting groups to the
condor_userprioutility reporting added redundant values to total sums. Consequently, totals reported bycondor_userpriowere too high. With this update, new logic that prevents accumulation of redundant values from accounting groups has been added to the code andcondor_userpriono longer includes redundant values to its reports. - BZ#751072
- When
condortools were run with the-helpfrom the command line, they issued incorrect, non-zero return code. This bug has been fixed andcondortools now properly return0in the described scenario. - BZ#759154
- When the
sshd.shscript failed to execute properly, it returned0for all error conditions. This bug has been fixed andsshd.shnow displays the correct error codes when handling failure conditions. - BZ#759433
- When a job was run in OpenMPI or parallel universe environment, the
condor_chirputility failed to write the output file. With this update,condor_chirphas been fixed to use absolute paths, rather than relative paths, thus fixing this bug. - BZ#750063
- Previously, EC2 jobs executed via the
condor_q -runcommand failed to display theHOSTinformation. With this update, EC2 instance nodes have been added to theHOSTstring and thecondor_qutility now displays theHOSTinformation correctly in the described scenario. - BZ#752322
- Previously,
condor_qutility failed to display the summary information for suspended jobs. This bug has been fixed andcondor_qnow displays summary information for all job states. - BZ#761588
- When the
condor_scheddutility, managed by Red Hat High Availability, terminated unexpectedly and was restarted, the PID file contained the PID of the previous instance rather than the current one. Moreover, the PID file was not removed whencondor_scheddwas shut down. Now, the startup script has been improved to look for the PID file and remove it before the service starts, if necessary. Also, the stored PID now always corresponds to the currently running instance ofcondor_schedd. - BZ#751834
- Prior to this update, setting a region in EC2 Enhanced was not supported. Consequently, using EC2 Enhanced with AMIs in a region other than the user's default region caused jobs fail to start. With this update, a new configuration option (
EC2Region = "<region>") has been added and running jobs on AMIs in regions such asus-east-1andeu-west-1now succeeds. - BZ#753829
- Previously, the comparator for an internal collection that tracks active jobs in a submission was insufficient when used with DAG (Direct Acyclic Graph) workflow descriptions. Consequently, the DAG submissions were prematurely destroyed and recreated. As a result, when the QMF (Qpid Management Framework) plug-in of the
schedddaemon was used for job publishing, DAG submission job state totals were reported incorrectly and did not properly accumulate as the DAG submission progressed through it's node job execution. This bug has been fixed and DAG submission job state totals now increase, decrease and accumulate consistently as viewed by a QMF client. - BZ#754202
- When the history file configuration settings were set to very low values, the index file for the current history file could become orphaned. Subsequently, it's inode was quickly reused by the file system and incorrectly maintained in an active list by the
condor_job_serverandaviary_query_serverutilities. With this update, correct index files are regenerated and the orphaned ones removed from the current set of history backup files before the inode reuse could occur. Now, index files are correctly cleaned up after a history file rotation and point to valid archived history backups based on inodes. - BZ#773680
- Due to a regression in the
Scheduler::holdJobRaw()function, the Condor server failed to change the job state of a job that was held and later released. Consequently, the job appeared to be idle while it was in fact running. With this update, the order of calls outside of the QMF and Aviary has been changed so that a released job's state is now correctly reported in the described scenario. - BZ#760615
- Previously, when the
DedicatedResourceoption was added through the remote configuration feature, theSTARTD_ATTRSattribute was overwritten instead of appended to. With this update,STARTD_ATTRSproperly appends in the default database and addingDedicatedResourcethrough remote configuration as described now works as expected. - BZ#760611
- Previously, when the
DedicatedResourceoption was applied to a group or a node through the remote configuration feature, the affected nodes failed to start thecondor_startddaemon. With this update, a typo has been fixed inDedicatedResourceand nodes configured withDedicatedResourcethrough remote configuration now startcondor_startdas expected. - BZ#750264
- When the MRG Management Console was configured and then the remote configuration feature was used to configure the Aviary web service, the MRG Grid scheduler stopped reporting data to the Management Console. With this update, plug-ins outside the remote configuration feature are allowed to be configured. Now, installing remote configuration after manually configuring the Management Console does not result in the Console losing data from the MRG Scheduler.
- BZ#756402
- Previously, the SPQR library and the Wallaby service rejected authentication mechanisms other than
PLAIN,ANONYMOUS, orGSSAPI. However, the underlying QMF engine library supported additional mechanisms, which were unavailable to SPQR developers or Wallaby users. With this update, the SPQR library and the Wallaby service have been changed to not reject any valid SASL mechanism. Now, any SASL mechanism that is available to QMF and a configured Qpid broker is available to SPQR and Wallaby. - BZ#756401
- The SPQR library, used by the Wallaby service, is designed to automatically select an authentication mechanism based on certain selected options if no authentication mechanism is explicitly specified. However, if the user specified a username and the
ANONYMOUSmechanism, SPQR would override the user's explicit mechanism selection and instead choose thePLAINauthentication. This update removes this confusing behavior and SPQR now prioritizes explicitly-specified mechanisms over the implicitly-selected ones.
Enhancements
- BZ#745348
- Previously,
condor_statusonly allowed output sorting by simple attribute names. Users could not sort their classads using generalized expressions. The construction of the internal sorting expression has been modified so that it can refer to a generalized classad expression. Users can now provide general classad expressions to the-sortargument to sort the classads.
5.6. RHBA-2012:0046 – Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update
Bug Fixes
- BZ#741729
- Previously, the Cumin web console provided no facility to modify the schema of an existing database on a software upgrade. Consequently, if an upgrade included schema changes, users were required to drop and recreate the database to incorporate the new schema version. This resulted in loss of data. With this update, a new "cumin-admin upgrade-schema" command has been added that allows users to apply schema changes as part of a software upgrade. This command runs necessary scripts to upgrade the schema to the newest version without any loss of data. If there are no schema changes to apply, the user is informed and no action is taken.
- BZ#731065
- Prior to this update, when a job returned output, an error message, or a log entry containing a file name without full path, the Cumin web console failed to retrieve the file contents under the Output tab for a job. With this update, Cumin attempts to append the working directory from the job ad to the names of output files if the file does not contain a full path. When Cumin submits a job, it prepends the working directory to the output file names before submission. Now, Cumin handles the output file names correctly in most cases. However, if a job is submitted through another mechanism, such as the condor_submit utility, the output files are not given full paths, and the Aviary web service is used by Cumin for data retrieval, Cumin will still not be able to retrieve the output files.
- BZ#725038
- Previously, messages printed in yellow task status banners in the Cumin web console could potentially contain characters that break XML parsing in a browser during display. If such a message was printed, the browser displayed an error message, no Cumin content was visible, and Cumin had to be restarted to restore the user interface. With this update, code has been added to properly escape special characters in the banner messages before display, thus preventing this bug.
- BZ#718093
- Previously, the cumin package had a dependency on the dejavu-lgc-sans-fonts package, which is only available through an optional Red Hat Network channel. As a consequence, the Cumin web console could not be installed on Red Hat Enterprise Linux 6 using only the non-optional Red Hat Network channels. With this update, the dependency has been changed to the liberation-sans-fonts package and Cumin now installs correctly in the described scenario.
- BZ#733447
- Native httplib classes in Python provided support only for client certificate validation; server certificate validation was not supported. Consequently, the Cumin web console could not communicate securely over SSL with Aviary servers. With this update, Cumin has been enhanced to use either the Python SSL module or the M2Crypto SSL toolkit to supply server certificate validation. The cumin-web utility now writes log entries beginning with "AviaryOperations:", which indicates what type of communication is being used for Aviary.
- BZ#723848
- Previously, string values needed to be explicitly double-quoted by the user when editing a value on the form. When job attributes had been edited via the "Edit Attributes" form, Cumin did not distinguish string values from expression values automatically. Consequently, the value was interpreted by the Condor workload management system as an expression, which sometimes caused errors and prevented the attribute value from being changed. With this update, Cumin uses type information available from Condor to distinguish between strings and expressions and users are no longer required to explicitly quote string values when editing attributes.
- BZ#639383
- Previously, Cumin based the age of a submission around the creation times of the QMF object that represented the submission in the MRG Messaging space. However, displays such as the Longest Running Grid Submissions table in the default persona were affected by events in the MRG Messaging space, and could therefore be inaccurate. This update ensures that the data generated by condor and integrated into Cumin present the earliest queue date of any job included in a submission, with the result that the Longest Running Grid Submissions display is accurate. In addition, a new column which shows the queue date has been added to the table.
- BZ#690297
- During signal escalation it was possible for a starter to leak due to disjointed communications. This in turn could cause a claim to be unnecessarily relinquished due to out-of-order processing, such as if a shadow exited before the starter had been cleaned. This update forces the shadow to wait for the final update from the starter, which results in a claim reuse performance increase and increased job throughput during period_evaluation and signal escalation.
- BZ#699726
- The schedd daemon could spawn more shadows than it could handle given its current default configuration, which would in turn cause shadows to time out when trying to communicate with the schedd. With this update, the MAX_ACCEPTS_PER_CYCLE value has been increased, with the result that zero timeouts occurred under scale testing.
- BZ#748339
- Previously, when the Windows Java detection issued an exception, no Windows clients were visible via the "condor_status -java" command. Now, the default configuration for the JAVA_SEPARATOR module has been updated and Windows clients correctly detect Java details in the described scenario.
- BZ#734871
- Due to a bug in the state transition for overlapping Cron jobs, Cron object states were not marked as idle after forced termination. With this update, the state handling logic has been updated and the state transitions now work as expected.
- BZ#734213
- Previously, calls to the
param_without_default()function that did not free returned memory resulted in memory leaks in theNegotiatorgrid daemon. With this update, the calls have been replaced with new, memory-safeparam_defined()function, and the memory leaks no longer occur in the described scenario. - BZ#580270
- If a High Availability scheduler was running and the NFS service was disrupted, then the schedd lock disappeared, which resulted in two schedd processes running on two different nodes, but accessing the same job queue. Red Hat High Availability is now used to manage the job queue in an HA scheduler setup, with the result that a single schedd process runs on the nodes which comprise the HA scheduler setup.
- BZ#748825
- Load average detection is specific for each kernel major version and only supports kernels of version 1 or 2. Consequently, the load average would appear to be "-1" under a kernel 3.x. This update adds support for load average detection under kernels 3.x, implemented identically as for previous kernel versions, with the /proc/loadavg utility unchanged.
Note
Note that this bug is specific for Red Hat Enterprise Linux 6. Refer to Section 6.5, “RHBA-2012:0045 – Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update” to learn about updates specific for Red Hat Enterprise Linux 5. - BZ#732821
- Previously, the
Negotiatordaemon performed morefsync()operations than required for maintaining transactions. Consequently, operations involving large numbers of submitters resulted in a performance hit. With this update, the use offsync()calls has been minimized in transactions and performance ofNegotiatoron pools with large numbers of submitter records improved significantly. - BZ#712026
- When the startd daemon was gracefully stopped with some deferred jobs landed but not yet run, the jobs appeared to be running in the schedd daemon even though no starter was in operation. This update forces the starter to send job exit notification for deferred jobs and the jobs are now properly rescheduled and transitioned to the idle state, thus fixing this bug.
- BZ#732797
- Code analysis revealed sub-optimal configuration in the dedicated scheduler. Consequence of this was slower than expected performance of the scheduler. Now, the fsync algorithm for the dedicated scheduler has been updated and the performance of the scheduler increased.
- BZ#714724
- Prior to this update, the
Negotiatorcallback functions for the condor_userprio utility were limited to read only the first 63 characters of submitter names from commands. As a consequence, any attempt to invoke a condor_userprio command with a submitter name longer than 63 characters resulted in the name being truncated internally, causing the command to fail. Now, the callback functions have been updated to properly handle names of arbitrary length, thus fixing this bug. - BZ#739203
- Previously, when multiple instances of EC2 GAHP (Elastic Compute Cloud Grid ASCII Helper Protocol) clients were started, both instances leaked their file descriptor. This bug has been fixed and the ec2_gahp utility now scales properly.
- BZ#675697
- The check that determined whether a submitter record could safely be deleted from the
Negotiatorcontained a logic error. The logic checked for priority factor equal to default priority factorDEFAULT_PRIO_FACTOR, which allowed record deletion if the submitter's factor happened to be set to the default. The deletion-checking logic has been corrected in these updated packages so that any explicitly-set value for priority factor, default or otherwise, is now detected. As a result, submitter records are no longer improperly deleted if the user-set priority factor happens to be equal to theDEFAULT_PRIO_FACTORvalue. - BZ#738338
- When an executable in a job was unable to be executed, the starter shut down immediately without waiting for the exit job hook to complete its operation. With this update, the starter waits 30 seconds by default before giving up on the exit hook and shutting down. A job that fails to execute will now try to make sure a configured exit hook will complete its operation before exiting.
- BZ#565501
- The condor_status utility used a signed 32-bit integer to represent total disk space and memory available in a group of machines. This integer could have overflowed with a large pool, resulting in negative or nonsensical results when the "condor_status -total" command was run. This update changes the data type to an unsigned 64-bit integer value, which is significantly less likely to overflow.
- BZ#703992
- Even when job spool directories were disabled, the way that the schedd daemon attempts to clean them had a significant performance impact in certain environments, including those using a shared file system. This update improves the cleanup code so that schedd interacts only minimally in cases where no job spool directories exist, which reduces file system overhead when job spool directories are not enabled.
- BZ#715293
- When the condor_userprio utility detected failure to locate the
Negotiatordaemon, no error message was returned to inform users. With this update, an informative error message has been added to the failure detection code path and users are now properly informed in the described scenario. - BZ#725746
- Due to inaccurate help messages, users were led to believe the condor_router_rm utility can be passed both the user name and the job ID parameters at the same time, which is not possible. This update fixes the help messages to provide accurate information on condor_router_rm usage.
- BZ#725758
- When the condor_router_q utility was running on job routes whose name contained space characters, condor_router_q returned incorrect output and grouping. With this update, the awk parsing for the script that displays names has been fixed and output for routes whose names contain spaces is now displayed correctly.
- BZ#690494
- The
-longand-formatoptions to the condor_q command were order-dependent due to a parsing error. This update removes the parsing error, and the order of the aforementioned options no longer results in different behavior. - BZ#712111
- When a negative or floating-point number was entered into a
deferral_*parameter during submission, jobs were placed on hold as a result. With this update, these parameters are checked to contain a positive integer value before submission and now, only jobs with validdeferral_*parameters are allowed to run. - BZ#738719
- Previously, a statically allocated fault string was incorrectly freed in a WSO2 code path. When a wrong endpoint URL was invoked in the Aviary web service, the WSO2/Axis2C engine used by Aviary failed to load an implementation library and terminated the process unexpectedly. With this update, only a dynamically allocated string is freed in the same code path, the process containing Aviary no longer crashes in the described scenario, and Aviary clients now receive SOAP errors for wrong endpoint invocation.
- BZ#694612
- Previously, Aviary could only be operated within a secure local network in order to secure communication between Aviary clients and servers. This update includes upstream code enhancements that integrate OpenSSL more fully into the Axis2C SOAP (Simple Object Access Protocol) engine used by the Aviary server implementation. Aviary clients and servers can now exchange x509 certificates to authenticate each side and establish a secure link.
- BZ#705016
- The Aviary query server could have eventually terminate unexpectedly when the ATTR_SUBMISSION_NAME attribute on a job was modified through the Aviary API after it had been added to the job queue, then removed, followed by an invocation of getJobDetails. This update corrects the code so that ATTR_SUBMISSION_NAME is prevented from being modified after the job has been submitted, which prevents the possibility of an eventual Aviary query server crash.
- BZ#731463
- Previously, the getJobDetails implementation was returning integer attributes without evaluation. Consequently, calling the getJobDetails operation returned the CurrentTime attribute as "time()" string literal. This update fixes the getJobDetails implementation to return integer attributes with evaluation and the CurrentTime attribute now correctly appears as evaluated epoch integer.
- BZ#733055
- When the Aviary getJobSummary operation was invoked to return a job summary including the last_update field on Red Hat Enterprise Linux 64-bit implementation, the last_update field contained an invalid value. With this update, the signature and type conversions in the date encoder implementation have been modified and the last_update field now always contains a correctly formatted value.
- BZ#702060
- The wso2-wsf-cpp package unnecessarily depended on the wso2-rampart package. This update removes this dependency so that the wso2-rampart package is no longer needed for Aviary functionality.Note that the wso2-rampart package can be safely removed from the system.
- BZ#702489
- If a keyword was provided as an attribute name to an Aviary submitJob or setJobAttribute operation, the job submission was either incorrectly parsed or failed. With this update, submitting a job or setting an attribute that includes a keyword results in the operation failing with an error message indicating that a forbidden keyword caused the fault.
- BZ#749023
- The previous version of the Aviary web service interface generated many per-PID log files from the Axis2/C engine, causing unnecessary proliferation of files in the log directory. Now, Aviary logging setup uses a single log file each for the Aviary schedd plug-in and the query server that is continually appended to across-process restarts, and Aviary Axis2/C log file count is dramatically reduced.
- BZ#739205
- When attempting to retrieve the contents of a file using the Aviary
getJobData()function, a permission error occurred and the user was unable to finish the operation through the Aviary API even though it was possible to do so via QMF (Qpid Management Framework). With this update, thegetJobData()code has been modified to adjust file permissions earlier, and users can now retrieve files both through QMF and Aviary interfaces. - BZ#699737
- The condor_job_server utility maintains index files to history files. Although the history files were properly garbage-collected, the index files were not, which could have resulted in an error when trying to access details of a job that only existed in an index file. This update ensures that the index files are garbage collected along with the history files, thus preventing this error from occurring.
- BZ#733341
- In order to use Red Hat High Availability to manage the HAScheduler feature setup, manual modifications to the HAScheduler feature in the base-db were required. With this update, a new feature named "BaseHAScheduler" has been added to be base-db and no manual modifications to features in the base-db are now necessary in the described scenario.
- BZ#733481
- Previously, the configuration scheme of secure communications for the Aviary web service and the query server using remote configuration required users to add new parameters and features to the database manually. With this update, the SSLEnabledAviaryScheduler and SSLEnabledQueryServer features have been added to the base-db and configuration for secure communication in Aviary, and query server through remote configuration is now easily accomplished.
- BZ#724907
- The subsystem used by low-latency scheduling in MRG Grid's configuration changed from LOW-LATENCY to CAROD but the configuration in the base-db used by remote configuration still used the old subsystem. Now, the base-db has been updated to use the new subsystem and low-latency scheduling with remote configuration creates configurations with the new subsystem.
- BZ#733368
- Previously, the Wallaby configuration service shell commands did not provide full functionality when configuring the wallaby utility. Consequently, options not provided by the shell commands had to be configured via the condor_configure_store and condor_configure_pool utilities. This update adds many new wallaby shell commands and the Wallaby database configuration can now be performed through the wallaby shell.
- BZ#726108
- The remote configuration tools incorrectly split entity names on any commas in the name, and proceeded to treat those names as separate entities. This update corrects the parser so that it allows for names which include commas, with the result that the remote configuration tools now properly identify a name which contains a comma as long as the name is surrounded by quotes escaped with backslash characters, i.e. \".
- BZ#703593
- Manual pages for the condor_configure_store and condor_configure_pool commands have been added to the condor-wallaby-tools package.
- BZ#750315
- Previously, the sesame system agent authenticated to the qpidd broker as a guest user by default. In Red Hat Enterprise Linux 6, the guest SASL account is not present out-of-the-box in an installation of the "MRG Messaging" group. Consequently, sesame could not to connect to the broker without additional configuration. With this update, sesame uses anonymous authentication by default. Anonymous authentication is enabled by the qpidd broker by default on both Red Hat Enterprise Linux 5 and 6. Now, sesame is able to connect to the qpidd broker without additional configuration.
Enhancements
- BZ#707771
- With this update, two new options, "-grouporder" and "-grouprollup", have been added to the condor_userprio utility to allow accounting group hierarchies to be displayed and group statistics to be rolled up by hierarchy, respectively.
- BZ#705365
- This update adds several example scripts to the condor-aviary package for job submission, control, attribute set, queries for status, summary, and details. A directed acyclic graph (DAG) submission example has also been added. Users are now provided with a practical set of client examples for the Aviary web service interface written in Python and using the Suds client.
- BZ#668039
- The
unparse()function has been added to classads, which enables queries to be constrained using the value of a classad attribute (pre-evaluation). - BZ#679139
- The new remote configuration base database packages contain fixes and updates that must be loaded into wallaby in order to take effect. However, loading a database replaces the contents of wallaby, which could have caused user changes to be removed. These updated packages include a wallaby shell command that allows the database to be upgraded so that new versions of the remote configuration base database package can be applied to an existing system without removing a customized configuration.
- BZ#635012
- First class support has been added for the condor_suspend and condor_continue commands, which allows jobs to be suspended and continued based on userid or jobid, and without releasing the claim. This ability makes it easier to determine the shared resource impact of specific user or job IDs.
- BZ#610265
- A new attribute, "PreserveRelativeExecutable", has been added to job ads. The value of this attribute defaults to False. When set to True, and given that TransferExecutable is set to False, then an executable with a relative path name is left relative, and no paths are prepended to it. This functionality allows users to easily specify in a job submission that their executable remains relative until it is resolved, which allows executable locations to be resolved by the user's PATH when using a job wrapper.
- BZ#733379
- With this update, the Wallaby configuration service has been enhanced to use an alternate technique for storing versioned pool configuration. This technique significantly speeds up activation operations on most pools with complex configuration.
- BZ#725052
- RSA public key file was configured in an EC2 Enhanced route, no error message was given about the invalid public key, the job was routed to EC2 but was not able to run. With this update, an error message is now returned, and the job is not routed to EC2 in the described scenario.
- BZ#719052
- When a remote configuration tool referred to a store's default group by the name "Internal Default Group", attempts to use this name as a target to act on the store's default group failed. With this update, the remote configuration tools recognize "Internal Default Group" as a special target that corresponds to the store's default group.
- BZ#694861
- The condor_preen utility now logs information to TOOL_LOG to enable easier problem diagnosis.
5.7. RHSA-2011:1250 – Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
5.8. RHEA-2011:0891 – Red Hat Enterprise MRG – Grid 2.0 Release
Chapter 6. MRG Grid on Red Hat Enterprise Linux 5
- 6.1. RHSA-2013:1295 – Moderate: Red Hat Enterprise MRG Grid 2.4 security update
- 6.2. RHSA-2013:0564 – Red Hat Enterprise MRG Grid 2.3 security, bug fix and enhancement update
- 6.3. RHSA-2012:1278 – Moderate: Red Hat Enterprise MRG Grid 2.2 security, bug fix, and enhancement update
- 6.4. RHSA-2012:0100 – Moderate: MRG Grid security, bug fix and enhancement update
- 6.5. RHBA-2012:0045 – Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update
- 6.6. RHSA-2011:1249 – Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
- 6.7. RHEA-2011:0889 – Red Hat Enterprise MRG – Grid 2.0 Release
6.1. RHSA-2013:1295 – Moderate: Red Hat Enterprise MRG Grid 2.4 security update
6.2. RHSA-2013:0564 – Red Hat Enterprise MRG Grid 2.3 security, bug fix and enhancement update
condor
- BZ#850392
- To allow for accurate matching and reuse of partitionable slots, Condor's default settings for job submission and execute resources have been updated. These enhancements enable more accurate memory tracking of jobs and higher resource utilization.
- BZ#862550
- Previously, when users tried to suspend local universe jobs, the condor_schedd daemon would crash. This issue has been fixed with the ability to ignore requests and continue scheduler and local universe jobs. As a result, the condor_schedd daemon continues to work normally and reports an error to the user when they try to suspend a local universe job.
- BZ#525237
- This enhancement includes support for tracking process tree resource usage accounting via Control Groups (or cgroups) in the cpuacct, memory, and blkio controllers, as well as support for limiting memory usage via Control Groups.
- BZ#772009
- Previously, changing the order of RequestMemory lines inside a requirements expression resulted in a different requirements expression. This fix ensures Condor correctly parses the RequestMemory lines regardless of order, and as a result, the output is the same as the original requirements expression.
- BZ#772587
- Previously, when running an openmpi job with Condor's openmpiscript, STDERR would show the script as using deprecated parameters. With this fix, the openmpiscript no longer uses the deprecated parameters and STDERR remains empty when the system runs normally.
- BZ#782132
- Previously, openmpi jobs that were run with Condor openmpiscript on Red Hat Enterprise Linux 6.2 would fail attempting to load shared libraries. This fix passes the --prefix of the install location of the openmpi to mpirun, resulting in jobs that complete without error.
- BZ#784105
- Previously, when debugging and analyzing the job_queue.log, rollovers would not denote time reference. This fix adds a timestamp to the rollover log name. As a result, users can obtain time and date information from old job_queue.log files.
- BZ#825365
- Previously, a missing flag reset and inefficient key checking in the accountant initialization and accountant table loops allowed accountant table sanity checking only after negotiator reconfiguration. This was redundant and made sanity checking inefficient. This fix inserts a "once-only" flag reset during initial startup sanity checking, and improves the efficiency of key checks for iterations over the accountant table. As a result, sanity checking of the accountant table is faster, and only takes place on negotiator startup.
- BZ#809732
- Previously, when a parallel universe job is removed its claim can be reused, causing the job status to become invalid. This fix updates the accuracy of job status reporting and prevents this error.
- BZ#803897
- Previously, users were unable to configure job preemption policies that are aware of whether a job negotiated via autoregroup, and also what particular group name a job negotiated under. This happened because negotiating job and resource ads did not provide sufficient information to allow group-aware or negotiation-aware preemption policies. This update enhances the negotiator, schedd and startd claiming processes with new fields that include RemoteAutoregroup, RemoteNegotiatingGroup and RemoteGroup, and their Submitter counterparts. As a result, new job and resource attributes are available for use in defining PREEMPTION_REQUIREMENTS expressions that are group-aware and negotiation-aware.
- BZ#748053
- Previously, group quota limit bookkeeping did not take the possibility of preemption into account, which resulted in job preemption was prevented when group quotas were in effect. This fix extends bookkeeping logic for submitter limits to include possibility of preemption. Job classads in the negotiator were enhanced to allow PREEMPTION_REQUIREMENTS to include accounting group names in the expression. As a result, preemption is now allowed when group quotas are in effect and preemption policies can be configured to allow preemption to respect accounting group boundaries, or ignore them if desired.
- BZ#805448
- Previously enhanced logic for computing submitter limits did not take the NEGOTIATOR_CONSIDER_PREEMPTION setting into account. Setting NEGOTIATOR_CONSIDER_PREEMPTION to false caused submitter limits to be not as tight as possible, which resulted in inefficiency. This fix updates the logic to take consider-preemption settings into account, resulting in tighter submitter limits when the consider-preemption setting is off.
- BZ#740774
- Previously, submitting a job with RequestMemory equal to a floating point number caused the job not to run. This fix updates classads to cast the evaluation of RequestMemory to be equal to an integer. As a result, jobs run successfully.
- BZ#805581
- Previously, log messages for negotiation rejections were misleading and did not properly reflect the negotiation logic in the code. 'Group quota exceeded' was output as a rejection message instead of the correct 'submitter limit exceeded'. This was also output when the submitter limit was allowed to be exceeded. The resulting message was corrected to 'submitter limit exceeded' and logic was updated to not output this message when limits were actually allowed to be exceeded. As a result, the log messages are accurate.
- BZ#785289
- Previously it was not possible to configure an autoregrouping accounting group policy in the initial implementation of Hierarchical Group Quotas (HGQ). This enhancement changes the semantics of GROUP_AUTOREGROUP and GROUP_AUTOREGROUP_<name> to emulate the behavior of autoregroup prior to the introduction of HGQ. As a result, GROUP_AUTOREGROUP can now be used to configure a legacy-style autoregrouping policy.
- BZ#785283
- The previous hard-coded ordering of accounting group negotiation prevented users from obtaining the job matchmaking behavior required. This feature allows for configurable group negotiation order, which enables users to use starvation ordering, which allows groups with unfilled quota to negotiate first, as well as enforcing that the root "<none>" group negotiate last, to properly support the autoregroup feature. Additionally, this feature allows a Grid administrator to define custom ordering policies to meet customer-specific needs.
- BZ#833095
- Previously, local resource allocations were not internally copied to slot-total data structure during dynamic slot instantiation. This caused values for local resource attribute TotalSlotXxx to be reported as zeros on the classad. This fix provides proper data structure copy internally, and as a result, values for TotalSlotXxx are now reported consistently with standard TotalSlotCpus, TotalSlotMemory, and similar.
- BZ#783267
- When debugging a job while it was running on a target machine, users could not directly attach a debugger to a running executable. This fix adds support to allow users to SSH to the job sandbox directory of the running job. As a result, users can now SSH to their jobs to debug them on the target machine.
- BZ#845567
- This enhancement adds the PRE_SKIP keyword to the DAG manager. If a PRE script exits with the PRE_SKIP value in a DAGMan job, the node succeeds and the job and the POST script are both skipped.
- BZ#755765
- Previously, the dprintf configuration would read MAX_<SUBSYS>_LOG as an integer literal to bypass the classad expression limitation of 32-bit integers. Because of this, configuring MAX_<SUBSYS>_LOG with an expression, such as "1024*1024", caused a call to EXCEPT(), which was intercepted by parent condor_master process and made failure diagnosis difficult. This fix replaces the call to EXCEPT() with a call to _condor_dprintf_exit(), which leaves an informative message in the dprintf_failure.<SUBSYS> file in the log directory. As a result, configuration failures for MAX_<SUBSYS>_LOG are now more informative.
- BZ#850555
- This enhancement adds the -expand option to condor_config_val. When both -dump and -expand options are specified to 'condor_config_val', all configuration variables are expanded before they are printed out.
- BZ#850838
- This enhancement allows the DAG manager to copy PRIORITY values from the DAG input file to the JobPrio attribute in the job ClassAd. Furthermore, the PRIORITY values are propagated to child nodes and SUBDAGs.
- BZ#486480
- Previously, when the condor master daemon attempted to send an obituary during a log rollover event of the failed daemon, the obituary was not sent. The logic has been updated so that the master will send an obituary email when a daemon fails during a log rollover.
- BZ#766612
- Previously, when removing the binary on a node running the condor_schedd daemon in a High Availability configuration with Red Hat High Availability, the condor_schedd would not stop. This is caused when the binary was missing and the service would not failover. This is caused by the init script used to control the schedd, which does not check for the existence of the binary before stopping. As a result, a stop operation will be attempted even if the schedd binary isn't on the system. With this fix, the stop function works correctly.
- BZ#850567
- This enhancement improves the output of condor_userprio by adding the group_quota and group_priority info to condor_userprio -all.
- BZ#767613
- Previously, it was difficult to determine the number of active query workers during any one interval due to diagnosing collector performance issues. As a result, it was difficult to determine the number of active query workers during any one interval and to log the number of query workers once a query has been activated. This fix ensures every query will log the number of active query workers.
- BZ#788452
- Previously, when upgrading a windows condor install from 7.6.3-0.3 to 7.6.5-0.11, Condor's Java detection was broken from the previous install. This fix ensures that during upgrade, the installer will correct the previous version's configuration file. As a result, Condor's Java detection works as expected.
- BZ#809551
- Previously, users were unable to use a specific key in Condor which has already been generated via ec2-tools. This fix allows the user to specify a key by name.
- BZ#502675
- Previously, when running a vm-job, the vm-load was not accounted for in the loadavg calculations in the startd. This fix allows the user to obtain the cpu utilization from the vm-job and use it in the loadavg calculation. As a result, the loadavg calculation takes vm-load into account.
- BZ#751013
- Previously, when continuing a job after it had been suspended, the job would receive SIGCONT twice. This fix updates the logic to ensure that SIGCONT can only be sent once. As a result, the job which continues after it has been suspended will only receive SIGCONT once.
- BZ#756096
- Previously, defaults caused machines to wake up unnecessarily, and all sleeping machines were woken up after each ROOSTER_INTERVAL. This fix updates the default to wake up machines only if they have been matched within the last half hour. As a result, machines are only woken up if they have a recent match.
- BZ#773434
- Previously, commands run with the help switch
-hreturned non-zero on exit. This fix allows commands to return zero when the-hswitch is present. - BZ#846955
- Repeated calls to 'service condor restart' resulted in the error: /var/run/condor/condor_master.pid: No such file or directory. This fix updates the script process order so there is no error on restart.
- BZ#860308
- Previously, statically configuring /etc/hosts with ipv6 entries caused the condor_schedd daemon to crash trying to forward resolve entries. This fix ensures ipv6 addresses are properly handled in /etc/hosts and static configurations. As a result, Condor starts normally.
- BZ#864637
- Previously, issuing a condor_restart -subsystem condor_had daemon for a HACM node running the negotiator caused the condor_had and negotiator to stop. The had daemon would not restart. This fix ensures the had daemon restarts when this occurs.
- BZ#782553
- Previously, users could not perform queries against various resources in Condor ec2 jobs. This fix adds support for resource tagging. As a result, during job submission users can now tag their resources and run queries against those tags.
- BZ#782552
- Previously, performing an idempotent operation more than once yielded the same result as applying it just once. This fix adds support inside of the ec2 protocol when calling RunInstances. As a result, ec2_gahp now supports idempotent RunInstances.
condor-aviary
- BZ#807398
- Previously, when the Aviary Schedd plug-in and Query Server are deployed in a HA group, Aviary clients experienced stale endpoint references for a longer duration than necessary. Adjustments in the Locator implementation to quickly replace a failed endpoint reference with its new one. As a result, an Aviary client using a Schedd or Query Server endpoint can retrieve the new endpoint faster.
- BZ#733515
- Previously, Cumin needed a way to locate Aviary SOAP endpoints. Cumin was unable to find Aviary SOAP endpoints that may have been activated with ephemeral ports. WSDL, XSD and code were designed and developed to provide a SOAP interface to locate other SOAP endpoints. As a result, Cumin can retrieve Aviary SOAP endpoints through a well-known point of contact.
- BZ#768328
- Previously, Aviary XML schema for the various job states: there are no states listed for suspended and transferring_output jobs in the JobStatusType. The Aviary XML schema was updated to add these new job states, and implementation changes were made to reflect these totals in the API as well. As a result, Aviary clients can view the totals for suspended and transferring_output job states from this XML schema type.
- BZ#864560
- Previously, when a user submitted a job to Aviary using condor_submit with a submission name that is not bounded by quotes, an extra, bogus submission would be recorded when there should be only one. This fix includes an internal change to perform comprehensive checks for the presence of a Submission attribute within the job ad. As a result, a quoted or unquoted Submission attribute in a condor_submit is accurately reflected by Aviary as part of a single distinct submission record.
- BZ#856646
- Submission IDs were previously retrieved with a BEFORE mode, which led to duplicate submission IDs being returned. The problems that prevented the getSubmissionID implementation from processing the specified time range query parameters have now been fixed, so unique submission IDs are returned to the client.
- BZ#768319
- Invoking the Aviary query operation getSubmissionSummary on a submission with suspended jobs previously prevented totals for those suspended jobs being listed in the summary. The Aviary XSD has now been updated and implementation code has been added so that all submission totals are now listed in the summary.
- BZ#800079
- This release of Grid introduces a new capability in the Aviary package for collecting submission ids from a Query Server using a page size and offset. This feature was designed for two purposes: 1) help Aviary clients such as Cumin to manage large submission sets at increasing scale 2) enable an Aviary client to query for new submissions from a point-in-time (i.e., from the latest submission it knows about) In QMF, new submission objects become available in the QMF object space as they are created. Aviary and its use of SOAP requires a pull model to gather information about these new submissions. Also, if Cumin is brought online in an existing Grid deployment but with an empty database for submissions, it may need a way to incrementally load older submissions using a background thread. Once a submission ID is collected, the details of that submission can be retrieved by the Aviary getSubmissionSummary operation. The BEFORE/AFTER mode is optional, so results will instead be returned in lexical order.
- BZ#813807
- Submitting a job through Aviary or the QMF interface previously caused the Qdate field to show a value of zero (0). Submission and job implementation code in Aviary and QMF have now been adjusted to ensure that any ATTR_Q_DATE variable set that arrives late is properly recorded. This guarantees that the Qdate field shows the correct time value.
- BZ#855449
- Retrieving submission IDs with a non-exact qdate offset in AFTER mode previously led to submission IDs not being returned even though getSubmissionSummary reports that they do exist for the specified time range. The problems that prevented the getSubmissionID implementation from processing the specified time range query parameters have been fixed, so the correct range of submission IDs are now returned to the client.
- BZ#733498
- Exposing new suspend/continue capabilities in the Grid job lifecycle management core through a public API was not possible in earlier releases. New WSDL operations have been defined and implemented so users can now remotely suspend and continue jobs using the Aviary scheduler SOAP interface.
- BZ#886448
- Providing a timeout value from the command line for any of the Aviary sample python scripts previously generated a Backtrace indicating that the timeout value applied to the internal Suds client was not an integer type. The utility function has been modified to cast the input value for a timeout to a valid integer type or to catch an exception and output error message, so supplied timeout values are now correctly parsed and applied.
- BZ#732388
- Queries to retrieve submissions by a single owner name have previously been unsuccessful because submissions would instead be returned for multiple owners. A remote Aviary client requesting submissions for a single owner is now recognised so the appropriate values are collected and returned. Queries to retrieve submissions by a single owner name are now fulfilled as expected.
- BZ#800344
- When submitting a job through the Aviary interface and providing an extra attribute (such as "args") that is assumed to be basic and should not be overridden according to the allowOverrides XML attribute, the override prevention used to fail if the case differed from what is defined internally for ATTR_<name>. The code now enables case-insensitive comparisons of the basic attributes that may be overridden, so basic attributes are correctly preserved if allowOverrides is false.
- BZ#739219
- When the Aviary getJobData operation ran a query for a job output file using the relative filename that may have been defined in a submit file, the file could not be found. The implementation will now try the supplied name first. If that fails, it will also try prepending the IWD value to the first value and retry a stat of the file. This ensures that job output files submitted with relative paths can be implicitly resolved in the getJobData operation.
condor-cluster-resource-agent
- BZ#833343
- Trying to add a JobServer to a RHHA-controlled schedd that did not exist did not previously raise an error message to indicate a failure. An error message is now printed if the schedd configuration doesn't exist, so the user is notified if the add process fails.
- BZ#810982
- Aviary locator support for RHHA-managed query servers has been added for this release. Aviary clients need to know which machine the query server is running on, because query servers configured to be managed by RHHA will change machines when the schedd does. The tools for configuring such query servers now provide a service to locate them.
- BZ#828983
- When used with RHHA, the condor resource agent did not previously verify that a daemon had started during a start operation. The start operation could then report success even though the daemon had not begun to start. To address this, the resource agent now waits 10 seconds to check that the process starts. So when a start operation reports success, the demon is guaranteed to have started.
- BZ#833611
- When the wallaby cluster-* commands have been used to only perform actions in the configuration store, the user has been prompted for the ricci user password. The user will now be prompted for the ricci user password only if the action will be performed on the cluster configuration, so merely making changes to the store will not require ricci authentication.
condor-plumage
- BZ#840076
- mongodb will now collect and store its job history and a standalone python client has been added for querying this history. This does not replace the existing history file infrastructure or tools like condor_history. Instead, it moves job history onto a more robust, scalable, and manageable backend data source for enterprise deployments.
- BZ#801447
- Cumin needs utilization data for reporting, but has been unable to fulfil the basic reporting feature. Records, fields, and indices are now included in the Plumage ODS plug-in data emission to mongodb, so Cumin can retrieve utilization reporting fields from mongodb as needed.
- BZ#786815
- 'From' and 'To' date ranges have not been supported for the userlist, usergrouplist, and resourcelist options on plumage_stats, so a user was unable to limit their queries based on a date range. The 'From' and 'To' date range parameters are now passed to the appropriate functions within plumage_stats, so a user can query user, usergroup, and resource statistics within a specified time span.
- BZ#786825
- Server hostname and port arguments were not previously parsed correctly, which prompted a TypeError from python when plumage_stats was invoked with a server:port string. The unnecessary 'nargs' keyword is no longer in the parser option for a mongodb host, so plumage_stats now parses optional server arguments correctly.
condor-qmf
- BZ#867989
- Upstream changes in HTCondor previously modified the names of various condor_schedd daemon ClassAd statistical attributes, which caused statistics for a QMF scheduler object to show 0 values for attributes that should be non-zero. The implementation of the QMF schedd plug-in now implicitly maps from the old attribute names (7.6 series) to those renamed in the 7.8 series, so statistics for a QMF scheduler object show correct values for attributes as appropriate.
- BZ#883794
- The transition from 7.6 schedd stats to their 7.8 equivalents previously caused the cumulative job total to be incorrect. An incorrect variable assignment in the mapping code has now been fixed so the cumulative job total will be accurate and correct.
- BZ#753822
- The default configuration of the QMF Job Server used by cumin has been changed. The previous default was for cumin to use an embedded QMF Job Server object managed within a schedd plug-in component. Now, the default configuration is for a standalone daemon (condor_job_server) to be launched and for submission publishing within the plug-in's embedded Job Server to be disabled. Grid users using the previous default configuration should note that details will now be available in the new default for jobs that have been completed or removed from the queue.
condor-vm-gahp
- BZ#750818
- When running VM Universe jobs on RHEL5, the VM-Gahp previously attempted to update the utime for the image without the correct permissions. The logic to update utime for the image has been removed from inside of the VM-Gahp because this is the hypervisor's responsibility, so the condor_vm-gahp should no longer trigger a SELinux error.
- BZ#782054
- Virtual images prepared with VM Universe that had the VNC console enabled would not start because the VNC console was reported as missing. A submission parameter has now been added to enable VNC settings, so images with the VNC console enabled now run correctly under condor.
condor-wallaby-base-db
- BZ#800660
- The cemote configuration feature was previously unable to configure the Aviary locator feature. The base database for Aviary now has new features that allow this configuration.
- BZ#831756
- When the ExecuteNode feature did not have ALLOW_NEGOTIATOR set, nodes could not allow matching. The ExecuteNode now includes ALLOW_NEGOTIATOR, so nodes will allow matching.
- BZ#831709
- Installing only the SharedPort feature on a node used to cause the master daemon to stop. A dependency on Master from the SharedPort Feature has been added so it is no longer possible to install only SharedPort on a node and prevent the master daemon from running.
- BZ#746005
- Because MRG Grid supports the Plumage feature, the Remote Configuration database now has features to enable Plumage functionality.
- BZ#802823
- Not all preconfigured features in the remote configuration default database have had names that adequately explain their function. All features now have an annotation with additional information.
- BZ#767272
- Upgrading wallaby used to change the ownership of database patch files from root to wallaby, which caused a verification problem. To prevent this, patch files are now always owned by wallaby, so upgrading wallaby no longer causes verification issues.
- BZ#803359
- The UNHIBERNATE parameter in the remote configuration base database has been too loosely defined, causing hibernated machines to wake up unnecessarily. The definition of UNHIBERNATE has now been tightened so not all hibernated machines will be woken up.
- BZ#831725
- The ALLOW_NEGOTIATOR_* values that were set on features in the Remote Configuration default database did not previously contain IP_ADDRESS. Some pools configured with Remote Configuration would then need additional changes in order to function. ALLOW_NEGOTIATOR_* param values now contain IP_ADDRESS, which has negated the need for additional changes.
condor-wallaby-client
- BZ#851222
- When multiple wallaby-agents were running against the broker that the configd is talking to, the configd would always choose the first one, which was not necessarily preferred. To address this, the configd now exits with a notification in its log file if it finds more than one wallaby-agent running on the broker.
- BZ#815653
- The configd previously used command line arguments instead of configuration parameters for broker user/password information. Now, the configd can use the same parameters as other qmf-related daemons (MF_BROKER_USER and QMF_BROKER_PASSWORD_FILE) to specify username and password when connecting to a broker.
- BZ#815820
- The configd previously used the QMF_BROKER_AUTH_MECHANISM parameter to specify authentication mechanisms when connecting to a broker, whereas other qmf-related daemons used QMF_BROKER_AUTH_MECH. Like other qmf-related daemons, the configd can now use both parameters, with QMF_BROKER_AUTH_MECHANISM taking precedence.
condor-wallaby-tools
- BZ#786020
- When using the remote configuration tools to remove a configuration feature that shared a parameter requiring user input with another feature, the shared parameter would be removed instead of being kept in the configuration. Logic in the tools has been corrected to detect if a parameter is needed by more than one feature, so the parameter will no longer be deleted along with the feature.
- BZ#749569
- Previously, a new special group called skeleton group was added by the wallaby service. Remote configuration tools such as condor_configure_pool and condor_configure_store were not able to configure the skeleton group. Additional support for the skeleton group have now been added to the remote configuration tools to rectify the issue.
cumin
- BZ#886921
- When a purge was attempted on queued messages, the purge failed with the message "Purge: Fauled (Incorrect number of arguments:expected 2, got 1). This was caused by a recent change in the queue.purge method which now required a "filter" argument. The fix now passes an empty dictionary as the new filter argument for the purge method. The purge method should now complete successfully.
- BZ#814386
- Cumin previously used QMF methods for remote grid operations by default. This feature enhancement integrates Aviary web servers with Cumin, allowing support for job control, submission and job/submission queries. Also included is support for the discovery of Aviary endpoints through the Aviary locator service. Aviary web services is a default feature on installation. Cumin will only use QMF methods of Aviary is turned off.
- BZ#812407
- Cumin based reported statistics in the Overview page on Submitter objects which reported statistics on accounting groups. This did not necessarily a direct map of the correct sums across many Submission groups owned by a cumin user ID. This resulted in possibly incorrect values for a particular user. The Grid User Overview has now been changed to use sums from across Submission records where the owner matches the cumin used ID. All reported statistics for running, idle and held jobs should now match the information available on the Submissions tab.
- BZ#789351
- In order to view fully functional charts with cumin, the Flash Player functionality was needed. Instead of Flash Plager, a new Javascript-based charting solution that requires no special browser plugin has been implemented.
- BZ#886924
- The Exchange overview page does not display the charts for "Messages received, routed and dropped." An exception occured in the rate chart code that affected all rate charts, causing the issue. The exception has been fixed to pass the argument correctly and all dropped charts should now display correctly on the Exchange overview page.
- BZ#846010
- Cumin development can become difficult to coordinate between grid and messaging due to UI issues, graphing technologies, etc. To make it easier for grid and messaging to develop cumin for both, cumin-messaging has been created as it's own package. This allows users to choose between grid or messaging based on the package they have installed.
- BZ#886937
- Previously, attempting to remove a broker link in the Messaging tab resulted in failed errors. The link remove error was caused by a nonstandard object_id that was needed to remove that link. The code to fetch the qmf link object that constructed the link name has been corrected. Clicking "Remove broker link" should now correctly remove the link.
- BZ#733516
- Changes in Cumin have been implemented to obtain Aviary endpoints from either QMF or a new remote service automatically. This service allows Cumin to discover all of the Aviary endpoints in the pool by querying a service at a single well-known URL. Use of the locator service is turned off by default but maybe enabled in /etc/cumin/cumin.conf. Enabling the locator service relieves the maintenance burden in deployments with multiple Aviary services. For more information, consult the Management Console Installation Guide.
- BZ#799129
- This enhancement adds Kerberos authentication in Cumin. This allows a kerberos authentication server to handle all cumin authentication. To enable, add "kerb" to the "auth" config in /etc/cumin/cumin.conf and set the kerberos_realm to the value required by the Kerberos Authentication server's setup in krb5.conf. Cumin will use the python-kerberos library to authenticate users. By default, all kerberos-authenticated users will be treated as non-admins in Cumin. In order to get a kerberos-authenticated admin user, add an "external" user to the cumin datbase and then add the admin role assignment via the cumin-admin utility.
- BZ#807838
- cumin-reporting has been introduced as a technical preview. cumin-reporting has been added to allow Cumin to use data from the condor-plumage (ODS) database to generate long duration visualizations of grid system behavior. This allows users to visualize system usage over longer periods of time than the 1 day maximum window that Cumin previously provided.
- BZ#703859
- A new chart has been added to the Grid Overview page that shows resource utilization by accounting groups, identified by the name "Pool Usage by Accounting Group". These charts show job submission//completion statistics and other features. Note that the data in this chart relies on data from the plumage subsystem in condor and will only yield data if the cumin machine has the pymongo package installed (currently only available in el6).
- BZ#886942
- The Access Control page displayed an incorrect menu, causing the page header to display twice. BrokerAccess Control is derived from ModeSet where classes have summary sections that contain task links. At the same time, Broker AccessControl is also a BrokerFrame modeset. This causes two summary sections to be displayed. To fix this, the ObjectView has been subclassed for Access Control and the header has been suppressed. The Access Control page now shows only one page header.
- BZ#881826
- The Queue overview page does not display the charts for "Messages enqueued and dequeued." An exception occured in the rate chart code that affected all rate charts, causing the issue. The exception has been fixed to pass the argument correctly and all dropped charts should now display correctly on the Queue overview page.
- BZ#887167
- The Connections overview page does not display the charts for "Bytes Sent and Received". An exception occured in the rate chart code that affected all rate charts, causing the issue. The exception has been fixed to pass the argument correctly and all dropped charts should now display correctly on the Connections overview page.
- BZ#823506
- After installing cumin on Red Hat Enterprise Linux 6.3, cumin does not come up after a reboot. In some cases where the postgres database does not fully start, cumin is affected. cumin retries connection to the database if psql returns the error code "psql:FATAL: the database system is starting up". cumin has now been fixed to check for this error message. This error message will signal a restart of the cumin service. This should allow cumin to startup after a reboot.
- BZ#752732
- The list of Operating Systems exceeded the boundaries of the Overview page's frame. This is due to the cascading style sheet used on the page content's div box. The float property in the cascading style sheet has been cleared and the page layout of the Overview page should now accommodate the list of Operating Systems regardless of character length.
- BZ#782359
- Certain QMF method calls returned errors in the log files under normal cumin operation. While these errors are harmless and all method calls have been handled as expected, these may cause users concern when the errors appear on the log file. This fix changes the logging levels on this type of error to "DEBUG" so that the errors will only be seen in development context. These errors will not be visible in a normal user deployment.
- BZ#783139
- When cumin rendered a page where an object is found to be missing, the user is shown the error message "We can't find the object you requested." but is never taken back to the previous page. This requires the user to manually return to the initial site to try again. This is caused by inefficient handling of deleted submissions and errors in redirection. This fix improves the handling of deleted submissions, errors in the job pages now redirect to job lists or submission lists as appropriate. When errors occur, they now redirect users to the pages that will allow them alternative courses of action instead of a static error page.
- BZ#851205
- On the Scheduler page, the table and it's contents overran the boundary line of the page, resulting in misaligned formatting. No user functionality was impacted. This was caused by the CSS property table-layout which was set to "auto". To fix the formatting, the table-layout property has been set to "fixed" and column tags have been defined to restrict column widths. The table on the Scheduler page should now never run outside the boundary line.
- BZ#799382
- When exporting the quotes to CSV format, the cumin page would occasionally hang at the "loading" status instead of showing the actual values. This was caused by the CSV export prematurely triggering the export before all the quote values have been loaded. Cumin code has been fixed so that in CSV mode, cumin will trigger the rendering only when all the quota values are filled in. The exported CSV files should now show the correct values.
- BZ#756384
- Suspend and Continue buttons were added alongside the Hold, Release and Remove buttons on the QMF and Aviary interfaces in condor. This integrates suspend and continue into cumin. Suspend and Continue task links were added to the list of job control tasks on a job details page. Additionally, the Suspended job count statistic for submissions was added as a column on submissions lists. The "Enqueued" column values were abbreviated to make room, with the full value now displayed when hovering with the mouse.
- BZ#887174
- When viewing the session for a selected connection on the Connections page, all sessions are displayed for all connections instead of just one specific connection. A filter has been added to display only the sessions for the selected connection.
- BZ#799404
- When users exported a limits table to CSV format, the exported file included visible HTML markup, because the class that renders limits did not include details about handling export to CSV. Special handling has been added to the class to remove HTML tags during CSV export. Limits now display without markup in the exported file, as intended.
- BZ#801047
- Previously, the default behaviour of the
sasl-mech-listparameter allowed Cumin to use all available SASL mechanisms to authenticate against a broker. Users were advised to disallow anonymous authentication manually. Since disallowing anonymous authentication is recommended in most use cases, the default behaviour of thesasl-mech-listhas changed.The new default behaviour forsasl-mech-listis as follows.- For broker addresses that specify credentials (username and password) in the URL,
sasl-mech-listdefaults to the list of recommended password authentication mechanisms for Cumin (PLAINandDIGEST-MD5). - For broker addresses that do not contain credentials,
sasl-mech-listdefaults toANONYMOUS.
The previous default behaviour can be acheived by settingsasl-mech-listtoAVAILABLE. - BZ#802704
- All wallaby nodes were always shown in the Inventory page, regardless of the filter value. This occurred because the filter was not applied to nodes from the wallaby subsystem. Filters now apply to nodes from the wallaby subsystem, and the Inventory page displays nodes according to the filter set.
- BZ#850759
- Previously, if a user attempted to submit a form after their session expired or the Cumin server was restarted, the form was deemed invalid, and the user's browser displayed a page with an error message similar to the following:
APPLICATION ERROR Traceback (most recent call last): File "/usr/share/cumin/python/wooly/server.py", line 145, in service_page_request session.check_csrf() File "/usr/share/cumin/python/wooly/__init__.py", line 747, in check_csrf raise CSRFException("Possible CSRF attempt") CSRFException: Possible CSRF attemptCumin has been updated so that users are redirected to the Cumin main page when an expired form is submitted, and prompted to log back in to Cumin when necessary. - BZ#635207
- When necessary, Cumin now includes links to sub-shares in the group quota editing utility. This allows users to edit sub-shares more easily. Links to sub-shares are added below the sliders and pie chart in the group quota editing utility. Click on a sub-share link to navigate to the Edit Share form. When editing is complete, the browser returns to the quota table.
- BZ#796798
- Previously, Cumin installations used the persona named
defaultas the default persona. This included both Messaging and Grid user interfaces. Since Cumin is primarily used in Grid deployments, Cumin installations now use the persona namedgridas the default persona. Users can configure the personas namedmessaginganddefaultby editing thecumin.conffile. - BZ#871453
- The Cumin web server restarted when Cumin failed to handle some errors generated by SSL socket connections. The error handler has been updated to handle these additional errors, and errors generated by SSL sockets no longer cause the web server to restart. Instead, the failed connection is dropped and Cumin continues to operate as expected.
- BZ#760567
- Quota editing sometimes failed when no group information was present. This resulted in a stack trace being displayed in the browser. Moving to charting that did not require Adobe Flash Player involved adding a mechanism to log errors when this condition occurs, without showing a stack trace to the user.
- BZ#853454
- The
StartdIpAddrjob attribute could previously be edited because of a typographical error in Cumin code. Modifying this parameter has no known negative effects, but negative impacts cannot be ruled out entirely. The code has been modified so that theStartdIpAddrattribute is treated as read-only, as intended. - BZ#873335
- When page update requests were made by a client without valid credentials, the web server did not redirect users correctly. After being prompted to log in, users saw an XML document instead of returning to a valid Cumin page. Cumin now issues a reload directive to the browser when an update request is received from a client without valid credentials.
- BZ#848344
- The Cumin Aviary Technology Preview requires command arguments for jobs submitted in Cumin using Aviary. Cumin therefore rejects submissions that specify commands without arguments. To work around this issue, expand the submission form by clicking on the button, and add the string "
Args =" to the Extra attributes field. This satisfies the restriction in Cumin Aviary without defining attributes. - BZ#846349
- During database installation, the PostgreSQL user
cuminwas created as a database superuser. This granted thecuminuser unnecessary privileges in the PostgreSQL database. Thecuminuser remains the owner of thecumindatabase, but is no longer granted superuser privileges during installation. Instead, thecuminuser is granted only the permissions required for Cumin to operate correctly. - BZ#768298
- Previously, there was no clear documentation about supported browsers for Cumin. This caused problems for users who were unintentionally using unsupported browsers. A list of supported browsers has been added to the Cumin About page to help alleviate this issue.
- BZ#801287
- Previously, Cumin did not make use of a
pidfile, which made it difficult to determine the current status of thecuminservice. Apidfile for Cumin is now created at/var/run/cumin.pidwhen the service is started, and deleted when the service is stopped byinitd. The existence of/var/run/cumin.pidwhen thecuminservice is not running is considered evidence of a crash. - BZ#809006
- Strings that contained special XML characters were being escaped more than once, which resulted in text containing special XML characters being displayed incorrectly. The
xml_escaperoutine has been modified to prevent double-escaping, so strings that contain special XML characters should now display correctly in the browser. - BZ#787138
- Notifications displayed by Cumin persisted until dismissed, but gave no indication of when they originally appeared. This reduced the usefulness of the notifications. Each notification message is now displayed alongside a timestamp marking the time of the initial notification display, to make it easier to determine when problems began to occur.
- BZ#805029
- Slotvis functionality was not optimal, did not scale well, and required Adobe Flash Player. Slotvis has therefore been removed from Cumin, which also removes the requirement for Adobe Flash Player.
- BZ#800065
- The
cuminservice script did not accurately detect the termination of the Cumin master process. The service assumed termination occurred immediately after it issued a SIGTERM to the master process, and removed thepidfile. However, since Cumin child processes could take time to shut down gracefully, this was not always true. This meant that, if thestartandstopcommands, or therestartcommand, were issued while the process was still running, theinit.dscript erroneously returnedOK.Thecuminservice script now checks the status of the Cumin master process after a SIGTERM has been issued. If the process does not shut down within a given period (5 seconds), the service script notes that the process may not have shut down correctly, and does not remove thepidfile. Existence of apidfile without a running service therefore indicates a service crash or a prolonged shutdown. Check/var/log/cumin/master.logfor shutdown messages to verify. - BZ#750196
- Cumin notification banners needed to be dismissed by a user; this meant that they could take up a large amount of screen space if left undismissed. This update adds the
notification-timeoutparameter, which specifies a number of seconds after which a notification banner is automatically dismissed. The default value fornotification-timeoutis180seconds. Set it to0to enable the previous behaviour, where banners are never dismissed without user interaction. - BZ#891919
- The supported version of Firefox has been updated to Firefox 17. The Cumin About page has been updated appropriately.
- BZ#907866
- The Technology Preview of Cumin Aviary forced the
Requirementsfield to default totruefor all jobs submitted with Aviary. This meant that any user-set value for theRequirementsfield was overwritten, and therefore ignored. The construction of arguments during job submission has been corrected, andRequirementsvalues set by the user are now passed correctly to condor when a job is submitted using Aviary.
wallaby
- BZ#782816
- If no authentication mechanism was explicitly specified for the wallaby shell, a spurious warning about an invalid broker mechanism was displayed. Wallaby has been updated so that this warning is displayed only if the explicitly-specified mechanism is invalid.
- BZ#796406
- Previously, if a group was deleted, and no other changes were made to nodes in the deleted group before they were activated elsewhere, nodes in the deleted group may not have received configuration updates. This resulted in incorrect node configuration when the nodes were later activated. Wallaby now adds members of a deleted group to an internal list of changed nodes, ensuring that their configurations are recalculated the net time the nodes are activated.
- BZ#786801
- Previous versions of the wallaby package included a logrotate recipe that incorrectly named archived logs. Rather than numbering archived logs (for example,
agent_log.1,agent_log.2and so on), the archived logs were namedagent_log.1,agent_log.1.1,agent_log.1.1.1, and so on. This updated wallaby package corrects the logrotate recipe so that archive files are named as expected. - BZ#802821
- Previously, wallaby did not expose free-form textual descriptions or comments for features and snapshots. This made it difficult to document and to interpret snapshots. The wallaby service now allows users to annotate configuration entities and configuration snapshots with free-form textual descriptions. To annotate an entity, use the
annotationproperty and thesetAnnotationmethod of each entity type. To annotate snapshots, use thewallaby make-snapshotmethod, or call theStore#makeSnapshotWithOptionsmethod with anannotationoption set in the option map. - BZ#850205
- Previous versions of the wallaby shell failed to catch certain errors thrown by the parameter handling library. This resulted in incomprehensible error messages being displayed when incorrect command-line arguments were used. The wallaby now catches these errors and displays a user-friendly error message.
- BZ#807820
- Previous versions of the wallaby package did not use the assigned UID and GID for the wallaby user. This meant that moving files between machines with wallaby installed was unnecessarily complicated. wallaby now uses its assigned UID and GID when creating a wallaby user if these are available in the operating system, allowing wallaby to retain the same UID and GID across systems wherever possible.
wallaby-utils
- BZ#802799
- Wallaby shell commands that began with
replace, but did not provide a list of entity names, failed with an error. This made it difficult to replace the entity set with an empty set. Replace commands now operate without requiring a list of entity names, which allows entity sets to be replaced with empty sets. - BZ#851217
- In some cases more than one wallaby agent were started on the same broker. This was not intended, and could lead to unpredictable results if wallaby shell commands were run while multiple wallaby agents existed on the broker. The wallaby shell now ensures that only one wallaby agent is running on the specified broker as part of its startup, preventing the aforementioned unpredictable behavior.
- BZ#881366
- Previously, trying to empty a metadata field with any wallaby shell modify command resulted in an error. This meant it was not possible to set metadata fields to empty strings. Modify commands now accept empty values.
- BZ#820419
- Previously, wallaby lacked the ability to display node configuration details that was available in the condor utilities. The shell has been updated with a command that allows users to inspect the node configuration as it will be received by the condor-wallaby-client.
- BZ#801632
- This update adds the ability to remove snapshots using the wallaby shell.
- BZ#864091
- Previously, the wallaby
list-userscommand printed users with a role ofREADas if they had a role ofREAD_ONLY. This was not consistent with wallaby user management commands. User roles displayed withlist-usersare now labelled correctly and consistently with the user management commands.
6.3. RHSA-2012:1278 – Moderate: Red Hat Enterprise MRG Grid 2.2 security, bug fix, and enhancement update
Security Fixes
- CVE-2012-2680
- A number of unprotected resources (web pages, export functionality, image viewing) were found in Cumin. An unauthenticated user could bypass intended access restrictions, resulting in information disclosure.
- CVE-2012-2681
- Cumin could generate weak session keys, potentially allowing remote attackers to predict session keys and obtain unauthorized access to Cumin.
- CVE-2012-2683
- Multiple cross-site scripting flaws in Cumin could allow remote attackers to inject arbitrary web script on a web page displayed by Cumin.
- CVE-2012-2684
- An SQL injection flaw in Cumin could allow remote attackers to manipulate the contents of the back-end database via a specially-crafted URL.
- CVE-2012-2685
- When Cumin handled image requests, clients could request images of arbitrary sizes. This could result in large memory allocations on the Cumin server, leading to an out-of-memory condition.
- CVE-2012-2734
- Cumin did not protect against Cross-Site Request Forgery attacks. If an attacker could trick a user, who was logged into the Cumin web interface, into visiting a specially-crafted web page, it could lead to unauthorized command execution in the Cumin web interface with the privileges of the logged in user.
- CVE-2012-2735
- A session fixation flaw was found in Cumin. An authenticated user able to pre-set the Cumin session cookie in a victim's browser could possibly use this flaw to steal the victim's session after they log into Cumin.
- CVE-2012-2459
- It was found that authenticated users could send a specially-crafted HTTP POST request to Cumin that would cause it to submit a job attribute change to Condor. This could be used to change internal Condor attributes, including the Owner attribute, which could allow Cumin users to elevate their privileges.
- CVE-2012-3492
- It was discovered that Condor's file system authentication challenge accepted directories with weak permissions (for example, world readable, writable and executable permissions). If a user created a directory with such permissions, a local attacker could rename it, allowing them to execute jobs with the privileges of the victim user.
- CVE-2012-3493
- It was discovered that Condor exposed private information in the data in the ClassAds format served by condor_startd. An unauthenticated user able to connect to condor_startd's port could request a ClassAd for a running job, provided they could guess or brute-force the PID of the job. This could expose the ClaimId which, if obtained, could be used to control the job as well as start new jobs on the system.
- CVE-2012-3491
- It was discovered that the ability to abort a job in Condor only required WRITE authorization, instead of a combination of WRITE authorization and job ownership. This could allow an authenticated attacker to bypass intended restrictions and abort any idle job on the system.
Bug Fixes
- BZ#784434
- Frequent reconfiguration requests sent to the Negotiator daemon during a negotiation cycle could block the negotiation cycle and cause it to fail. With this update, reconfiguration requests received during a negotiation cycle are delayed until after the negotiation cycle has completed. Additionally, Negotiator data structures that are not reentrant with the
condor_reconfigcommand remain protected. - BZ#784968
- Previously, regular updates of concurrency limits required frequent reconfiguration of the Negotiator daemon, which reduced performance of an MRG Grid pool. This update adds a new parameter,
NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE, which forces Negotiator to re-read its configuration before the beginning of the negotiation cycle without a need to invoke a full reconfiguration. Concurrency limits can now be updated regularly without any negative impact on pool performance. - BZ#788617
- Previously, the
condor_q -analyzecommand could not be used to determine which exceeded concurrency limit prevented a job from being successfully negotiated. This update modifies the Negotiator daemon to provide the name of concurrency limit that causes the match failure. Thecondor_q -analyzecommand can now be used to determine the specific concurrency limit which causes a job match failure. - BZ#794660
- Under certain circumstances, a partitionable slot could split into more dynamic slots than a machine could support. As a consequence, the scheduler could assign the machine jobs that would consume more CPUs and memory than was available. This update adds a new logic that corrects the slot allocation, and a machine can now have assigned only as many jobs as it is able to support.
- BZ#799838
- Recent changes to make sure that history index files were correctly cleaned up led to a situation where the internal job and submission collections were aggressively purged of active jobs. Consequently, the active jobs submitted through the Aviary or QMF interfaces were not reported by the interface. This update corrects this problem and jobs are now visible as expected during the whole job cycle.
- BZ#807738
- Previously, the
DAEMON_LISTparameter was marked asneeds_restartin the remote configuration database. Therefore, if the value ofDAEMON_LISTchanged, all daemons of MRG Grid component had to be restarted on a cluster node for the parameter change to take effect, which caused service disruption. With this update, the remote configuration database has been modified and theneeds_restartflag has been removed from theDAEMON_LISTparameter. Changes ofDAEMON_LISTno longer result in the restart of all MRG Grid daemons on a node. - BZ#810519
- When attempting to remove a job on hold, which had invalid parameters and was never run, the job returned back to the
holdstate and thecondor_rmcommand failed. This update corrects the state machine so that thecondor_rmcommand now correctly removes a job that has been placed on hold due to invalid input parameters. - BZ#812126
- The initial remote configuration database was recently modified to allow plug-in configuration outside of the remote configuration service. However, to simplify configuration, some packages store their configuration files in the directory pointed to by the
LOCAL_CONFIG_DIRvariable if the remote configuration was not used. This could cause some daemons to unexpectedly load plug-ins that were not properly configured. This update modifies the initial remote configuration database so that*.PLUGINSvariables can no longer be defined outside of the remote configuration service. All plug-ins now must be configured using the remote configuration. - BZ#832968
- Previously, code used for cleaning up the history index could cause a situation where internal job and submission collections were aggressively purged of active jobs. Consequently, the active jobs and their submissions were not reported by the Aviary or QMF query interface. This bug has been fixed and jobs and submissions for active jobs are now displayed as expected even while history indexes are being cleaned.
- BZ#748507
- When the default group configuration used string append operators, Wallaby sometimes generated redundant node configurations. When such configurations were deployed to MRG Grid nodes, the
condor_masterdaemon could fail to start. This bug has been fixed, Wallaby no longer generates such configurations, and can also handle archived redundant configurations. - BZ#801543
- It is possible to have the
shadow_recentry set for a claim while no shadow process is present. During this time, the PID field ofshadow_recis 0. In rare scenarios, theschedddaemon tried to kill PID 0 and then terminated with the following error message:Send_Signal: sent unsafe pid (0)
This update fixes theSend_Signal()function to not terminate in the described scenario andscheddnow cleans up gracefully if there is ashadow_recentry for a claim but no shadow process. - BZ#809799
- Due to a regression, when a job with both Pre and Post scripts set was submitted to the DAGMan utility, DAGMan terminated unexpectedly. With this update, the Post script is now always processed even if a DAGMan job fails.
- BZ#831235
- Previously, Cumin did not validate redirection URLs specified in POST requests, creating a vulnerability in the Open HTTP Redirector.This bug has been fixed and Cumin now validates all redirection URLs. Only redirection to Cumin's own pages is allowed. If an invalid redirection is discovered, a redirection to the Cumin main page for the user is included instead.
- BZ#831244
- Prior to this update, Cumin allowed unrestricted CR and LF characters in response headers, creating a response header splitting vulnerability. With this update, Cumin scans all response headers for LF characters. Now, Cumin never writes LF characters into any legitimate response header so there is no danger of the scan interfering with legitimate responses. If an illegitimate response header is seen, a simple exception trace page is sent as the response.
- BZ#840106
- Previously, Cumin used the
random.getrandbits()function to generate session keys. According to the Python documentation, this function uses the Mersenne Twister algorithm, which can leak state. It was theoretically possible to predict Cumin session keys after gathering enough leaked state information. With this update, Cumin reads from the/dev/urandomdevice to create session keys instead, thus preventing this problem. - BZ#840110
- Cumin did not check for a valid session before processing requests for certain auxiliary web pages. Consequently, it was possible to retrieve certain types of data from Cumin or cause activity on the server without a valid login. With this update, all critical pages now check for a valid session before processing and no unauthorized access to important Cumin pages is allowed.
- BZ#840112
- In some instances, Cumin did not correctly quote SQL expressions, creating an SQL injection vulnerability. Now, facilities provided in the
pysopg2module are used to generate properly quoted expressions, thus preventing this problem. - BZ#840118
- When requesting chart sizes in the PNG format, extra large graphics files were generated in some cases. Consequently, a denial of service (DoS) attack was possible abusing this feature, as affected Cumin instances and their hosts could eventually become unresponsive or terminate unexpectedly. With this update, if an image larger than two million pixels in size is requested, Cumin creates a log message and returns an image of the default size (360x100 pixels), thus preventing this problem.
- BZ#840121
- Previously, Cumin had no method to check the validity of POST requests, creating a Cross-site request forgery (CSRF) vulnerability. With thi Cumin injects a randomly-generated unique ID into every rendered form and does not process any POST requests that do not contain the correct ID, thus preventing this problem.
- BZ#840123
- Previously, Cumin did not reset the session cookie after authentication, creating a session fixation vulnerability. This bug has been fixed, Cumin now always resets the session cookie when the user logs in or logs out, and no session ID is ever preserved outside of a valid login session.
- BZ#840133
- Cumin did not correctly escape text in error trace displays. Consequently, XML special characters or JavaScript code within the error trace output could cause errors on the page. With this update, all text on the error page is properly escaped during rendering and the error pages are now correctly displayed by browsers, regardless of values that may be present in the stack trace.
- BZ#807970
- Cumin did not escape single (') and double (") quotation mark characters in strings contained in the page source. Consequently, erroneous operation with strings containing quotation marks could occur. This bug has been fixed and all quotation marks in page source are now properly escaped.
- BZ#835595
- Previously, the setuid rval variable was not checked, potentially creating a privilege escalation vulnerability. With this update, the setuid rval variable is properly checked and code is no longer vulnerable to privilege escalation via process exhaustion.
- BZ#836294
- Previously, it was possible for an attacker with a custom CEDAR client to authenticate as a user with a currently running job, creating a security vulnerability. With this update, authentication code has been improved to remove this vulnerability.
- BZ#836253
- Previously, it was possible for an attacker with a custom CEDAR client to have an unprivileged user to stop running jobs. This update removes unused code that allowed this unauthorized job control, thus preventing this problem.
- BZ#836590
- The security audit discovered the
log_except()copied an error message of unknown length into the statically sizedevent.messagebuffer using thesprintf()function, potentially creating a stack squash vulnerability. This update hardens the internal interface, thus preventing this problem. - BZ#837623
- Previously, it was possible for an attacker with a custom CEDAR client to have an unprivileged user to control active claims. This update removes unused code that allowed this scenario, thus preventing the problem.
- BZ#837890
- The security audit discovered a problem in the
LookupString()function, which could potentially lead to a stack smashing vulnerability. This update hardens the internal interface, thus preventing this problem. - BZ#837037
- Cumin relied on screen presentation to control which job attributes were writable. Consequently, manually constructed POST requests could allow a user to edit protected attributes by circumventing the screen controls. With this update, Cumin uses information from Condor to validate job attribute changes. It is no longer possible to modify attributes that Condor marks as read-only, even through manually constructed POST requests.
- BZ#852321
- When using Remote Configuration to configure a High Availablity Scheduler with the
HASchedulerfeature, theshadowdaemon could fail to start with the following error message:ERROR "According to /var/lib/condor/spool/spool_version, the SPOOL directory is written in spool version 0, but I only support versions back to 1"
This update adds parameters to point theshadowdaemon at the same spool location used by theHASchedulerfeature. Now,shadowruns jobs as usual using the spool pointed to by the node with theHASchedulerfeature installed, thus fixing this bug. - BZ#848212, BZ#835592, BZ#841173, BZ#843476
- This update also provides defense in depth patches for Condor.
Enhancements
- BZ#769573
- Cumin had no mechanism for distinguishing between administrative and general users. As a consequence, the different user roles were not respected and all users could access all displays and use all Cumin functions. With this update, the role enforcement mechanism has been fully implemented but it is disabled by default for backwards compatibility. It can be enabled in the
/etc/cumin/cumin.confconfiguration file. When the role enforcement is enabled, general users can see only the displays under the Grid User tab, and are able to see and manage only their own jobs. By default, all users are assigned to the general user role unless thecumin-admincommand is used to grant administrative privileges to a user. - BZ#591521
- The
startddaemon did not previously support managing of local machine resources above the standard CPU, memory, disk, and swap resources. With this update,startdhas been enhanced to allow additional local machine resources to be specified in the cluster configuration. These resources are managed bystartdand the slot resource accounting mechanism, and can be requested by job submissions. - BZ#721110
- The Negotiator daemon previously did not enable to alter concurrency limits for jobs without a need of Negotiator's reconfiguration. This could have a significant impact on Condor pool performance if the daemon reconfiguration was invoked on a frequent basis. This update enhances Negotiator to support named groups for scoping multiple default concurrency limits based on a limit name prefix. Concurrency limits can now be defined with multiple possible default values without invoking frequent Negotiator reconfigurations.
- BZ#737979
- Previously, Cumin user authentication was limited only to the authentication against the Cumin database. Therefore, sites could not use their existing user LDAP accounts and the accounts had to be recreated in the Cumin database in order to access the Cumin management console. With this update, Cumin has been extended to allow the use of LDAP servers for authentication. If a user is not found in the Cumin database, Cumin attempts to authenticate a user against a specified list of LDAP directories.
- BZ#751870
- A new Condor Resource Agent has been implemented that allows multiple HA Schedd scheduler daemons, managed by Red Hat High Availability, to run on the same node.
- BZ#784051
- Previously, the ClassAd log-file system did not report the location of parse errors and it was difficult to locate corruption in the job queue log. With this update, the ClassAd log-file system has been enhanced to provide the number of the record where a parse error was detected. Also, the logic for parse error identification has been improved. Parse errors can now be easily located in ClassAd transaction log files, such as the job queue log.
- BZ#785140
- Previous implementation of CPU accounting, available with the
condor_statuscommand, did not handle configurations with multiple partitionable slots on a single machine. Therefore, when attempting to determine CPU utilization for given ClassAd data on such a configuration, theTotalCpusattribute provided inaccurate metrics. This update modifies the underlying code and adds new attributes,TotalSlotCpus,TotalSlotMemory, andTotalSlotDiskto the ClassAd slot. The CPU accounting now works correctly on configurations with multiple partitionable slots on a single machine and users can now run simple queries to determine partitionable slot utilization. - BZ#785145
- Previously, Red Hat High Availability could only manage a single running HA scheduler instance on a node. This update introduces a new set of tools that allow Red Hat High Availability to manage multiple HA scheduler instances on a node.
- BZ#803895
- This update adds Deltacloud support to the Condor's Grid Universe so that users can now submit their jobs to Condor, which run against the Deltacloud API.
- BZ#806071
- This update adds new features and parameters to the remote configuration database so that the database now supports multiple job/query server configurations on a single node managed by Red Hat High Availability.
- BZ#820681
- Previously, the Cumin web server did not support secure communication over SSL and data from a web browser to the Cumin web server was therefore sent in plain text. This update modifies the code so that the Cumin web server can now be configured to use SSL in the
/etc/cumin/cumin.conffile. See Management Console Installation Guide for more information. - BZ#806079
- This update adds the
VM_NETWORKING_BRIDGE_INTERFACEparameter to the initial Wallaby database to support remote configuration of bridged networking in the virtual machine universe. In addition, theVM_SCRIPTparameter has been removed from the database and most of the virtual machine universe parameters have been marked asneeds_restartand added to thestartdsubsystem.
6.4. RHSA-2012:0100 – Moderate: MRG Grid security, bug fix and enhancement update
Security Fix
- CVE-2011-4930
- Multiple format string flaws were found in Condor. An authenticated Condor service user could use these flaws to prevent other jobs from being scheduled and executed, crash the
condor_schedddaemon, or, possibly, execute arbitrary code with the privileges of thecondoruser.
Bug Fixes
- BZ#759200
- A bug in the python-psycopg2-2.0.14 package caused a reference leak when the
cumin-datautility updated objects in the database. Specifically, the bug manifested when thecursor.mogrify(operations, params)orcursor.execute(operations, params)functions were called and theoperationsstring referenced the same value from theparamslist more than once. Consequently, long-running instances ofcumincould leak significant amounts of memory. With this update, the_mogrify()routine has been fixed and the reference leak no longer occurs whencumin-dataupdates database objects.
6.5. RHBA-2012:0045 – Red Hat Enterprise MRG Grid 2.1 bug fix and enhancement update
Bug Fixes
- BZ#569561
- The MAX_[subsys]_LOG values were read using the
atoi()function instead of a 64-bit-capable lexical casting function, which rendered it impossible to set a maximum log size equal to, or greater than, 2GB. The lexical casting functionlex_cast()has been implemented to read MAX_[subsys]_LOG values, with the result that log files of size equal to, or greater than, 2GB, can now be configured. - BZ#748738
- MF console application written in Python, after a period of time a noticeable amount of memory was lost due to a slow memory leak. After a long enough period, the console would use up all available memory and experience problems. This update explicitly calls the clean-up method of the sequence manager for each request once the request had been handled, unneeded objects are now properly released, and the memory leak no longer occurs.
6.6. RHSA-2011:1249 – Moderate: Red Hat Enterprise MRG Grid 2.0 security, bug fix and enhancement update
Security Fix
- CVE-2011-2925, Moderate
- A flaw was discovered in cumin where it would log broker authentication credentials to the cumin log file. A local user exploiting this flaw could connect to the broker outside of cumin's control and perform certain operations such as scheduling jobs, setting attributes on jobs, as well as holding, releasing or removing jobs. The user could also use this to, depending on the defined ACLs of the broker, manipulate message queues and other privileged operations.
Release Notes
- BZ#728285
- When MRG Grid run on a node with multiple network interfaces, it tried to estimate the correct interface for its communications with the remaining MRG Grid nodes. As a consequence, the node could have failed to communicate with other parts of MRG Grid correctly if a wrong interface had been chosen. As a workaround to this issue, MRG Grid can be forced to use a specific network interface by setting the
NETWORK_INTERFACEparameter to the IP address of that interface. To find out which interface has been used by MRG Grid when it failed, theD_HOSTNAMEvariable has to be included in logging configuration for a daemon that deals with issues. - N/A
- The remote configuration database requires an update to include changes for MRG Grid version 2.0. But the database snapshot provided with MRG only contains a basic configuration, and thus loading the database snapshot would replace the existing pool configuration. To solve this issue, the upgrade-wallaby-db tool which upgrades an existing deployment's database has to be used. This tool can be downloaded from the following page: https://access.redhat.com/kb/docs/DOC-58404
- BZ#688717
- With this update, the
Elastic Compute Cloud Grid ASCII Helper Protocol(EC2 GAHP) is preferred overAMAZON GAHP. The condor-ec2-enhanced-hooks package has been updated to detect the correct GAHP for the EC2 Enhanced feature based upon what GAHPs are available on the scheduler. To ensure that jobs are routed to the proper resources, the'set_gridresource = "amazon"; \'setting has been removed from all existingEC2 Enhanced routesin a MRG Grid's configuration.
Bug Fixes
- BZ#723971
- When the definition of the
subsys.parametrparameter was missing in MRG Grid's configuration, such asLL_DAEMON_LOG, aLOGparameter value would have been used instead, which caused a confusing exception to be thrown. This happened because theLOGparameter contained the name of a directory which did not correspond to the one in subsys.parameter, which contained an ordinary file name. To fix this issue, a control option, which allows to raise an exception if subsys.parameter does not exist, has been added to theread_condor_configscript. Subsystem configuration now provides useful error messages. - BZ#720507
- Due to a race condition in the
condor_schedddaemon, the daemon could have accessed stale jobClassAdswhen the matchmaking proceeded too slowly. This would have resulted in the scheduler crashing. To fix this issue, the race condition has been removed from thecondor_schedddaemon, and the daemon no longer tries to access a stale jobClassAd. - BZ#720400
- When a cluster ID was rolled over and the
SCHEDD_CLUSTER_MAXIMUM_VALUEparameter was set and the queue log rotated, a consistency verification check ensuring that the stored cluster ID was lower than the maximum cluster ID would have raised an exception causing the scheduler to restart, which resulted in a deleted job queue and failure of the scheduler. To avoid this issue, this consistency check has been disabled whenSCHEDD_CLUSTER_MAXIMUM_VALUEis set. Rolled over cluster IDs no longer cause the scheduler to fail on restart. - BZ#718104
- The
ClassAddebug()function was not implemented with the new styleClassAdcode, which restrained functionality of all features that previously used it. With this update, thedebug()function has been re-implemented, and all related tools and logs now contain debugging messages. - BZ#713511
- If a pre-script returned a failure while executing a DAG (Direct Acyclic Graph) job on an MRG Grid, then the post-script failed to run. This could have led to job inconsistencies in the MRG Grid. With this update, the post-script is now always executed regardless of the pre-script outcome.
- BZ#712973
- Due to the
condor_negotiatordaemon attempting to fill the fractional submitter limit remainder of agroup quota, theschedddaemon tried to assign every possible job cluster regardless of the fact that it was impossible to achieve. As a consequence, thenegotiatordaemon executed many unproductive negotiation attempts, which ended with rejection due to the exceeded submitter limit. To correct this issue, a new verification mechanism has been added in order to halt further negotiations for a particular group when the submitter limit is exceeded. Redundant negotiations are no longer executed, and negotiations now properly stop on the first such rejection. - BZ#712529
- When compiling the source code, the CMake compiler removed the original
condor_router_*scripts and replaced them with bogus executable files due to certain placeholders. This rendered the Condor's job routing scripts non-functional. With this update, the CMake operations has been updated, the correct scripts are now being used during the compilation process, and job routing now works as expected. - BZ#707770
- The logic for deactivating claims in the dedicated
scheduleriterated over a claim data structure while this structure was being destructively modified. Due to this issue, some of the claims were not properly released. This logic has been corrected and all claims are now properly deactivated. - BZ#707335
- When a
condor_reconfigevent occurred in the middle of the negotiation cycle, structure pointers of thecondor_negotiatordaemon that were stored in the stack could have been invalidated, which would have caused a memory read error and the daemon would have terminated unexpectedly. To fix this issue, functionality delaying thecondor_reconfigevents has been added. With this update, any current negotiation cycle now completes first and thecondor_reconfigevent is executed directly after that. Thecondor_negotiatordaemon proceeds without the read error and no longer crashes. - BZ#707081
- In Condor, accounting groups were sorted out according to the group quota usage that they were using and the number of allocated resources. During a negotiation cycle, the least-used group was considered first for claiming, the second-least was considered next, and so on. With the
Hierarchical Accounting Groupsfeature implemented, this sorting mechanism had been left out, and it could have happened that certain groups were never used. The original accounting group sorting method has been restored, and this issue no longer occurs. - BZ#707078
- Due to the
ignore_schedd_limituninitialized stack variable that is called in an inner loop within thenegotiate()function, the job submitter limits could have been ignored when expected to be followed, or the limits could have been followed when expected to be ignored. This obsolete variable has been correctly replaced by theignore_submitter_limitvariable, which now guarantees proper behavior regarding job submitter limits. - BZ#706512
- Previously, a job submitter with no idle jobs received a number of slots according to its limit regardless of a need for them. Those unused slots required additional iterations in a negotiation loop that could have been avoided if the slots were given to other job submitters. With this update, functionality that pre-filters job submitters with no idle jobs has been added, and thus these freed slots are assigned to job submitters which can use them, making the negotiation phase more effective.
- BZ#704597
- When the
condor_configddaemon attempted to retrieve a node object from the configuration store during its periodic verification, an exception was not handled properly, and communication could have been interrupted. When the thread controlling the periodic updates exited, the affected node stopped its periodic verification of the configuration store. With this update, when the exception arises, it is handled properly. All communication threads are now monitored, and they are restarted if any of them are not running. Thecondor_configperiodic updates are not interrupted any longer. - BZ#699413
- Prior to this update, the MRG Management Console (Cumin) filtered objects which were not directly associated with the
condor_collectordaemon that was discovered by Cumin, and such objects were not displayed. As a consequence, in hierarchical configurations for collector daemons, certain objects which were published by condor daemons with theCOLLECTOR_HOSTvariable set to a subordinatecollectordaemon would have not been visible in Cumin. To fix this issue, collector-based filtering has been disabled in Cumin. Cumin now processes every object that it receives from theMRG Messaging Broker, which allows all objects to be visible in the hierarchicalcollectorconfiguration. However, Cumin is only able to discover objects from a single Condor pool because deploying multiple Condor pools on the sameMessaging Broker, or configuring Cumin to use multipleMessaging Brokers, which serve different Condor pools simultaneously, could lead to unpredictable results. - BZ#719019
- Due to support for the
SlotWeight variable, functionality of thecondor_startdRANKexpression, which allowed a node to be configured to prefer certain jobs over others, was broken, andRANKwould have preempted only if the user priorities allowed it. In order to fix this issue, thecondor_negotiatorlogic has been updated to handle evaluation ofRANKproperly.RANKnow preempts as expected. - BZ#718265
- When running a low latency job which caused the
condor_starterdaemon to exit prematurely, the Condor low latency daemon did not expire the job, and the slot running this job could have not been used for other jobs until the Condor low latency daemon was restarted. The issue with a MRG Messaging message expiration has been fixed, and messages now expire as expected. - BZ#716519
- Previously, halting conditions for an outer loop of a negotiation cycle were tested at the end of the loop, which, if
pieLeft == 0.0was true, then the loop could have been executed even though there were no slots to allocate. With this update, this condition is now tested earlier in the loop and additional loop executions are now avoided. - BZ#715973
- Due to improper Boolean value-handling in the code, submitting the edit attributes page for a job resulted in multiple edit attempts equal to the number of a job's attributes. This has been fixed: Boolean values for the edit job attributes page are now handled correctly, and the number of entries in the task indicator now corresponds to the number of edited job attributes.
- BZ#715956
- The
JobAdsEditorform had theupdate_enableoption set totrue, which caused page content to refresh in accordance with the page content update interval. If the update interval was set to a low value, job attributes could have been difficult or even impossible to edit. To resolve this issue, periodic page content updates have been disabled forJobAdsEditor, and the job editor form now no longer resets. - BZ#712987
- When preparing job submissions for a negotiation, jobs were primarily sorted by submitter priorities, then secondarily by the name of the submitter. This sorting mechanism resulted in submitter job starvation because submitters always negotiated in the same order. With this update, the submitter sorting mechanism has been changed so that the submitter is primarily sorted by priority, and secondarily by starvation ratio. When negotiating for jobs, all submitters have now equal conditions.
- BZ#712975
- Previously, the
GROUP_DYNAMIC_MACH_CONSTRAINTexpression did not remove startd resource ads which matched the expression from the resource ad list. This issue resulted in resource traversal overhead and decreased performance. To fix this problem, a newNEGOTIATOR_STARTD_CONSTRAINT_REMOVEparameter has been added. In order to ensure backward compatibility, this parameter is set toFALSEby default, but if it is set toTRUE, all ads which do not match theGROUP_DYNAMIC_MACH_CONSTRAINTare removed, and thus redundant resource usage can now be avoided. - BZ#712972
- When using the round-robin scheduling algorithm, groups containing no submitter could have started a group negotiation, which was obviously inefficient. This was caused by
condor_negotiatorlogic that removed submitters with no idle jobs. To resolve this issue, thecondor_negotiatorcode has been modified to skip groups without submitters. These groups no longer spend time with group negotiation. - BZ#708944
- When a group of long-term running jobs was quickly set on hold and then released, jobs could have accidentally been removed from the job queue. The code has been modified to prevent the issue, and jobs now remain in the queue.
- BZ#707576
- When running a HTTP session with the Cumin MRG Management Console, the session was assumed to always be active. If the session timed out, Cumin had no handling mechanism for session redirection requests implemented for the background page update. So, when the session timed out, Cumin stopped refreshing the page. With this update, this exception is now properly handled and the background page update code has been modified so that users are redirected to the login page when their session times out.
- BZ#706977
- Previously, sanity verification of group quota surplus allocation examined an incorrect variable. This problem resulted in incorrect warning messages, which stated that the group quota needs to have a remaining surplus, even though allocation was correct. This sanity verification has been corrected to examine the proper variable, and such redundant warning messages no longer appear.
- BZ#703905
- Due to unnecessary verification in the
condor_negotiatorlogic, Condor rejected a submitted job in the first phase of the negotiation cycle even though submitters did not reach their limits, which resulted in additional iterations in the next negotiation phase. Redundant verification has been removed and submitters now can complete the negotiation cycle with an expected number of iterations. - BZ#703630
- The full page chart link was previously mapped so that it was difficult to associate it with the appropriate graph. To fix this issue, the full page chart link has been moved to align with the graph.
- BZ#703621
- The y-axis maximal value was previously determined according to points, which were out of reach of the display window. Consequently, only data with the low y-axis value with respect to the scale of the chart were showed in the graph. To fix this issue, the y-axis maximal value is now based only on points that are currently in the view, and the graphs are now wholly visible.
- BZ#703283
- Due to a missing SQL where-clause that would have helped to distinguish between idle, running and completed statistics of all users, and the current user, the
Job Statistictable displayed information based on the statistics for all users instead of the current user. With this update, the needed where-clause has been added in order to restrict displayed job statistics to the currently logged-in Cumin user. TheJob Statistictable now shows proper statistics for the current grid user. - BZ#703279
- When searching data in a Cumin table which contained multiple pages, applying a filter to the description field resulted in a seemingly empty result list, and a blank page was displayed if the user was not currently viewing the first page. The user's intervention was required in order to see the actual results. This issue has been fixed: when a new search filter is applied, the first page containing the list of searched results is now displayed.
- BZ#703196
- When editing attributes in Condor, submission of an argument which was not enclosed in double quote characters would have caused this argument to be interpreted as an expression, not a string. When the argument contained non-numeric characters, attempting to set the attribute would have failed even though Cumin reported success. As a workaround, arguments must be enclosed in double quotes whenever arguments contain non-numeric values. With this workaround, all such arguments are now correctly interpreted.
- BZ#702440
- Stopping
Qpid Management Framework (QMF)agents while Cumin was stopped, or when it was restarted to point to a differentMRG Messaging Brokerthan it was prior to the restart, could cause objects from missing agents to be displayed in the Cumin when it was restarted. Such objects were deleted by Cumin only when missingSesameagents were restarted using the original IDs and then removed while Cumin was running. To correct this issue, Cumin has been enhanced with a clearing mechanism which ensures that all dynamic data in Cumin is deleted at startup, andSesameagents with their associated objects are now rediscovered while Cumin is running. Cumin now displays only actual objects. - BZ#700863
- When using the
condor_configure_poolcommand, any attempt to set a feature which contained parameters that had to be modified would have failed if the--scheddoption or the--qmfbrokeroption was used in the same command. To prevent this issue, the code has been modified to update the QMF group object before any parameters to set scheduler orQMF Brokerinformation are applied. Using thecondor_configure_pooltool to set a feature parameter simultaneously with setting the scheduler orQMF Brokerinformation now works correctly. - BZ#699732
- Due to a lengthy job submission name, the submission selection table as well as the single submission display page were distorted and the submission name was displayed running to the right edge of the page. To correct this issue, a displayed job submission name on these pages is now truncated to 100 characters and the "..." string is attached to the name, indicating the truncation.
- BZ#699643
condor-reuse-slot*user accounts appeared on the welcome screen after Condor was installed on the 64-bit version of Windows Server 2008 R2 and Windows 7 operating systems. This issue has been fixed by updatingUserListregistry calls to write entries to the 64-bit registry. Slot user accounts are now hidden on all supported 64-bit versions of the Windows operating system.- BZ#682447
- When using the
MRG Management Console, the page update timestamp was updated even though the page update failed with no indication of failure, which led the user to assume that the displayed data was up-to-date. When the last update fails, the user is now informed with timestamp information. - BZ#673273
- Under certain circumstances, verifications related to the database server for the
startcommand, which are contained in the/etc/init.d/cuminscript, could have prevented thestatusandstopcommands from running. To avoid this issue, the script has been modified to run these checks only during start-up of the Cumin service. Thestatusandstopcommands now successfully complete even though the database server is stopped. - BZ#637963
- Previously in Cumin, it was possible to set mutually-exclusive static and dynamic group quotas, which caused an error to be displayed while editing the quota table. To prevent this issue, a new column on the page showing statistical limits has been added. Only dynamic quotas can now be edited and static quotas are now properly displayed in the quotas table.
- BZ#609510
- The orphaned child process of a
DAGManjob could have been left in the queue. This happened when thesubmission.dagfile was removed while theDAGManprocess was in the HOLD state. With this update, any orphanedDAGManprocesses are removed from the queue ifsubmission.dagbecomes unavailable. - BZ#723613
- Cumin detected a type mismatch when editing an integer or floating-point attribute value on the
Edit Attributespage for a job. This scenario was handled incorrectly and consequently, the Loading... message was displayed on the page indefinitely. With this update, Cumin handles these scenarios correctly and when an invalid value for a floating-point or integer job attribute is entered, the user is now warned that the edit operation failed due to a type mismatch. Operation otherwise proceeds normally. - BZ#720374
- Certain utility scripts did not end in a suffix, which is required for executables on Windows. These scripts are no longer included in the MSI installer.
- BZ#712974
- The limits-editing page did not follow the convention of hyperlinking the text of the object to be edited, causing confusion. This update ensures that the limit value is hyperlinked instead of the limit name, thus providing consistency with other editing pages.
- BZ#710215
- The
WantAWS =!= Truecondition needs to be specified in the job submission file forEC2 Enhancedjobs to ensure that a job is correctly routed to EC2. Previously, with this set up, the job was correctly routed to EC2 but it could not complete. To fix this issue, theWantAWS = Falsecondition has been set up in the ClassAd specifying each job, which is provided to an EC2 AMI. Now,EC2 Enhancedjobs that include theWantAWS =!= Truecondition in their requirements run and complete as expected. - BZ#707584
- The Inventory page contained redundant check boxes, which have been removed.
- BZ#705437
- The
schedulercould have potentially suffered a memory access error due to a missing check for an empty queue. This check has been implemented, thus eliminating the chance of incurring a memory access error. - BZ#704490
- On Windows, the system returned an incorrect Condor version. This update modifies the build spec file and the system provides the correct platform data.
- BZ#704653
- The Cumin service init script must be run as the root user. However, the script lacked a check to ensure that it was run by the root user, with the result that running the script as an unprivileged user resulted in misleading error messages. With this update, the script fails immediately (and displays [FAILED]) when it is run as an unprivileged user.
- BZ#703860
- When a column with numeric values was selected as the sort criterion for a table, Cumin sorted the rows in ascending order of the numeric values by default. With this update, Cumin sorts the tables in descending order if sorting according to a column with numeric values.
- BZ#701966
- The
condor_configure_storetool prompts the user whether the configured parameters, which are not in the store, are to be added to the store. Previously, if the user declined, the tool still prompted the user about whether these parameters were to use their default values. With this update,condor_configure_storeno longer prompts the user for default values in the described scenario, thus fixing this bug. - BZ#701337
- This update adds the
UPDATE_INTERVALparameter to the default database of the remote configuration. TheUPDATE_INTERVALparameter defines the interval in which thestartddaemon checks with thecollectorwhen using remote configuration. - BZ#700595
- In previous versions of the
wallabyservice, the wallaby package assumed the existence of acondorgroup. However, the wallaby package did not depend on the condor package, which creates this group, and the wallaby installation on a machine without acondorgroup failed. The wallaby package now creates acondorgroup if necessary and the installation succeeds on all machines. - BZ#700545
- If the user configured Condor as an
AviaryScheduler, the system could fail to activate. This happened because theAviarySchedulerdid not depend on theBaseScheduler. With this update, theAviarySchedulerdepends on theBaseSchedulerand the configuration with theAviarySchedulerfeature is activated successfully. - BZ#700540
- The value displayed for uptime was in seconds, which was not very readable or usable. This update changes the uptime value to be displayed in DD:HH:MM:SS format, which is much more readable and useful.
- BZ#697093
- The Cumin charts displayed the values exceeding one million in thousands (for example, "1000k") on the y-axis. This update adjusts the labels to display the values in millions (for example, "1M").
- BZ#697016
- In the limit editing form, there was no intuitive way of setting the limit to the
unlimitedvalue. This update adds a button, which changes the text box entry toUnlimited. Alternatively, the user can also type the valueunlimited. - BZ#696697
- The user interface allowed negative maximum allowance values to be entered when editing a limit under the
Limitstab. Although the system accepted the negative value and displayed it in the user interface, internally it treated the value as if it were 0 (zero). This update changes this behavior so that entering a negative maximum allowance value causes Cumin to display an error message, and internally the limit is not changed. - BZ#681651
- The sample configuration file provided with the ec2-enhanced-hooks package had to be renamed and edited in order to fit into the
LOCAL_CONFIG_DIRdirectory hierarchy. With this update, this file must no longer be renamed to fit into theLOCAL_CONFIG_DIRhierarchy. - BZ#681650
- Previously, installing the condor-ec2-enhanced package required manual configuration. With this update, the package installs a default configuration file, with the result that manual intervention and user configuration is no longer required.
- BZ#681648
- The condor-low-latency package now provides an example configuration file that can be modified and placed into the directory pointed to by the
LOCAL_CONFIG_DIRvariable. UsingLOCAL_CONFIG_DIRallows for greater flexibility in configuring external features, and configuration ofLow-Latencyis now easier. - BZ#674598
- Cumin's configuration file parser treats a line that begins with whitespace as a continuation of the previous line. In
cumin.conf, whitespace at the beginning of a line that was not a (valid) continuation of the previous line caused parsing errors which were not communicated to the user, and which could have caused thecuminservice to restart the application in an endless loop. With this update, any parsing errors are caught by theinitscript, any exception traces are recorded in log files under the/var/log/cumin/directory (the exact file depends on whether cumin-data or cumin-web first detected the error), and a parsing error results in the service failing upon startup (with [FAILED] printed to alert the user). - BZ#659247
- An object that was in the process of being deleted could have still been selected in Cumin, and any further operations on such an object resulted in internal errors. In turn, these unhandled exceptions could have caused unpredictable behavior, such as a web form stuck in an infinite loop. This update adds appropriate error-handling around forms which operate on selected objects, with the result that operations on deleted objects are now handled gracefully by Cumin, which informs users in the event of an error.
- BZ#631804
- Running the
condor_router_historyutility with incorrect argument values could have caused a stack dump due to uncaught argument-handling exceptions. Argument parsing now includes proper exception-handling routines so that command usage errors are now handled gracefully, and helpful error messages are printed to the console when an argument is incorrect.
Enhancements
- BZ#716466
- Prior to this update, an
Amazon Machine Image(AMI) could not have joined aVirtual Private Cloud(VPC). With this update, the necessary parameters allowing this feature have been added to theElastic Compute Cloud(EC2)Grid ASCII Helper Protocol(GAHP), and AMIs now can join VPCs. - BZ#709713
- With this update, support for the
Amazon Elastic Block Store(EBS) has been added and Condor can now associate EBS volumes with MRG/EC2 Enhanced instances. - BZ#719050
- Prior to this update, when adding multiple nodes to a group, each node had to be accessed individually because the condor_configure_store tool did not allow editing a group's membership directly. The condor_configure_store application has been modified to allow this configuration, and this process now requires editing only the respective group.
- BZ#700774
- Prior to this update, Condor jobs submitted using the Condor Aviary module included only a minimal set of attributes necessary to run jobs in most runtime environments defined by Condor. However, this set was not complete for all slot configurations. Consequently, jobs submitted by Condor Aviary did not match configuration pattern for dynamic slot provisioning and could not be scheduled unless the user explicitly added all needed attributes. Aviary submissions now implicitly include all required attributes for correct matching to dynamic slot provisioning, and such submitted jobs now run without additional user intervention.
- BZ#692911
- Previously, the
Wallabyconfiguration service allowed users to specify parameter values beginning with the>=string in order to indicate that these values were to be appended to a comma-separated list of values when the feature was applied. Beginning with version 0.10.5-4,Wallabyalso supports parameter values that begin with&&=and||=strings, which indicate lists of values separated by&&and||delimiters. This makes it possible to declare conjunctions or disjunctions of ClassAd expressions across multiple features. - BZ#652772
- Redundant security-related parameters in several features have been removed and security parameter settings have been modified to be more clear. Only the Master feature now contains security related parameters. The
SEC_DEFAULT_AUTHENTICATION_METHODSparameter on the Master feature also includes theFSandNTLMmethods. All security related parameters now havemust_changeandneeds_restartvariables set by default tofalse. - BZ#706108
- Previously, the
ALLOW_ADMINISTRATORparameter in the Master feature was set to the$(ALLOW_ADMINISTRATOR)value in the default database, rendering it fully dependent on external configuration for remotely configured nodes. In case the external configuration did not include the$(CONDOR_HOST)in theALLOW_ADMINISTRATORparameter value, an affected node could have experienced issues while running and administrating jobs in the pool. With this update, the$(CONDOR_HOST)is now a part of theALLOW_ADMINISTRATORparameter and remotely configured nodes can be now administered by the central manager(s) without any issues. - BZ#632109
- In older versions of the wallaby inventory tool, long hostnames were truncated to 25 characters so that inventory data could fit in an 80-column terminal window. The current version of the wallaby inventory tool includes the
-l (--long)option to ensure that hostnames are not truncated.
6.7. RHEA-2011:0889 – Red Hat Enterprise MRG – Grid 2.0 Release
Bug Fixes
- BZ#693782
- The grid-condor package has been upgraded to upstream version 7.6, providing a number of bug fixes and enhancements.
- BZ#553696
- Prior to this update, CD image files could not be transferred. Now, CD images are treated like other disk images and are passed via the
vm_diskutility. Images transfer and mount as expected. - BZ#563337
- Prior to this update, vacate jobs were only sent
SIGTERMsignals via condor_vacate_job when attempting to evict jobs. Due to this behavior, jobs that ignored theSIGTERMsignal were not evicted. This update adds a signal escalation timer to the job vacate code path. Now, jobs are terminated with theSIGKILLsignal if theSIGTERMsignal was ignored. - BZ#580530
- Prior to this update, mixed
-constraintand restriction lists displayed only usage information but no error messages. Due to this, users could become confused. This update corrects the problem. Now mixed-constraintsand restriction lists report an error message and display the usage information. - BZ#606391
- Prior to this update, the condor_triggerd, condor_job_server, and Qpid Management Framework (QMF) plug-ins did not connect to the AMQP broker (qpid) because the broker was configured to restrict access. This update changes the configuration parameters to allow authentication information to be set for the authorization mechanism (
QMF_BROKER_AUTH_MECH—PLAINorANONYMOUS), for the user to authenticate to the broker (QMF_BROKER_USERNAME), the location of the file that contains the broker password in clear text (QMF_BROKER_PASSWORD_FILE). Now, the above daemons and plug-ins connect to secured brokers as expected. - BZ#627957
- Prior to this update, when a user attempted to use the condor_configure_pool utility to add a feature that included another feature without the
must_changeparameter specified, the utility did not prompt the user to supply the missing value. With this update, the condor_configure_pool utility uses a new API call to better detectmust_changeparameters. Now, the condor_configure_pool prompts the user in case of missingmust_changeparameters. - BZ#631782
- Prior to this update, condor_gridmanager deleted uninitialized memory when ran as
rootor when passed the-ooption. This update adds additional checks to avoid the improper deletion. Now, the condor_gridmanager behaves as expected. - BZ#634975
- Prior to this update, Python 2.4 prevented the resolution of dangling references to QMF objects when references were received before the objects to which they referred. Due to this behavior, certain objects were not visible in the user interface if they contained a dangling reference to another object. This update corrects the programming error so that Cumin operates correctly. Now, dangling references are resolved whenever an object previously referenced is received by Cumin.
- BZ#644302
- Previously, the init script for the
cuminservice did not verify that the service is not already running. Under certain circumstances, this allowed multiple instances ofCuminto run simultaneously, and could lead to excessive memory consumption. This update adapts the/etc/rc.d/init.d/cumininit script to verify that the service is not running before starting a new instance. As a result, only one instance of thecuminservice can be started. - BZ#644313
- Prior to this update, when the
qpiddbroker suddenly disconnected, Cumin raised an exception and terminated unexpectedly with an error. This update adapts Cumin to handle such an exception. As a result, when the broker is stopped, Cumin no longer crashes and the user is presented with an informative error message. - BZ#647500
- The MRG Management Console displays slot icons grouped by the slot's state and activity. Previously, the state of the slot determined the color of the icon, and the activity determined the icon's shape. To make this distinction more explicit, these slots are now displayed in four groups: Busy, Transitioning, Owner, and Unclaimed.
- BZ#647758
- Previously, when a user supplied an incorrect hostname, the condor_reschedule, condor_vacate, condor_off, condor_on, condor_reconfig, and condor_restart scripts incorrectly returned exit code
0. With this update, the underlying source code has been adapted to address this issue. Attempts to use an incorrect hostname now cause these scripts to terminate with a non-zero exit code as expected. - BZ#647789
- When a related server (such as
condor_master,condor_schedd, orcondor_startd) stopped unexpectedly, the condor_reschedule, condor_vacate, condor_off, condor_on, condor_reconfig, and condor_restart scripts incorrectly returned exit code0. With this update, these scripts have been adapted to return a non-zero exit code when a connection fails. - BZ#669023
- When the condor_q utility was compiled without the PostgreSQL support, the output of the
condor_q -hincorrectly listed-avgqueuetimeas a valid command line option. Consequent to this, an attempt to run condor_q with this option failed with an error. This update adapts the utility not to list the-avgqueuetimeoption when it is compiled without the PostgreSQL support. - BZ#671451
- Due to recent changes to the
libvirtlibrary, the previous version of Condor was unable to detect the type of an image, and only accepted raw images. This update allows users to specify the image type by using thevm_diskoption as follows:vm_disk= file_name:device:permissions:image_typeAs a result, users can now use all image types supported by KVM. - BZ#672484
- The MRG Management Console allows a user to select multiple records to work with as a group. Prior to this update, an automatic page reload caused all previously selected records to be deselected. This no longer occurs, and when a page is automatically reloaded, the selection is preserved as expected.
- BZ#672517
- Previously, when a user used the condor_q utility to analyze a job with a requirements expression containing ClassAd function calls that were evaluated as false, the utility failed. The following error message was written to standard error:
error: bad form
With this update, the implementation of the ClassAd mechanism has been adapted to prevent this error from occurring, and users are now able to analyze such jobs as expected. - BZ#672583
- Prior to this update, the init script for the
Cuminservice sometimes incorrectly reported success even when the service failed to start or encountered an error. Additionally, any errors that occurred beforeCumin's logging mechanism was fully operational were only written to standard error. With this update, when an error occurs on service startup, the/etc/rc.d/init.d/cumininit script now reportsFAILED, and detailed information is written to thefile.$CUMIN_HOME/log/master.log - BZ#673502
- Prior to this update, the system returned an unclear error message when the user ran the
wallaby feature-importcommand without a file name parameter. Thewallaby feature-importcommand now traps the error and displays a clearer error message when run without the parameter. - BZ#673520
- Some wallaby utility subcommands caused wallaby to exit with an incorrect exit code. With this update, wallaby exits with the correct exit code that reflects the success or failure of the underlying operation.
- BZ#673538
- The condor_negotiator daemon generates statistics on every negotiation cycle. Prior to this update, only the statistics generated for the cycle defined in the
NEGOTIATOR_UPDATE_INTERVALparameter were propagated to the condor_collector daemon. As a consequence, the system sometimes presented outdated data to the user during other cycles. This update adds theNEGOTIATOR_UPDATE_AFTER_CYCLEparameter. If the parameter is set totrue, negotiator propagates the statistics to the condor_collector daemon after every cycle. - BZ#673592
- When the configuration of the condor_negotiator daemon was changed or the daemon was restarted, the entries of accounting groups and submitter names were reset to zero. With this update, the entry values are preserved in such circumstances.
- BZ#674432
- Prior to this update, the
rhubarblibrary returned a syntax error when the user created a persisting class with a name identical to an SQL reserved word. Therhubarblibrary now quotes table names in all generated SQL code, and the user can declare persisting classes with such names. - BZ#674433
- Prior to this update, the system returned an error when the user created a persisting class with no declared column. With this update, the necessary initialization steps take place even if no columns are declared and such class is created successfully.
- BZ#674630
- Previously, help pages for the condor_configure_pool and condor_configure_store tools listed the username option wrongly as
-u. This update corrects the error. Now,--helpcontains correctly-U. - BZ#675209
- Prior to this update, the
triggerddaemon was not listed as one of theDaemonCoredaemons by default. The problem has been fixed by adding theDC_DAEMON_LIST = >= TRIGGERDvalue to theTriggerServicefeature. - BZ#675703
- Due to case folding of submitted names in the
Negotiatordaemon, multiple submitter name entries were created incondor_userprio(one entirely in lower case and one with the correct mix of upper and lower case characters) if the submitter name entry contained upper case letters. Explicit case folding is now removed fromNegotiatorand data maps are updated with a case-insensitive sorting function. As a result, submitter names with upper case letters no longer appear as multiple entries and the accounting group entries now match updated entries by case and full submitted entries are case sensitive. - BZ#675725
- Prior to this update, Condor's restart action unconditionally started and ignored errors emitted by Condor's stop action. As a consequence, it was difficult to use Condor's SysV init script properly. The problem has been fixed so that the init script now skips a start attempt and issues an error if the stop action fails during the restart action.
- BZ#675935
- Prior to this update, the default value for the
CONDOR_HOSToption was set incorrectly in the default database for theHACentralManagerfeature. Due to this problem, users had to explicitly setCONDOR_HOSTwhen they enabled theHACentralManagerfeature on a node without the tools asking for a value. With this update, the tools prompt for the value forCONDOR_HOSTwhen theHACentralManagerfeature is set. Now, theHACentralManagerfeature asks to setCONDOR_HOSTwhen enabled on a node/group. - BZ#675967
- Prior to this update, the
schedddaemon incorrectly set permissions of the job files. As a result, Condor failed to transfer back output files for jobs, which were run by unknown users who were allowed to submit jobs. The problem has been resolved in this update so that theschedddaemon now sets permissions of the job files to usernobody, and Condor now correctly transfers the output files back for jobs run by unknown users. - BZ#676411
- Prior to this update, if the
condor_trigger_configcommand was executed and failed, it incorrectly returned a success return code. The problem has been fixed so that thecondor_trigger_configcommand now returns a correct failure return code in case of failure. - BZ#676902
- Prior to this update, static resources in Cumin used the
If-Modified-Sincerequest-header field but did not set a cache expiration date. As a result, static content was not transferred for resources which were cached by the browser but there was still a round trip communication to the server when a resource was loaded. This overhead was unnecessary and negatively affected Cumin's performance. The problem has been resolved by adding theCache-Control: max-ageheader to all static resources served by Cumin so that the number of pages per second that can be served by Cumin is now significantly improved. - BZ#677398
- Prior to this update, the size of Cumin's log file was unlimited. As a consequence, when a log file became too large, it could have affected performance or caused exceptions in the Cumin application. The problem has been resolved in this update so that Cumin now provides a log file rotation facility, which prevents the performance degradation and alleviates the need for manual log file rotation. The default maximum size for any log file is 10MB. The log file rotation is configurable and is covered in the Management Console Installation Guide.
- BZ#677807
- Prior to this update, the
rhubarblibrary supported serializing arbitrary Ruby objects as members of persistent objects but these fields could have been only assigned to one at a time. This unexpected and undesirable behavior for clients ofrhubarbother than Wallaby has been corrected in this update so that object-valued fields can now be updated via the assignment operator, just like other database-backed fields of persisting objects; and object-valued fields can be updated at an arbitrary number of times. - BZ#678590
- Previously, the negotiator loop did not explicitly check to determine when a group reached its quota limit. This could cause unnecessary iterations of the negotiator loop, and eventual termination at each submitter based on individual submitter limits. With this update, an explicit check has been added to the negotiator loop so that the loop will halt as soon as it detects that the group has reached its quota, and the unnecessary iterations no longer occur.
- BZ#678621
- If a
cuminapplication such ascumin-adminorcumin-webwas run as therootuser, and the$CUMIN_HOME/logfile was created, the file is locked from being written to by a regularcuminuser. Subsequently, acuminapplication terminated if it attempted to write to the file. Now, the file ownership is always changed to match the ownership of the containing directory and the bug no longer occurs. - BZ#679672
- Previously, the Max Allowance limit input for a submission did not allow a floating-point value, although such value is valid. Subsequently, the
cumindaemon rejected a limit change to such a value with the following error message:The Max Allowance field must be an integer
This bug has been fixed and both integer or floating-point values are now accepted for Max Allowance limits. - BZ#679688
- Due to non-standard labeling across the MRG Grid 2.0 user interface, the interface looked inconsistent and user-unfriendly. With this update, the Add Binding has been fixed to reflect correct case used in other labels and the user interface now looks more consistent.
- BZ#680265
- When the
cumindaemon was started from a window and that window was subsequently closed, program errors were generated ifcuminwrote output to standard error output or standard output. This bug has been fixed, any extraneous output from thecumindaemon are now directed safely to /dev/null or designated log files, and the daemon will continue to run as a service if the console window is closed. - BZ#680434
- Previously, upgrading wallaby or condor-wallaby packages using the remote configuration feature caused the
condor_configddaemon to fail to communicate with the configuration store and error messages that the store was using an unsupported API version were given. With this update, the API version check has been removed from thecondor_configddaemon and the API version mismatch bug no longer occurs. - BZ#680437
- Previously, the unsupported
/usr/sbin/condor_vm_vmware.plfile was included in the condor-vm-gahp package. With this update, the file has been removed. - BZ#681124
- During boot on some systems, the cumin database server was not fully functional by the time the
cumindaemon was started and checked the state of the database. As a consequence, the daemon sometimes reported that the database has not been created, when in fact it has. This bug has been fixed and thecumindaemon now detects that the database server process is running, if it cannot make a connection to the server. Then, it will retry the connection for up to 30 seconds before reporting an error. - BZ#688285
- Prior to this update, the EC2 Enhanced feature relied on the Amazon GAHP (Grid ASCII Helper Protocol), which has been deprecated, and replaced by the EC2 GAHP. With this update, the
set_gridresourceline has been removed from the example routes, and the translate hook now detects which GAHP to use; EC2 Enhanced works as expected. - BZ#691969
- The
condor_startddaemon could fail to remove dynamic slots that had been in theOWNERstate after a job, that was executed on those slots, was completed. With this update, thecondor_startddaemon now properly removes dynamic slots after their jobs are completed. - BZ#692635
- On Windows, an incorrectly configured
QMF_BROKER_HOSTparameter in Condor's configuration would cause thecondor_configddaemon to be unable to contact the incorrectly configured broker. Additionally,condor_configdwould not respond to shutdown commands. With this update, the underlying source code has been modified to address this issue, and thecondor_configddaemon properly shuts down, regardless of whether theQMF_BROKER_HOSTparameter is configured correctly. - BZ#692653
- The MRG Cumin web application displayed an error page when the last job for a submission was removed. With this update, this issue has been fixed, and the web application now displays the submission list page.
- BZ#692741
- Because a
procentry appeared before theclusterentry in the job queue log, there was no way to update the internalSubmissionObjectonce theATTR_OWNERflag was set (from the log) before the submission name. With this update, the internal job object updates its associatedSubmissionObjectwith an owner or name value, regardless of order. As a result, submission names and owner data appear correctly in all QMF and Aviary queries. - BZ#694835
- Providing invalid arguments, or other invalid scenarios, to the condor_configure_pool and condor_configure_tool tools returned an error even though the return value indicated a success. With this update, the aforementioned tools properly exit with a non-zero exit code for any error cases.
- BZ#695722
- Clicking through the path, → → → → , would result in a page that would correctly display the desired information, but would lack a link back to the selected scheduler frame. With this update, a
SubmissionFramewas added to theSchedulerFrame. In addition, thePoolSubmissionSelectorconstructor was changed to allow for passing of the frame name (defaulted to the original value ofmain.grid.submission) to control the target of the links generated in thePoolSubmissionSelector. As a result, the breadcrumb at the top of the screen now includes a link to the currently selected scheduler when looking at a given submission. - BZ#695800
- In the management console, an exception could occur if there was a mismatch in the group names and the dynamic group quota values. With this update, an informative message is displayed indicating which configuration entry is missing.
- BZ#699571
- Prior to this update, the internal
uptimestatistic was reset to0before being reported, causing theMonitorSelfAgeparameter to always display the value0. With this update, the internaluptimestatistic is no longer reset to0before being reported, and theMonitorSelfAgeparameter properly reflects theuptimestatistic.
Enhancements
- BZ#584562
- Prior to this update, condor_dagman calls to condor_submit could place a severe and unnecessary load on the collector while each submit looked up the shedd address for the submission. With this update, the
-schedd-daemon-ad-fileand-schedd-address-fileflags are now added to condor_submit_dag to allow targeting a directed acyclic graph (DAG) to a specific schedule daemon (Schedd) to bind all its operations toSchedd. - BZ#602766
- The
condor_triggerddaemon is now able to detect absent nodes when used in connection with the remote configuration feature. Now, thecondor_triggerddaemon raises an event for each node that is configured in wallaby for which a master qmf object is not detected if absent node detection is enabled. - BZ#610251
- Prior to this update, Condor allowed declaration of only a single view server. Due to this limit, multiplexing among multiple view servers was not possible. This update adds support of multiple view servers declared on
CONDOR_VIEW_HOST. Now, Condor is able to declare multiple view servers to allow multiplexing among these servers for improved scale. - BZ#610258
- Prior to this update, the type of Classified Advertisements (classads) to be forwarded to a
CONDOR_VIEW_HOSTcould not be selected. This update adds the parameterCONDOR_VIEW_CLASSAD_TYPES. Now, administrators can control the type of classads that are forwarded toCONDOR_VIEW_HOST. - BZ#621899
- Prior to this update, Condor EC2 jobs could not be bound to an elastic IP. Due to this behavior, a dynamic IP had to be created for each instance. This update uses the
ec2_elastic_ipparameter to support elastic IP binding for Condor EC2 jobs. - BZ#630544
- Prior to this update, the version information for Cumin could not be viewed from the web user interface. Due to this lack, users needed to log into the server host and use
rpmcommands to view the installed package information. With this update, the Cumin user interface has an About the console tab under the Your Account page where version information stored indisplays.$CUMIN_HOME/version - BZ#635197
- Prior to this update, the values in the Max Allowance column in Cumin were incorrectly displayed as integers instead of floating-point numbers. Additionally, very large values were displayed in their explicit form instead of being replaced with the Unlimited label. With this update, the Max Allowance values are correctly displayed as floating-point numbers, and values larger than 1,000,000 are now rendered as Unlimited as expected.
- BZ#642405
- The MRG Management Console has been updated to pull data displayed in the Limits, Quotas, and Job Summaries tables directly from the broker rather than pulling them from the internal database. Additionally, these tables can now be exported in the comma-separated values (CSV) file format.
- BZ#668038
- Prior to this update, when runtime reconfiguration was enabled, an authorized user could cause a daemon to terminate unexpectedly by providing a faulty configuration. This happened, because neither the
condor_config_val -set(orcondor_config_val -rset) command, nor the daemon being reconfigured would validate the input. With this update, the underlying source code has been adapted to make sure that both the condor_config_val utility and the daemons validate the configuration provided, preventing crashes during runtime. - BZ#673178
- The Red Hat Enterprise MRG console sorts data in ascending or descending order when the user clicks the relevant column header. However, the GUI (graphical user interface) did not indicate which columns were sorted and in what order. Now, an arrow placed on the selected column header indicates the sorting order and a pop-up tooltip indicates the column sorting order, when the mouse cursor hovers over the column header.
- BZ#673180
- Prior to this update, the MRG console displayed some tables that receive their data directly from the broker but were unable to search for the desired records. Tables that receive their data directly from the broker now have the ability to search for specific records.
- BZ#673183
- Some actions in the MRG console, such as displaying the job summary info and the group quotas, retrieve their data directly from the broker. This process can take a few seconds. Prior to this update, the system did not display any feedback to inform the user that the action was pending. The MRG console now displays a message informing the user that data is being loaded while waiting for broker response.
- BZ#673187
- Cumin displays data in tables that display 100 records per page. When more than 100 records are present in a table, the user could not save all the records to a file. Cumin now allows a user to save all records in a table to a comma separated value file.
- BZ#673189
- Prior to this update, the MRG Cumin console presented a list of pools under the Grid tab. Generally, only one pool is displayed under the Grid tab and a dedicated page to display a list containing one entry is thus unnecessary. The MRG Cumin console now does not display the list of pools and if more than one broker is listed in the
brokers=line of the Cumin configuration file, the first broker is used as a default. - BZ#673194
- An overview page was added to show the overall health of the grid and provide access to various grid statistics at a glance.
- BZ#674161
- Condor now supports hibernating machines that are in the idle state.
- BZ#674349
- Grid now offers a simpler web interface called Aviary, created with the use of Axis2/C and WSO2.
- BZ#674659
- Condor now includes the
PreJobPrio1,PreJobPrio2,PostJobPrio1, andPostJobPrio2job ad attributes, which allow jobs to be ordered outside theJobPrioattribute. - BZ#674669
- Prior to this update, the
LastNegotiationCycleSubmittersShareLimitN negotiator classadstatattribute did not account for a submitter reaching the share limits in agroup-quotescenario. The negotiator now includes submitter names in the attribute when any submitter reaches the submitter limit, including group quota limits. - BZ#678025
- With this update, the new statistics published in the scheduler's classads provide more detailed information and allow a better assessment of how well the scheduler is performing in the Condor pool.
- BZ#678029
- The persona feature has been added to the web chapter of the
cumindaemon configuration file. The new feature allows users to choose a grid-only or messaging-only views. The customized views serve as alternates to the default view, which incorporates both grid and messaging views. - BZ#678394
- The Power Management feature has been added to the MRG Grid 2.0 and can be configured manually via the
wallabycomponent. Users can configure Power Management through the remote configuration feature. - BZ#679553
- Previously, Condor used AWS EC2's SOAP interfaces for managing virtual instances within EC2. With this update, Condor now uses AWS EC2's Query API.
- BZ#680260
- With this update, the new
export-usersandimport-userscommands have been added to thecumin-adminutility. It is now possible to save and restore user data when a cumin database is reinitialized. It is also possible to export user data from one cumin database and import it into another cumin database. - BZ#680518
- Previously, when a reconfigure signal from MRG Grid or a SIGHUP signal from the command line was sent to the
condor_configddaemon, the daemon terminated unexpectedly. With this update, support to handle SIGHUP signals has been added to thecondor_configddaemon and the daemon now terminates properly in the described scenario. - BZ#690283
- To maintain performance as scale increases, Cumin now distributes data processing across multiple instances of cumin-data. Responsibility for data processing is partitioned at the level of QMF classes. Without this enhancement, users could notice decreases in Cumin performance as scale increases.
- BZ#694857
- With this update, a new option has been added to the condor_trigger_config utility to allow communication through a secured broker. As a result, it is now possible to configure the
triggerddaemon through a secured broker. - BZ#692169
- Certain messaging configuration or installation changes could unexpectedly cause Cumin to authenticate to a broker using the ANONYMOUS method when password authentication was intended. This was because Cumin did not have a mechanism for disallowing the use of the ANONYMOUS method. If Cumin authenticates using the ANONYMOUS method, certain features, such as job submission, will not be available from the user interface. This update adds the
sasl-mech-listconfiguration parameter to the[common]chapter in the/etc/cumin/cumin.conffile. This parameter is a space separated list of allowable SASL authentication mechanisms. Names of the available mechanisms are specified in the SASL documentation (for example, PLAIN and ANONYMOUS). By default, all available mechanisms are allowed. As a result,sasl-mech-listparameter can be configured to restrict the allowable configuration mechanisms for Cumin.
Appendix A. Revision History
| Revision History | ||||
|---|---|---|---|---|
| Revision 5-25 | Friday July 17 2015 | |||
| ||||
| Revision 5-23 | Friday March 6 2015 | |||
| ||||
| Revision 5-21 | Thursday November 27 2014 | |||
| ||||
| Revision 5-20 | Tuesday November 04 2014 | |||
| ||||
| Revision 5-19 | Monday October 13 2014 | |||
| ||||
| Revision 5-15 | Monday June 30 2014 | |||
| ||||
| Revision 5-13 | Monday April 28 2014 | |||
| ||||
| Revision 5-12 | Monday April 28 2014 | |||
| ||||