candlepin not working on Red Hat Satellite and hammer ping returns 404 on candlepin
Environment
- Red Hat Satellite 6
Issue
hammer pingreturns404error oncandlepincandlepinlogs having exceptions in sending messages toqpidbroker
Resolution
To resolve the problem
Ensure virt-who is not misconfigured and sending subscription status too frequently to Satellite / candlepin. See this solution for more.
Ensure foreman tasks have the ListenOnCandlepinEvents task running (in WebUI, go to Monitor -> Tasks -> apply filter "label = Actions::Candlepin::ListenOnCandlepinEvents" and check if there is exactly one task running).
Only on Satellite 6.1, ensure further gutterball is running properly by inspecting /var/log/gutterball/gutterball.log file.
To alleviate the problem
Optionally, after/in parallel to identifying the root cause, temporarily allow qpid broker to buffer more messages from candlepin to katello (or in Sat6.1, also audit logs that candlepin sends to gutterbal) via the broker.
-
Add to /etc/qpid/qpidd.conf:
default-queue-limit=0 -
Apply the change:
service qpidd restart -
Monitor
qpiddprocess memory usage, since it can take a lot due to the necessity to keep in memory much more messages for thecandlepin->katello(orcandlepin->gutterballin Sat6.1) communication. In case it consumes too much memory, revert the change to prevent out-of-memory issues. -
Monitor message depth of
qpidbroker queuekatello_event_queue(or also$(hostname -f):eventon Sat6.1) that will buffer the messages / audits fromcandlepintokatello(or also togutterballon Sat6.1):qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q katello_event_queue qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q $(hostname -f):eventwhere
queue-depthshould be decreasing over the time. If it is not, then the root cause has not been resolved yet.
For more KB articles/solutions related to Red Hat Satellite 6.x Candlepin Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Candlepin Issues
Root Cause
Candlepin failed to send subscription event to katello (or also an audit log to gutterball in Sat6.1) via qpid broker that has buffered maximal number of messages in its relevant queue. This can be caused by either foreman-tasks not running ListenOnCandlepinEvents task, (or also gutterball not working properly in Sat6.1), or candlepin sending the audits too frequently - this usually means some virt-who instance reports subscriptions status too often.
The root cause of the problem must be fixed to resolve the issue, while increasing maximal queue depth can alleviate the problem in temporarily restoring candlepin operation.
Diagnostic Steps
-
hammer pingreturns:candlepin: Status: FAIL Server Response: Message: 404 Resource Not Found candlepin_auth: Status: FAIL Server Response: Message: Katello::Resources::Candlepin::CandlepinPing: 404 Resource Not Found <html>... (skipping some generic 404 html page details) </html> (GET /candlepin/status) pulp: Status: ok Server Response: Duration: 61ms pulp_auth: Status: ok Server Response: Duration: 27ms elasticsearch: Status: ok Server Response: Duration: 35ms foreman_tasks: Status: ok Server Response: Duration: 1ms -
qpid-stat -qhavingkatello_event_queue(or also$(hostname -f):eventqueue on Sat6.1) large backlog of messages (in thousands, like11.5kin below example) and mainly zero consumers (2nd column from the right is zero):katello_event_queue Y 11.5k 11.5k 0 95.4m 95.4m 0 0 2(optionally, check that
qpid-stat -ulisting queues subscriptions/consumers does not have a consumer of that queue).Both
qpid-statoutputs are in collected byforeman-debug(see filesqpid_stat_queuesandqpid_stat_subscriptions). -
/var/log/candlepin/candlepin.loghaving eitherException when sending message:timed out waiting for completionorError sending event to message bus(backtrace details might differ in various Satellite / candlepin versions):2016-02-09 14:31:08,841 [=, org=] ERROR org.candlepin.audit.AMQPBusPublisher - Unable to send event: Event [id=null, target=COMPLIANCE, type=CREATED, time=Mon Dec 07 15:26:57 GMT 2015, entity=8a2e8c9851444d5601517d0d75c35615] javax.jms.JMSException: Exception when sending message:timed out waiting for completion at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:260) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:528) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:326) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.TopicPublisherAdapter.send(TopicPublisherAdapter.java:120) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:60) ~[AMQPBusPublisher.class:na] at org.candlepin.audit.ListenerWrapper.onMessage(ListenerWrapper.java:52) [ListenerWrapper.class:na] at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1114) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at org.hornetq.core.client.impl.ClientConsumerImpl.access$500(ClientConsumerImpl.java:57) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1249) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_95] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_95] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95] Caused by: org.apache.qpid.transport.SessionException: timed out waiting for completion at org.apache.qpid.transport.Session.invoke(Session.java:747) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.transport.Session.invoke(Session.java:627) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.transport.SessionInvoker.messageTransfer(SessionInvoker.java:93) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:246) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] ... 12 common frames omittedor:
Jan 13 13:54:34 satellite.example.com server[11208]: ERROR [client] - HQ214000: Failed to call onMessage Jan 13 13:54:34 satellite.example.com server[11208]: java.lang.RuntimeException: Error sending event to message bus Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:69) Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.ListenerWrapper.onMessage(ListenerWrapper.java:52) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1117) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl.access$500(ClientConsumerImpl.java:57) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1252) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:107) Jan 13 13:54:34 satellite.example.com server[11208]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) Jan 13 13:54:34 satellite.example.com server[11208]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) Jan 13 13:54:34 satellite.example.com server[11208]: at java.lang.Thread.run(Thread.java:745) Jan 13 13:54:34 satellite.example.com server[11208]: Caused by: javax.jms.JMSException: Exception when sending message:timed out waiting for sync: comple Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:260) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:528) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:326) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.TopicPublisherAdapter.send(TopicPublisherAdapter.java:120) Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:60) Jan 13 13:54:34 satellite.example.com server[11208]: ... 8 more Jan 13 13:54:34 satellite.example.com server[11208]: Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 151770 Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.transport.Session.sync(Session.java:867) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.transport.Session.sync(Session.java:837) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:253) Jan 13 13:54:34 satellite.example.com server[11208]: ... 12 more
Only on Satellite 6.1 with gutterball triggering the problem:
-
/var/lib/candlepin/hornetq/largemsgsdirectory having nontrivial content (backlog ofcandlepinaudits pending to be sent viaqpiddtogutterball):# du -hs /var/lib/candlepin/hornetq/journal /var/lib/candlepin/hornetq/largemsgs 31M /var/lib/candlepin/hornetq/journal 215M /var/lib/candlepin/hornetq/largemsgs #
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
1 Comments
It would be useful if there was some sort of "Health Status" monitor in the Web UI that had these scripts either be executed via click or ran via quartz driven events or cron and reported a status.