candlepin not working on Red Hat Satellite and hammer ping returns 404 on candlepin
Environment
- Red Hat Satellite 6
Issue
hammer ping
returns404
error oncandlepin
candlepin
logs having exceptions in sending messages toqpid
broker
Resolution
To resolve the problem
Ensure virt-who
is not misconfigured and sending subscription status too frequently to Satellite / candlepin. See this solution for more.
Ensure foreman tasks have the ListenOnCandlepinEvents
task running (in WebUI, go to Monitor -> Tasks -> apply filter "label = Actions::Candlepin::ListenOnCandlepinEvents" and check if there is exactly one task running).
Only on Satellite 6.1, ensure further gutterball
is running properly by inspecting /var/log/gutterball/gutterball.log
file.
To alleviate the problem
Optionally, after/in parallel to identifying the root cause, temporarily allow qpid
broker to buffer more messages from candlepin
to katello
(or in Sat6.1, also audit logs that candlepin
sends to gutterbal
) via the broker.
-
Add to /etc/qpid/qpidd.conf:
default-queue-limit=0
-
Apply the change:
service qpidd restart
-
Monitor
qpidd
process memory usage, since it can take a lot due to the necessity to keep in memory much more messages for thecandlepin->katello
(orcandlepin->gutterball
in Sat6.1) communication. In case it consumes too much memory, revert the change to prevent out-of-memory issues. -
Monitor message depth of
qpid
broker queuekatello_event_queue
(or also$(hostname -f):event
on Sat6.1) that will buffer the messages / audits fromcandlepin
tokatello
(or also togutterball
on Sat6.1):qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q katello_event_queue qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q $(hostname -f):event
where
queue-depth
should be decreasing over the time. If it is not, then the root cause has not been resolved yet.
For more KB articles/solutions related to Red Hat Satellite 6.x Candlepin Issues, please refer to the Consolidated Troubleshooting Article for Red Hat Satellite 6.x Candlepin Issues
Root Cause
Candlepin
failed to send subscription event to katello
(or also an audit log to gutterball
in Sat6.1) via qpid
broker that has buffered maximal number of messages in its relevant queue. This can be caused by either foreman-tasks
not running ListenOnCandlepinEvents
task, (or also gutterball
not working properly in Sat6.1), or candlepin
sending the audits too frequently - this usually means some virt-who
instance reports subscriptions status too often.
The root cause of the problem must be fixed to resolve the issue, while increasing maximal queue depth can alleviate the problem in temporarily restoring candlepin
operation.
Diagnostic Steps
-
hammer ping
returns:candlepin: Status: FAIL Server Response: Message: 404 Resource Not Found candlepin_auth: Status: FAIL Server Response: Message: Katello::Resources::Candlepin::CandlepinPing: 404 Resource Not Found <html>... (skipping some generic 404 html page details) </html> (GET /candlepin/status) pulp: Status: ok Server Response: Duration: 61ms pulp_auth: Status: ok Server Response: Duration: 27ms elasticsearch: Status: ok Server Response: Duration: 35ms foreman_tasks: Status: ok Server Response: Duration: 1ms
-
qpid-stat -q
havingkatello_event_queue
(or also$(hostname -f):event
queue on Sat6.1) large backlog of messages (in thousands, like11.5k
in below example) and mainly zero consumers (2nd column from the right is zero):katello_event_queue Y 11.5k 11.5k 0 95.4m 95.4m 0 0 2
(optionally, check that
qpid-stat -u
listing queues subscriptions/consumers does not have a consumer of that queue).Both
qpid-stat
outputs are in collected byforeman-debug
(see filesqpid_stat_queues
andqpid_stat_subscriptions
). -
/var/log/candlepin/candlepin.log
having eitherException when sending message:timed out waiting for completion
orError sending event to message bus
(backtrace details might differ in various Satellite / candlepin versions):2016-02-09 14:31:08,841 [=, org=] ERROR org.candlepin.audit.AMQPBusPublisher - Unable to send event: Event [id=null, target=COMPLIANCE, type=CREATED, time=Mon Dec 07 15:26:57 GMT 2015, entity=8a2e8c9851444d5601517d0d75c35615] javax.jms.JMSException: Exception when sending message:timed out waiting for completion at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:260) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:528) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:326) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.TopicPublisherAdapter.send(TopicPublisherAdapter.java:120) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:60) ~[AMQPBusPublisher.class:na] at org.candlepin.audit.ListenerWrapper.onMessage(ListenerWrapper.java:52) [ListenerWrapper.class:na] at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1114) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at org.hornetq.core.client.impl.ClientConsumerImpl.access$500(ClientConsumerImpl.java:57) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1249) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_95] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_95] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95] Caused by: org.apache.qpid.transport.SessionException: timed out waiting for completion at org.apache.qpid.transport.Session.invoke(Session.java:747) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.transport.Session.invoke(Session.java:627) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.transport.SessionInvoker.messageTransfer(SessionInvoker.java:93) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1] at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:246) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1] ... 12 common frames omitted
or:
Jan 13 13:54:34 satellite.example.com server[11208]: ERROR [client] - HQ214000: Failed to call onMessage Jan 13 13:54:34 satellite.example.com server[11208]: java.lang.RuntimeException: Error sending event to message bus Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:69) Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.ListenerWrapper.onMessage(ListenerWrapper.java:52) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1117) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl.access$500(ClientConsumerImpl.java:57) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1252) Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:107) Jan 13 13:54:34 satellite.example.com server[11208]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) Jan 13 13:54:34 satellite.example.com server[11208]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) Jan 13 13:54:34 satellite.example.com server[11208]: at java.lang.Thread.run(Thread.java:745) Jan 13 13:54:34 satellite.example.com server[11208]: Caused by: javax.jms.JMSException: Exception when sending message:timed out waiting for sync: comple Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:260) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:528) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:326) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.TopicPublisherAdapter.send(TopicPublisherAdapter.java:120) Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:60) Jan 13 13:54:34 satellite.example.com server[11208]: ... 8 more Jan 13 13:54:34 satellite.example.com server[11208]: Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 151770 Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.transport.Session.sync(Session.java:867) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.transport.Session.sync(Session.java:837) Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:253) Jan 13 13:54:34 satellite.example.com server[11208]: ... 12 more
Only on Satellite 6.1 with gutterball triggering the problem:
-
/var/lib/candlepin/hornetq/largemsgs
directory having nontrivial content (backlog ofcandlepin
audits pending to be sent viaqpidd
togutterball
):# du -hs /var/lib/candlepin/hornetq/journal /var/lib/candlepin/hornetq/largemsgs 31M /var/lib/candlepin/hornetq/journal 215M /var/lib/candlepin/hornetq/largemsgs #
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Comments