[Satellite6] candlepin not working, hammer ping returns 404 on candlepin
Environment
- Red Hat Satellite 6
Issue
- various actions related to subscriptions (like content host register, managing manifest or so) are failing
hammer pingreturns404error oncandlepincandlepinlogs having exceptions in sending messages toqpidbroker
Resolution
To resolve the problem
Ensure virt-who is not misconfigured and sending subscription status too frequently to Satellite / candlepin. See this solution for more.
Ensure foreman tasks have the ListenOnCandlepinEvents task running (in WebUI, go to Monitor -> Tasks -> apply filter "label = Actions::Candlepin::ListenOnCandlepinEvents" and check if there is exactly one task running).
Only on Satellite 6.1, ensure further gutterball is running properly by inspecting /var/log/gutterball/gutterball.log file.
To alleviate the problem
Optionally, after/in parallel to identifying the root cause, temporarily allow qpid broker to buffer more messages from candlepin to katello (or in Sat6.1, also audit logs that candlepin sends to gutterbal) via the broker.
- Add to /etc/qpid/qpidd.conf:
default-queue-limit=0
- Apply the change:
service qpidd restart
-
Monitor
qpiddprocess memory usage, since it can take a lot due to the necessity to keep in memory much more messages for thecandlepin->katello(orcandlepin->gutterballin Sat6.1) communication. In case it consumes too much memory, revert the change to prevent out-of-memory issues. -
Monitor message depth of
qpidbroker queuekatello_event_queue(or also$(hostname -f):eventon Sat6.1) that will buffer the messages / audits fromcandlepintokatello(or also togutterballon Sat6.1):
qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q katello_event_queue
qpid-stat --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 -q $(hostname -f):event
where queue-depth should be decreasing over the time. If it is not, then the root cause has not been resolved yet.
Root Cause
Candlepin failed to send subscription event to katello (or also an audit log to gutterball in Sat6.1) via qpid broker that has buffered maximal number of messages in its relevant queue. This can be caused by either foreman-tasks not running ListenOnCandlepinEvents task, (or also gutterball not working properly in Sat6.1), or candlepin sending the audits too frequently - this usually means some virt-who instance reports subscriptions status too often.
The root cause of the problem must be fixed to resolve the issue, while increasing maximal queue depth can alleviate the problem in temporarily restoring candlepin operation.
Diagnostic Steps
hammer pingreturns:
candlepin:
Status: FAIL
Server Response: Message: 404 Resource Not Found
candlepin_auth:
Status: FAIL
Server Response: Message: Katello::Resources::Candlepin::CandlepinPing: 404 Resource Not Found <html>... (skipping some generic 404 html page details) </html> (GET /candlepin/status)
pulp:
Status: ok
Server Response: Duration: 61ms
pulp_auth:
Status: ok
Server Response: Duration: 27ms
elasticsearch:
Status: ok
Server Response: Duration: 35ms
foreman_tasks:
Status: ok
Server Response: Duration: 1ms
qpid-stat -q having katello_event_queue (or also $(hostname -f):event queue on Sat6.1) large backlog of messages (in thousands, like 11.5k in below example) and mainly zero consumers (2nd column from the right is zero):
katello_event_queue Y 11.5k 11.5k 0 95.4m 95.4m 0 0 2
(optionally, check that qpid-stat -u listing queues' subscriptions/consumers does not have a consumer of that queue).
Both qpid-stat outputs are in collected by foreman-debug (see files qpid_stat_queues and qpid_stat_subscriptions).
/var/log/candlepin/candlepin.loghaving eitherException when sending message:timed out waiting for completionorError sending event to message bus(backtrace details might differ in various Satellite / candlepin versions):
2016-02-09 14:31:08,841 [=, org=] ERROR org.candlepin.audit.AMQPBusPublisher - Unable to send event: Event [id=null, target=COMPLIANCE, type=CREATED, time=Mon Dec 07 15:26:57 GMT 2015, entity=8a2e8c9851444d5601517d0d75c35615]
javax.jms.JMSException: Exception when sending message:timed out waiting for completion
at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:260) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1]
at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:528) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1]
at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:326) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1]
at org.apache.qpid.client.TopicPublisherAdapter.send(TopicPublisherAdapter.java:120) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1]
at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:60) ~[AMQPBusPublisher.class:na]
at org.candlepin.audit.ListenerWrapper.onMessage(ListenerWrapper.java:52) [ListenerWrapper.class:na]
at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1114) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2]
at org.hornetq.core.client.impl.ClientConsumerImpl.access$500(ClientConsumerImpl.java:57) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2]
at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1249) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2]
at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.5.Final-redhat-2.jar:2.3.5.Final-redhat-2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_95]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_95]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95]
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for completion
at org.apache.qpid.transport.Session.invoke(Session.java:747) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1]
at org.apache.qpid.transport.Session.invoke(Session.java:627) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1]
at org.apache.qpid.transport.SessionInvoker.messageTransfer(SessionInvoker.java:93) ~[qpid-common-0.30.redhat-1.jar:0.30.redhat-1]
at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:246) ~[qpid-client-0.30.redhat-1.jar:0.30.redhat-1]
... 12 common frames omitted
or:
Jan 13 13:54:34 satellite.example.com server[11208]: ERROR [client] - HQ214000: Failed to call onMessage
Jan 13 13:54:34 satellite.example.com server[11208]: java.lang.RuntimeException: Error sending event to message bus
Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:69)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.ListenerWrapper.onMessage(ListenerWrapper.java:52)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1117)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl.access$500(ClientConsumerImpl.java:57)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1252)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:107)
Jan 13 13:54:34 satellite.example.com server[11208]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Jan 13 13:54:34 satellite.example.com server[11208]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Jan 13 13:54:34 satellite.example.com server[11208]: at java.lang.Thread.run(Thread.java:745)
Jan 13 13:54:34 satellite.example.com server[11208]: Caused by: javax.jms.JMSException: Exception when sending message:timed out waiting for sync: comple
Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:260)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer.sendImpl(BasicMessageProducer.java:528)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer.send(BasicMessageProducer.java:326)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.TopicPublisherAdapter.send(TopicPublisherAdapter.java:120)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.candlepin.audit.AMQPBusPublisher.onEvent(AMQPBusPublisher.java:60)
Jan 13 13:54:34 satellite.example.com server[11208]: ... 8 more
Jan 13 13:54:34 satellite.example.com server[11208]: Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 151770
Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.transport.Session.sync(Session.java:867)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.transport.Session.sync(Session.java:837)
Jan 13 13:54:34 satellite.example.com server[11208]: at org.apache.qpid.client.BasicMessageProducer_0_10.sendMessage(BasicMessageProducer_0_10.java:253)
Jan 13 13:54:34 satellite.example.com server[11208]: ... 12 more
Only on Satellite 6.1 with gutterball triggering the problem:
/var/lib/candlepin/hornetq/largemsgsdirectory having nontrivial content (backlog ofcandlepinaudits pending to be sent viaqpiddtogutterball):
# du -hs /var/lib/candlepin/hornetq/journal /var/lib/candlepin/hornetq/largemsgs
31M /var/lib/candlepin/hornetq/journal
215M /var/lib/candlepin/hornetq/largemsgs
#
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.
