Satellite Unresponsive when Candlepin Filesystem Hits Max Disk Usage, Artemis Queue Blocked after Upgrade to Red Hat Satellite 6
Issue
- Clients time out when communicating with or registering to the Satellite server
- The number of httpd processes on the Satellite server has increased significantly
- Satellite was recently upgraded to 6.8. The upgrade went smoothly and everything worked well at first, but after some time all services started timing out. After a restart of the Satellite services, logging in works initially and hammer commands succeed, but after a few minutes the server becomes unresponsive again.
- The httpd service fails intermittently:
    Dec 15 09:59:38 sat6 systemd: httpd.service stop-sigterm timed out. Killing.
    Dec 15 09:59:38 sat6 systemd: httpd.service: main process exited, code=killed, status=9/KILL
    Dec 15 09:59:38 sat6 systemd: Stopped The Apache HTTP Server.
- Candlepin stops working with a "Connection reset by peer - SSL_connect" error, and the Satellite web UI stops working. The following errors are logged in /var/log/foreman/production.log:
    2020-11-11T10:41:33 [I|app|7b25ab57] Started GET "/rhsm/status/" for IP_ADDR at 2020-11-11 10:41:33 -0600
    2020-11-11T10:41:33 [I|app|7b25ab57] Processing by Katello::Api::Rhsm::CandlepinProxiesController#server_status as JSON
    2020-11-11T10:41:33 [D|kat|7b25ab57] Resource GET request: /candlepin/status
    2020-11-11T10:41:33 [D|kat|7b25ab57] Headers: {}
    2020-11-11T10:41:33 [D|kat|7b25ab57] Body: {}
    2020-11-11T10:41:33 [D|app|7b25ab57] RestClient.get "https://localhost:8443/candlepin/status", "Accept"=>"*/*", "Accept-Encoding"=>"gzip, deflate", "Authorization"=>"OAuth oauth_consumer_key=\"katello\", oauth_nonce=\"LWdta8OWW8w9kzAQi1CrbBiliDehntk5RwqCN5I0I\", oauth_signature=\"dclvfvAG%2Fbw1qEwhixs7VEf5j2s%3D\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"1605112893\", oauth_version=\"1.0\"", "User-Agent"=>"rest-client/2.0.2 (linux-gnu x86_64) ruby/2.5.5p157"
    2020-11-11T11:18:49 [E|kat|7b25ab57] Errno::ECONNRESET: Connection reset by peer - SSL_connect
    2020-11-11T11:18:50 [I|app|7b25ab57] Completed 500 Internal Server Error in 2236198ms (Views: 41.1ms | ActiveRecord: 3.3ms | Allocations: 50666)
    2020-11-11T11:18:50 [D|app|7b25ab57] With body: {"displayMessage":"Connection reset by peer - SSL_connect","errors":["Connection reset by peer - SSL_connect"]}
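To confirm whether a Satellite server shows the same symptom, the production log can be scanned for error-level entries that carry the "Connection reset by peer - SSL_connect" message. The following is a minimal Python sketch, not an official diagnostic tool. It assumes the log has one entry per line in the `TIMESTAMP [LEVEL|LOGGER|REQUEST_ID] message` layout seen in the excerpt; the names `LINE_RE`, `Entry`, and `find_connection_resets` are illustrative and not part of Satellite.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout, taken from the excerpt in this article:
#   2020-11-11T11:18:49 [E|kat|7b25ab57] Errno::ECONNRESET: ...
LINE_RE = re.compile(
    r"^(?P<ts>\S+) "
    r"\[(?P<level>[A-Z])\|(?P<logger>\w+)\|(?P<request_id>\w*)\] "
    r"(?P<message>.*)$"
)

# The error text reported in the Issue section.
RESET_TEXT = "Connection reset by peer - SSL_connect"


class Entry(NamedTuple):
    ts: str
    level: str
    logger: str
    request_id: str
    message: str


def find_connection_resets(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield error-level ("E") entries that contain the reset message.

    Lines that do not match the assumed layout are skipped silently.
    """
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        if match["level"] == "E" and RESET_TEXT in match["message"]:
            yield Entry(**match.groupdict())
```

One possible use is to open `/var/log/foreman/production.log`, pass the file object to `find_connection_resets`, and print `ts` and `request_id` for each hit. The request ID (for example `7b25ab57`) can then be used to find the matching "Started GET" line and see how long the request waited before failing.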
Environment
- Red Hat Satellite 6.8+