Satellite Unresponsive when Candlepin Filesystem Hits Max Disk Usage, Artemis Queue Blocked after Upgrade to Red Hat Satellite 6

Solution Verified - Updated -

Issue

  • Clients time out when communicating with or registering to the Satellite server
  • The number of httpd processes on the Satellite server has increased significantly
  • Satellite was recently upgraded to 6.8
    The upgrade went smoothly and everything worked well at first, but after some time all services began timing out.
    After restarting the Satellite services, logging in and running hammer commands works initially, but within a few minutes the server becomes unresponsive again.

  • httpd service intermittently failing:

    Dec 15 09:59:38 sat6 systemd: httpd.service stop-sigterm timed out. Killing.
    Dec 15 09:59:38 sat6 systemd: httpd.service: main process exited, code=killed, status=9/KILL
    Dec 15 09:59:38 sat6 systemd: Stopped The Apache HTTP Server.
    
  • Candlepin stops responding with a "Connection reset by peer - SSL_connect" error, and the Satellite web UI stops working. The following errors are logged in /var/log/foreman/production.log:

    2020-11-11T10:41:33 [I|app|7b25ab57] Started GET "/rhsm/status/" for IP_ADDR at 2020-11-11 10:41:33 -0600
    2020-11-11T10:41:33 [I|app|7b25ab57] Processing by Katello::Api::Rhsm::CandlepinProxiesController#server_status as JSON
    2020-11-11T10:41:33 [D|kat|7b25ab57] Resource GET request: /candlepin/status
    2020-11-11T10:41:33 [D|kat|7b25ab57] Headers: {}
    2020-11-11T10:41:33 [D|kat|7b25ab57] Body: {}
    2020-11-11T10:41:33 [D|app|7b25ab57] RestClient.get "https://localhost:8443/candlepin/status", "Accept"=>"*/*", "Accept-Encoding"=>"gzip, deflate", "Authorization"=>"OAuth oauth_consumer_key=\"katello\", oauth_nonce=\"LWdta8OWW8w9kzAQi1CrbBiliDehntk5RwqCN5I0I\", oauth_signature=\"dclvfvAG%2Fbw1qEwhixs7VEf5j2s%3D\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"1605112893\", oauth_version=\"1.0\"", "User-Agent"=>"rest-client/2.0.2 (linux-gnu x86_64) ruby/2.5.5p157"
    2020-11-11T11:18:49 [E|kat|7b25ab57] Errno::ECONNRESET: Connection reset by peer - SSL_connect
    2020-11-11T11:18:50 [I|app|7b25ab57] Completed 500 Internal Server Error in 2236198ms (Views: 41.1ms | ActiveRecord: 3.3ms | Allocations: 50666)
    2020-11-11T11:18:50 [D|app|7b25ab57] With body: {"displayMessage":"Connection reset by peer - SSL_connect","errors":["Connection reset by peer - SSL_connect"]}
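
To check how often this pattern occurs and how long each request hung before the reset, the log excerpt above can be scanned with a short script. This is a minimal sketch, not a Red Hat tool: the function name `find_candlepin_resets` and the regular expression are assumptions based only on the line format shown in the excerpt, and the sample lines are copied from it.

```python
import re
from datetime import datetime

# Matches the prefix of a production.log line as shown in the excerpt, e.g.
# "2020-11-11T11:18:49 [E|kat|7b25ab57] ..." (format assumed from the excerpt only).
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w)\|(?P<logger>\w+)\|(?P<req>\w+)\] (?P<msg>.*)$"
)
RESET_MARKER = "Connection reset by peer - SSL_connect"


def find_candlepin_resets(lines):
    """Return (request_id, seconds_until_reset) for each request that logged the error."""
    started = {}  # request ID -> timestamp of the first line seen for that request
    resets = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the expected prefix
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S")
        req = m["req"]
        started.setdefault(req, ts)
        if m["level"] == "E" and RESET_MARKER in m["msg"]:
            resets.append((req, int((ts - started[req]).total_seconds())))
    return resets


# Two lines taken from the excerpt above.
sample = [
    '2020-11-11T10:41:33 [I|app|7b25ab57] Started GET "/rhsm/status/" for IP_ADDR at 2020-11-11 10:41:33 -0600',
    "2020-11-11T11:18:49 [E|kat|7b25ab57] Errno::ECONNRESET: Connection reset by peer - SSL_connect",
]
print(find_candlepin_resets(sample))  # [('7b25ab57', 2236)]
```

The 2236 seconds reported for request 7b25ab57 agrees with the "Completed 500 Internal Server Error in 2236198ms" line in the excerpt. To scan a real log, pass an open file handle for /var/log/foreman/production.log instead of `sample`.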
    

Environment

  • Red Hat Satellite 6.8+
