Publishing & Promoting Content View Stuck

We've had a "Publishing and promoting to 1 environment" task on a content view stuck for about two weeks.

I created a new content view based on the stuck one via the "Copy View" option, but cannot figure out how to kill the backend task and locks for the "stuck" view in order to unregister content hosts from it and move on.

(1) Is there a way to kill the tasks and get the publish/promote to fail so we can remove it?
(2) Is there a way to clean up all the pending or failed backend tasks, other than "clean backend objects", which doesn't appear to actually kill pending tasks and locks?

Responses

Bradley,

Have you had any success with killing the hung content view publishing? I am in the same boat with a view publishing stuck at 42% - can't seem to kill it (also need to investigate why it is hanging, but that is a different story).

Yeah, we ended up killing off the promotion task as described below. Let me also review our cases and see what we put in there as the official fix.

Note that the root cause of our issue was the OOM killer killing off processes related to the content promotion. Clearing the paused or errored tasks via Foreman does clear it out, but the version provided below will clear all of them. You can pass more specific values to ForemanTasks to kill only the Content Promotion task, or use the Monitor -> Tasks Web UI to clear it out.

Ultimately, though, check the messages log for any OOM killer issues. There was also a bug opened, but it was fixed in 6.1.1. Since upgrading to 6.1.8 and increasing RAM to 24G, we haven't seen this issue again.

We often forget to check those simple things when using Satellite, thinking it's some complex Foreman or Elasticsearch indices problem, but I've often seen OOM issues that increasing RAM helped and PostgreSQL issues that a simple vacuum/reindex fixed.
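
For anyone wanting to run the OOM check mentioned above, here is a rough sketch. It assumes the system log lives at /var/log/messages (the usual default on RHEL, but your syslog configuration may differ), and the function name find_oom_events is just something made up for this example. The patterns are the usual kernel OOM-killer message strings; adjust them if your kernel words them differently.

```shell
#!/bin/sh
# Hypothetical helper: list kernel OOM-killer events found in a syslog file.
# Assumption: the log is /var/log/messages unless another path is given as $1.
find_oom_events() {
    log_file="${1:-/var/log/messages}"
    # Matches "... invoked oom-killer ..." as well as
    # "Out of memory: Kill process ..." / "Out of memory: Killed process ...".
    grep -Ei 'invoked oom-killer|out of memory: kill' "$log_file"
}
```

Running find_oom_events with no argument as root scans the default log. If it prints lines naming Satellite-related processes around the time the publish or promote stalled, memory pressure is a likely suspect.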

Bradley,

Which process was consuming the RAM and causing the OOM killer to be invoked (i.e. a Satellite process or another process on the server)? Is there an identified memory leak or issue in the bug you mention, or was it just the number/size of concurrent content view promotion tasks you had running?

I don't know if this is the best way or not, but this allowed me to kill them off and re-publish.

foreman-rake console

irb(main):001:0> ForemanTasks::Task.where(:state => :paused).destroy_all

irb(main):002:0> exit

foreman-rake katello:reindex

Yes, this helped to solve the issue!!

This thread is pretty old. In the latest version of Satellite (6.4.1) I have not needed to do this. I have simply cancelled the task in the web UI then shut down and restarted Satellite using foreman-maintain. Unfortunately, I have had to do this a couple of times since upgrading to 6.4.1. I should probably have opened a ticket with Red Hat to investigate.