How To Kill Bulk Running Workflow Jobs in Ansible Automation Platform?
Environment
- Ansible Automation Platform 1.2.X.
- Ansible Automation Platform 2.X.X.
Issue
- A user mistakenly created 5700+ jobs through the API and could not cancel the resulting workflow jobs with the tower-cli command.
Resolution
- Run the following commands to cancel the running workflow jobs:
  # su - awx
  $ echo "UnifiedJob.objects.filter(status='running').update(status='canceled')" | awx-manage shell_plus
- After all the workflow jobs are canceled, restart all Ansible Automation Platform nodes using the following command:
  * For Ansible Tower (AAP 1.2):
    $ ansible-tower-service restart
  * For automation controller nodes (AAP 2.x):
    $ automation-controller-service restart
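The bulk update in the first step matches every unified job whose status is running, not only workflow jobs, so it can help to count the matches before changing anything. The following is a minimal sketch of that count-first pattern, not the procedure from this article. `cancel_running` and `FakeQuerySet` are hypothetical names introduced here; `FakeQuerySet` only imitates the `filter`, `count`, and `update` methods of a Django QuerySet so that the example runs outside AWX.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet, so the sketch runs outside AWX."""

    def __init__(self, rows):
        self.rows = rows

    def filter(self, **conditions):
        # Keep only the rows whose fields equal every given condition.
        matched = [row for row in self.rows
                   if all(row.get(key) == value for key, value in conditions.items())]
        return FakeQuerySet(matched)

    def count(self):
        return len(self.rows)

    def update(self, **changes):
        # Like QuerySet.update(), apply the changes and return the number of rows touched.
        for row in self.rows:
            row.update(changes)
        return len(self.rows)


def cancel_running(objects, dry_run=True):
    """Count running jobs, and mark them canceled only when dry_run is False."""
    running = objects.filter(status="running")
    if dry_run:
        return running.count()
    return running.update(status="canceled")


rows = [
    {"id": 1, "status": "running"},
    {"id": 2, "status": "successful"},
    {"id": 3, "status": "running"},
]
jobs = FakeQuerySet(rows)
print(cancel_running(jobs))                 # dry run: reports 2, changes nothing
print(cancel_running(jobs, dry_run=False))  # marks the 2 running rows as canceled
```

Inside `awx-manage shell_plus`, the assumption is that `UnifiedJob.objects` would be passed in place of the stand-in, since it offers the same three methods.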
Root Cause
- Running workflow jobs are slow to cancel one at a time with a simple awx-manage shell_plus command. Each individual update takes too long. Example:
```
>>> start = time.time(); UnifiedJob.objects.filter(id=1096679).update(status='canceled'); end = time.time(); print(end - start)
1
263.3683326244354
```
At this rate (about 263 seconds per job), canceling 4500+ running workflow jobs one by one would take approximately 2 weeks.
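The two-week estimate can be checked with a short calculation. This sketch assumes every job takes the same time as the single measured update and that the jobs are canceled strictly one after another; `estimated_days` is a name introduced here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_days(seconds_per_job, job_count):
    """Total wall-clock days if jobs are canceled strictly one after another."""
    return seconds_per_job * job_count / SECONDS_PER_DAY


# 263.3683326244354 s is the single-job timing quoted above; 4500 is the job count from the text.
days = estimated_days(263.3683326244354, 4500)
print(f"{days:.1f} days")  # prints 13.7 days
```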
Diagnostic Steps
- The command to cancel the running jobs returned OK, but it had no effect on the workflow jobs, which remained in the running state.
  # tower-cli workflow_job list -W 2389 --status running
  ======= ===================== =========================== =======
  id      workflow_job_template created                     status
  ======= ===================== =========================== =======
  1096571 2389                  2022-09-26T13:20:39.150265Z running
  1096573 2389                  2022-09-26T13:20:43.415625Z running
  1096574 2389                  2022-09-26T13:20:47.948050Z running
  1096579 2389                  2022-09-26T13:20:52.186344Z running
  1096580 2389                  2022-09-26T13:20:56.741558Z running
  1096583 2389                  2022-09-26T13:21:00.827540Z running
  1096584 2389                  2022-09-26T13:21:05.154718Z running
  1096585 2389                  2022-09-26T13:21:09.275868Z running
  1096588 2389                  2022-09-26T13:21:13.725400Z running
  1096590 2389                  2022-09-26T13:21:18.275462Z running
  1096594 2389                  2022-09-26T13:21:22.411275Z running
  1096595 2389                  2022-09-26T13:21:26.738721Z running
  1096599 2389                  2022-09-26T13:21:31.279409Z running
  1096603 2389                  2022-09-26T13:21:35.386969Z running
  1096604 2389                  2022-09-26T13:21:39.628363Z running
  1096607 2389                  2022-09-26T13:21:43.820806Z running
  1096608 2389                  2022-09-26T13:21:47.898231Z running
  1096612 2389                  2022-09-26T13:21:52.032458Z running
  1096613 2389                  2022-09-26T13:21:56.383005Z running
  1096614 2389                  2022-09-26T13:22:00.596535Z running
  1096618 2389                  2022-09-26T13:22:05.834738Z running
  1096619 2389                  2022-09-26T13:22:09.969598Z running
  1096623 2389                  2022-09-26T13:22:14.080368Z running
  1096627 2389                  2022-09-26T13:22:18.412507Z running
  1096628 2389                  2022-09-26T13:22:22.671783Z running
  ======= ===================== =========================== =======
  (Page 1 of 193.)
  # for ID in $(tower-cli workflow_job list -W 2389 --status running | awk '/running/ {print $1}'); do tower-cli workflow_job cancel $ID; done
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  OK. (changed: true)
  # tower-cli workflow_job list -W 2389 --status running
  ======= ===================== =========================== =======
  id      workflow_job_template created                     status
  ======= ===================== =========================== =======
  1096571 2389                  2022-09-26T13:20:39.150265Z running
  1096573 2389                  2022-09-26T13:20:43.415625Z running
  1096574 2389                  2022-09-26T13:20:47.948050Z running
  1096579 2389                  2022-09-26T13:20:52.186344Z running
  1096580 2389                  2022-09-26T13:20:56.741558Z running
  1096583 2389                  2022-09-26T13:21:00.827540Z running
  1096584 2389                  2022-09-26T13:21:05.154718Z running
  1096585 2389                  2022-09-26T13:21:09.275868Z running
  1096588 2389                  2022-09-26T13:21:13.725400Z running
  1096590 2389                  2022-09-26T13:21:18.275462Z running
  1096594 2389                  2022-09-26T13:21:22.411275Z running
  1096595 2389                  2022-09-26T13:21:26.738721Z running
  1096599 2389                  2022-09-26T13:21:31.279409Z running
  1096603 2389                  2022-09-26T13:21:35.386969Z running
  1096604 2389                  2022-09-26T13:21:39.628363Z running
  1096607 2389                  2022-09-26T13:21:43.820806Z running
  1096608 2389                  2022-09-26T13:21:47.898231Z running
  1096612 2389                  2022-09-26T13:21:52.032458Z running
  1096613 2389                  2022-09-26T13:21:56.383005Z running
  1096614 2389                  2022-09-26T13:22:00.596535Z running
  1096618 2389                  2022-09-26T13:22:05.834738Z running
  1096619 2389                  2022-09-26T13:22:09.969598Z running
  1096623 2389                  2022-09-26T13:22:14.080368Z running
  1096627 2389                  2022-09-26T13:22:18.412507Z running
  1096628 2389                  2022-09-26T13:22:22.671783Z running
  ======= ===================== =========================== =======
  (Page 1 of 193.)
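The awk filter in the cancel loop above picks the first column of each row that contains "running". As a rough illustration, the same extraction can be sketched in Python; `running_ids` and the two-row `sample` are introduced here for the example and are not part of tower-cli.

```python
def running_ids(table_text):
    """Return the id column of every data row whose status column is 'running'."""
    ids = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have four columns: id, workflow_job_template, created, status.
        # Header, separator, and pager lines are skipped because their first field is not numeric.
        if len(fields) == 4 and fields[0].isdigit() and fields[3] == "running":
            ids.append(int(fields[0]))
    return ids


sample = """\
======= ===================== =========================== =======
id      workflow_job_template created                     status
======= ===================== =========================== =======
1096571 2389                  2022-09-26T13:20:39.150265Z running
1096573 2389                  2022-09-26T13:20:43.415625Z running
======= ===================== =========================== =======
(Page 1 of 193.)
"""
print(running_ids(sample))  # prints [1096571, 1096573]
```

The listing above is page 1 of 193 with 25 rows, so parsing a single page of output yields at most 25 ids per pass.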
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.