Random Job template execution errors with the message: "Exception: project_update 12345 (failed) encountered an error (rc=None), please see task stdout for details."


Environment

  • Red Hat Ansible Automation Platform 2.4
  • Red Hat Ansible Automation Controller

Issue

  • A job template fails at random, with no apparent cause, when it runs on a different controller node than the one it initially ran on.

  • The following error appears in the project update output at https://<CONTROLLER_FQDN>/#/jobs/project/<PROJECT_UPDATE_JOB_ID>/output:

    PLAY [Update source tree if necessary] *****************************************
    
    TASK [Update project using git] ************************************************
    fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["/usr/bin/git", "fetch", "--tags", "origin", "refs/heads/*:refs/remotes/origin/*"], "msg": "Failed to download remote objects and refs:  From https://[...]\n ! [rejected]        main       -> origin/main  (non-fast-forward)\n"}
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
    

Resolution

  • This issue is tracked internally as AAP-23362. Contact Red Hat support for more information.

  • In the meantime, as a workaround, disable all controllers but one and run a project sync with the delete on update option set. Then repeat this for every other controller, so that each one is forced to do a full re-sync.
    Controllers can be enabled and disabled in the web UI at: https://<CONTROLLER_FQDN>/#/instances
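
The workaround above can also be planned against the REST API. The following is a minimal sketch, not a supported procedure: it only builds the ordered list of calls and sends nothing. It assumes that instances are toggled through the `enabled` field of /api/v2/instances/<id>/ and that a project sync is launched with a POST to /api/v2/projects/<id>/update/; verify both in the API browser of your controller first. The function name and the example IDs are hypothetical, and the project is assumed to already have the delete on update option set.

```python
def plan_full_resync(instance_ids, project_id):
    """Return the ordered (method, path, payload) calls for the workaround.

    Assumption: instance_ids contains only the controller (control or hybrid)
    nodes, and every one of them starts out enabled.
    """
    calls = []
    for active in instance_ids:
        # Enable the target first so that at least one controller stays enabled.
        calls.append(("PATCH", f"/api/v2/instances/{active}/", {"enabled": True}))
        for other in instance_ids:
            if other != active:
                calls.append(("PATCH", f"/api/v2/instances/{other}/", {"enabled": False}))
        # Launch the project sync; wait for it to finish before the next round.
        calls.append(("POST", f"/api/v2/projects/{project_id}/update/", {}))
    # Finally, re-enable every controller.
    for instance in instance_ids:
        calls.append(("PATCH", f"/api/v2/instances/{instance}/", {"enabled": True}))
    return calls


if __name__ == "__main__":
    # Hypothetical IDs: three controllers (1, 2, 3) and project 7.
    for method, path, payload in plan_full_resync([1, 2, 3], 7):
        print(method, path, payload)
```

Each sync has to reach the successful status before the next controller is processed, which this sketch leaves to the operator.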

Root Cause

  • When a project is synced, the sync runs on only one controller (launch_type is manual). When a job is later scheduled on another controller, that controller syncs the project at runtime if needed (launch_type is sync).

  • If the delete on update option is set, a delete tag is added to the manual sync job, but the tag is not passed on to the automatic syncs on the other controllers. Those controllers then update their existing local copy in place, which can fail if, for example, the git history can no longer be fast-forwarded (such as after a force push to the branch).

Diagnostic Steps

  • Query the project update list endpoint and compare the failed and successful jobs: https://<CONTROLLER_FQDN>/api/v2/project_updates/?order_by=-created&name=<PROJECT_NAME>

  • Job failed:

    {
    "id": 12345,
    "type": "project_update",
    "url": "/api/v2/project_updates/12345/",
    [...]
    "launch_type": "sync",                                                  # <---- Launch_type *sync*
    "status": "failed",                                                     # <---- Job status
    "project": 1,
    "job_type": "run",
    "job_tags": "update_git,install_roles,install_collections",             # <---- Missing *delete* tag
    [...]
    }
    

  • Job successful:

    {
    "id": 12344,
    "type": "project_update",
    "url": "/api/v2/project_updates/12344/",
    [...]
    "launch_type": "manual",                                                # <---- Launch_type *manual*
    "status": "successful",                                                 # <---- Job status
    "project": 1,
    "job_type": "run",
    "job_tags": "update_git,install_roles,install_collections,delete",      # <---- *delete* tag is present
    [...]
    }
    
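The comparison above can be automated. The following is a minimal sketch, assuming the list endpoint returns its records under a `results` key and that the controller accepts an OAuth2 token in an `Authorization: Bearer` header. The function names and the sample records are hypothetical; only the first page of results is read.

```python
import json
import urllib.parse
import urllib.request


def find_suspect_updates(project_updates):
    """Return the IDs of failed runtime syncs whose job_tags lack 'delete'."""
    suspects = []
    for update in project_updates:
        raw_tags = update.get("job_tags") or ""
        tags = [tag.strip() for tag in raw_tags.split(",") if tag.strip()]
        if (
            update.get("launch_type") == "sync"
            and update.get("status") == "failed"
            and "delete" not in tags
        ):
            suspects.append(update["id"])
    return suspects


def fetch_project_updates(controller_fqdn, project_name, token):
    """Fetch the first page of project updates for a project, newest first."""
    query = urllib.parse.urlencode({"order_by": "-created", "name": project_name})
    url = f"https://{controller_fqdn}/api/v2/project_updates/?{query}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]  # assumption: paginated list format


if __name__ == "__main__":
    # Records shaped like the two examples above (elided fields left out).
    sample = [
        {"id": 12345, "launch_type": "sync", "status": "failed",
         "job_tags": "update_git,install_roles,install_collections"},
        {"id": 12344, "launch_type": "manual", "status": "successful",
         "job_tags": "update_git,install_roles,install_collections,delete"},
    ]
    print(find_suspect_updates(sample))
```

To run it against a live controller, pass the output of `fetch_project_updates` to `find_suspect_updates`.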

NOTE: CONTROLLER_FQDN stands for the controller node's fully qualified domain name, PROJECT_UPDATE_JOB_ID is the ID of the project update job, and PROJECT_NAME is the name of the affected project.

