Resolving oo-admin-chk reports in OpenShift Enterprise

Solution Unverified

Environment

  • OpenShift Enterprise 1.2, 2.x

Issue

  • When I run oo-admin-chk, it reports more gears in mongo than my nodes report. How do I resolve this?
    • oo-admin-repair commands do not seem to fix this issue.
    # oo-admin-chk
    Started at: YYYY-MM-DD HH:MM:SS -0GMT
    Time to fetch mongo data: 0.XXXs
    Total gears found in mongo: 1
    Time to get all gears from nodes: XX.000s
    Total gears found on the nodes: 0
    Total nodes that responded : X
    Check failed.
    Gear GEAR_UUID does not exist on any node
    Please refer to the oo-admin-repair tool to resolve some of these inconsistencies.
    Total time: XX.XXXs
    Finished at: YYYY-MM-DD HH:MM:SS -0GMT
  • The gear also does not appear on any of my nodes.
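When oo-admin-chk flags several gears this way, it helps to pull out all of the affected UUIDs before working through them one by one. The following is a minimal sketch, assuming the message format is exactly "Gear GEAR_UUID does not exist on any node" as in the sample output above; the function name phantom_gear_uuids is hypothetical, and other kinds of inconsistencies that oo-admin-chk reports are deliberately ignored.

```shell
#!/bin/bash
# Hypothetical helper: read oo-admin-chk output on stdin and print one gear
# UUID per line for every "Gear <UUID> does not exist on any node" message.
# Assumes the message text matches the sample oo-admin-chk output exactly.
phantom_gear_uuids() {
    sed -n 's/^Gear \([^[:space:]]\{1,\}\) does not exist on any node[[:space:]]*$/\1/p'
}
```

For example, `oo-admin-chk 2>&1 | phantom_gear_uuids` lists each phantom gear UUID on its own line (the `2>&1` simply covers the case where the messages are written to stderr).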

Resolution

  1. First confirm that the gear with GEAR_UUID does not exist on any of your nodes (see Diagnostic Steps).
  2. Use the oo-app-info script to look up the gear's user and application name in the mongo database:
    # oo-app-info -u GEAR_UUID
  3. Force-delete (destroy) the application and gear:
    # oo-admin-ctl-app -l USER -a APP_NAME -c force-destroy --gear_uuid GEAR_UUID
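Because force-destroy cannot be undone, some administrators wrap it in a confirmation step. The sketch below is a hypothetical helper, not part of OpenShift: the name destroy_phantom_gear is invented for illustration, and it runs only the force-destroy command shown in the Resolution steps, unchanged apart from quoting, after the operator types the gear UUID back. It assumes you have already confirmed that the gear is absent from every node.

```shell
#!/bin/bash
# Hypothetical helper: force-destroy a phantom gear only after the operator
# re-types its UUID. Run it only after confirming the gear is on no node.
# Usage: destroy_phantom_gear USER APP_NAME GEAR_UUID
destroy_phantom_gear() {
    if [ "$#" -ne 3 ]; then
        echo "usage: destroy_phantom_gear USER APP_NAME GEAR_UUID" >&2
        return 64
    fi
    local user="$1" app="$2" uuid="$3" answer
    echo "About to run: oo-admin-ctl-app -l $user -a $app -c force-destroy --gear_uuid $uuid" >&2
    printf 'Type the gear UUID to confirm: ' >&2
    read -r answer
    if [ "$answer" != "$uuid" ]; then
        # Empty input (EOF) or a typo also ends up here, so nothing is destroyed.
        echo "Confirmation did not match; nothing was changed." >&2
        return 1
    fi
    # The force-destroy command from the Resolution steps, with quoted arguments.
    oo-admin-ctl-app -l "$user" -a "$app" -c force-destroy --gear_uuid "$uuid"
}
```

The prompt and messages go to stderr so that only the output of oo-admin-ctl-app appears on stdout.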

Root Cause

A gear that appears in mongo but is missing from every node is most likely the result of an operation or message that was not handled completely (for example, because of a network problem or a reboot), so the broker never deleted its record of a gear that no longer exists. Such a leftover record is a "phantom" gear.

  • An administrator needs to clean up these phantom gears from time to time.

The oo-admin-repair options --consumed-gears and --ssh-keys do not fix this problem because each one targets a different inconsistency:

  • --consumed-gears fixes a mismatch between a user's consumed-gear count and the gears actually recorded in mongo. For example, it corrects a user who shows 10 of 10 gears consumed when only 5 gears exist for that user.
  • --ssh-keys fixes a mismatch between the SSH keys recorded in mongo and those on the node for a gear, making sure that the keys in mongo are correct so that users can ssh into their gears.

Neither option is designed for the case where mongo has a record of a gear that no node has.

Diagnostic Steps

  • Confirm that the gear with GEAR_UUID does not exist on any of your nodes. The following command should return no output.
    • Note: replace node1.domain.com node2.domain.com with the host names of all of your nodes, as a space-delimited list.
    for x in node1.domain.com node2.domain.com; do ssh $x ls /var/lib/openshift/; done | grep GEAR_UUID
  • On the node where the gear should exist, check the MCollective log (by default /var/log/openshift/log/ruby193-mcollective.log on the node) to see whether a destroy or delete was called for the gear.
    • If no destroy or delete was called, you may be seeing a different issue.
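Both diagnostic checks can be scripted. This is a hedged sketch rather than a Red Hat tool: the function names find_gear_on_nodes and gear_destroy_log_lines are hypothetical, and it assumes that gear directories live at /var/lib/openshift/GEAR_UUID (as the ls check implies), that you can ssh to each node as root without a password prompt, and that a log line recording the destroy mentions both the gear UUID and the word destroy or delete. Unlike the ls | grep loop, it tests for the exact directory and reports nodes it could not reach, so an unreachable node is not mistaken for a node without the gear.

```shell
#!/bin/bash
# Hypothetical helpers for the diagnostic steps. Assumptions: gear homes are at
# /var/lib/openshift/<GEAR_UUID>, ssh to each node works non-interactively, and
# the MCollective log path below matches your installation.

# find_gear_on_nodes GEAR_UUID NODE...
# Prints each node that still has the gear directory. Returns 0 if any node has
# it, 1 if no node has it (expected for a phantom gear), and 2 if no node had it
# but at least one node could not be checked (inconclusive).
find_gear_on_nodes() {
    if [ "$#" -lt 2 ] || [ -z "$1" ]; then
        echo "usage: find_gear_on_nodes GEAR_UUID NODE..." >&2
        return 64
    fi
    local uuid="$1"
    shift
    local node rc found=0 unreachable=0
    for node in "$@"; do
        # Remote "test -d" exits 0 (present) or 1 (absent); ssh itself exits
        # 255 on connection errors. -n keeps ssh from reading stdin.
        ssh -n -o BatchMode=yes "$node" test -d "/var/lib/openshift/$uuid"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "$node"
            found=1
        elif [ "$rc" -ne 1 ]; then
            echo "WARNING: could not check $node (exit status $rc)" >&2
            unreachable=1
        fi
    done
    [ "$found" -eq 1 ] && return 0
    [ "$unreachable" -eq 1 ] && return 2
    return 1
}

# gear_destroy_log_lines GEAR_UUID [LOGFILE...]
# Run on the node. Prints lines mentioning the gear UUID together with
# "destroy" or "delete" (case-insensitive); returns 0 if any were found.
# Uncompressed rotated logs are searched only if passed explicitly.
gear_destroy_log_lines() {
    local uuid="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /var/log/openshift/log/ruby193-mcollective.log
    fi
    grep -h -F -- "$uuid" "$@" | grep -i -E 'destroy|delete'
}
```

For example, run `find_gear_on_nodes GEAR_UUID node1.domain.com node2.domain.com` from a host that can reach all nodes, then `gear_destroy_log_lines GEAR_UUID` on the node where the gear should exist. An exit status of 2 from find_gear_on_nodes means the result is not conclusive until every node has been checked.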

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
