Workaround for Stuck Service Catalog Resources

Problem

In some circumstances, the service catalog may not be able to fully delete service instances or service bindings if problems occur during deprovision or unbind. As a result, project or namespace deletion can be blocked.

Symptom

An application is provisioned using the service catalog and service brokers (the Template Service Broker or OpenShift Ansible Broker). When the application or project is deleted, finalizers on the ServiceInstance and ServiceBinding resources prevent these resources from being fully deleted until the service catalog ensures that the deprovision and unbind operations, respectively, are completed by the broker that provides the service. Under normal circumstances, this can temporarily block the deletion of the namespace containing the service catalog resources. In the event that a problem occurs contacting the broker, or the broker reports an error with these operations, the service catalog will retry, and the full deletion of the service catalog resources will remain blocked until these operations complete successfully.

Diagnosis

If the deletion appears to be stuck, inspect the instance or binding with the describe command, or review the service catalog controller logs:

$ oc describe serviceinstances -n testproject

or

$ oc describe servicebindings -n testproject

Review the events and status in the output. Specifically, read the messages and errors and look for indications that the deletion is not progressing.
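To narrow the review to resources that are actually waiting on deletion, a small helper along these lines may help. It is only a sketch: the function name list_stuck is hypothetical, and it assumes that a resource with metadata.deletionTimestamp set but still present is the kind of stuck resource described here.

```shell
# List catalog resources in a namespace that are marked for deletion
# but still exist. list_stuck is a hypothetical helper name.
# Usage: list_stuck <resource-type> <namespace>
list_stuck() {
  # Print "name<TAB>deletionTimestamp" for each object; the timestamp is
  # empty for objects that have not been deleted.
  oc get "$1" -n "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.deletionTimestamp}{"\n"}{end}' |
    awk -F '\t' '$2 != "" { print $1 " (deletion requested at " $2 ")" }'
}
```

For example, `list_stuck serviceinstances testproject` or `list_stuck servicebindings testproject`. A resource that was only just deleted also appears in this list, so a long-standing timestamp is the signal to look for.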

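To view the controller logs (the pod name below is an example; substitute the name of the controller-manager pod in your cluster):

$ oc logs controller-manager-7db89659fb-nznsk -n kube-service-catalog | less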

As with events and status, look for indications that deletion is not progressing. In this scenario, the same warning or error is typically repeated many times, and the log entry usually includes the resource name.
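Repeated messages are easier to spot when the log is filtered by resource name and collapsed into counts. The following is a minimal sketch, not a supported tool: repeated_messages is a hypothetical helper, and it assumes klog-style log lines in which the message text follows the first "] ".

```shell
# Count repeated controller log messages that mention a given resource.
# Reads log lines on stdin. repeated_messages is a hypothetical helper name.
# Usage: repeated_messages <resource-name>
repeated_messages() {
  # -F treats the name as a fixed string rather than a regular expression.
  grep -F -- "$1" |
    # Assumption: strip the severity, timestamp, and source-file prefix
    # that ends at the first "] " so identical messages compare equal.
    sed 's/^[^]]*\] //' |
    sort | uniq -c | sort -rn
}
```

A possible invocation, reusing the example pod name from this article, is `oc logs controller-manager-7db89659fb-nznsk -n kube-service-catalog | repeated_messages XXXX | head`, where XXXX is the name of the instance or binding. A message with a high count is a likely sign that the same operation keeps failing.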

Removing the finalizer is not generally advised. However, if you have an issue similar to the one described here, you can delete the service catalog finalizer from the service instance or service binding. The finalizer kubernetes-incubator/service-catalog indicates that the service catalog still has work to perform for the resource before it can be fully deleted. If you manually delete the finalizer, the service catalog controller does no more work for that resource, so resources created by the service broker may not be cleaned up.

Procedure

  1. Identify the problematic resources by determining what is not being deleted (a namespace, application, or binding), then review the status of the service instances and service bindings in that namespace:

    $ oc describe serviceinstances -n testproject
    

    or

    $ oc describe servicebindings -n testproject
    
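  2. For each binding or instance that is in an error state, open the resource in an editor so that you can delete the service catalog finalizer. Replace XXXX with the resource name, and use serviceinstances instead of servicebindings for a service instance:

    $ oc edit servicebindings XXXX -n testproject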
    
  3. Locate the following two lines, delete them, save, and exit the editor:

      finalizers:
      - kubernetes-incubator/service-catalog
    

Once the finalizers are removed, the objects should be deleted during the next reconciliation loop.
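When many resources are affected, editing each one by hand is slow. The same change can be sketched non-interactively with oc patch. This is an illustration under stated assumptions, not part of the original procedure: remove_finalizers is a hypothetical helper, and the merge patch below clears the entire finalizers list, which matches step 3 only when kubernetes-incubator/service-catalog is the sole finalizer on the resource.

```shell
# Remove all finalizers from one resource without opening an editor.
# remove_finalizers is a hypothetical helper name.
# Usage: remove_finalizers <resource-type> <name> <namespace>
remove_finalizers() {
  # A JSON merge patch that sets metadata.finalizers to null deletes the
  # whole list, equivalent to deleting the two lines shown in step 3.
  oc patch "$1" "$2" -n "$3" --type=merge -p '{"metadata":{"finalizers":null}}'
}
```

For example, `remove_finalizers servicebindings XXXX testproject`. Afterwards, `oc get servicebindings -n testproject` should no longer list the resource. The same warning applies as for the manual edit: anything the broker created for that resource may be left behind.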

Background

It is expected and correct that when a namespace containing a ServiceInstance (and possibly ServiceBindings) is deleted, the namespace is not fully deleted until the catalog resources, including the resources created by the service broker, have been deleted.

When a user deletes a ServiceBinding, the ServiceBinding resource is not fully deleted immediately, because the service catalog has to contact the broker and invoke the unbind operation. The finalizer kubernetes-incubator/service-catalog indicates that the service catalog still has work to perform for the ServiceBinding resource before that resource can be fully deleted. At this stage, if the broker is unreachable or the broker itself has a bug, the ServiceBinding remains in the catalog until one of two things happens. Either the unbind operation is retried and completes successfully at the broker (the catalog retries operations that fail), in which case the catalog removes the finalizer and the binding is fully deleted, or the user manually removes the finalizer. When the user manually removes the finalizer, the catalog controller does no more work for that resource, so resources created by the broker may not be cleaned up.

Similarly, when a user deletes a ServiceInstance, the ServiceInstance resource is not fully deleted immediately, because the catalog has to contact the broker and invoke the deprovision operation. The finalizer kubernetes-incubator/service-catalog indicates that the service catalog still has work to perform for the ServiceInstance resource before that resource can be fully deleted. Note that if a ServiceInstance is deleted while it still has ServiceBindings, the catalog does not contact the broker to deprovision the ServiceInstance until those ServiceBinding resources are fully deleted from the catalog. Additionally, once the catalog contacts the broker to perform the deprovision, if the broker is unreachable or the broker itself has a bug, the ServiceInstance remains in the catalog until either the deprovision operation completes successfully (the catalog retries operations that fail) or the user manually removes the finalizer. When the user manually removes the finalizer, the catalog controller does no more work for that resource, so resources created by the broker may not be cleaned up.
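Because an instance is not deprovisioned while bindings still reference it, it can help to check which bindings are holding an instance back. The sketch below is illustrative only: bindings_for_instance is a hypothetical helper, and it assumes the ServiceBinding resource records its instance in spec.instanceRef.name, as in the v1beta1 service catalog API.

```shell
# List the ServiceBindings in a namespace that reference a given
# ServiceInstance. bindings_for_instance is a hypothetical helper name.
# Usage: bindings_for_instance <instance-name> <namespace>
bindings_for_instance() {
  # Assumption: the binding stores its instance name in spec.instanceRef.name.
  oc get servicebindings -n "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.instanceRef.name}{"\n"}{end}' |
    awk -F '\t' -v inst="$1" '$2 == inst { print $1 }'
}
```

For example, `bindings_for_instance XXXX testproject` prints one binding name per line. If any are listed, resolve those bindings first, because the instance cannot progress until they are gone.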

Consider removing the finalizer only as a last resort, in one of the following situations: the broker is unreachable (permanently down), the broker has a bug that has resulted in the service catalog resource entering a bad state, or the catalog itself has a bug that prevents the resource from being fully deleted after the unbind or deprovision operation has completed successfully.

Under normal circumstances, the service catalog will issue requests to the service broker to delete resources and eventually the broker's resources will be freed up. The service catalog will reflect this by deleting its own associated metadata. You should only manually remove a finalizer if you are certain that the system is in a bad state.
