HAProxy does not restart during RHOSP upgrade when RHCS is director-deployed and RGW is enabled

Solution In Progress

Environment

  • Red Hat OpenStack Platform (RHOSP) 17.1
  • Red Hat Ceph Storage (RHCS) 5

Issue

  • During the upgrade from RHOSP 16.2 to 17.1, when RGW is deployed as part of director-deployed Red Hat Ceph Storage, the upgrade fails because HAProxy does not restart on the next stack update.

Resolution

Apply the following workaround to address the issue:

  1. Log in to the undercloud host as the stack user.

  2. Source the stackrc undercloud credentials file:

    $ source ~/stackrc
    
  3. Log in to a Controller node and create the following file:

    $ cat <<EOF > rgw_spec
    ---
    service_type: rgw
    service_id: rgw
    service_name: rgw.rgw
    placement:
      hosts:
      - controller-0
      - controller-1
      - controller-2
    networks:
    - <172.17.3.0/24>
    spec:
      rgw_frontend_port: 8080
      rgw_realm: default
      rgw_zone: default
    EOF
    
    • Replace <172.17.3.0/24> with the subnet assigned to the storage network, without the angle brackets.

  4. As the root user, start the cephadm shell with the spec file mounted, and apply the spec created in step 3:

    $ cephadm shell -m rgw_spec
    $ ceph orch apply -i /mnt/rgw_spec
    
  5. While still in the cephadm shell, remove the adopted RGW daemons from the Ceph Storage cluster:

    $ for i in 0 1 2; do
        ceph orch rm rgw.controller-$i;
      done
    
    • Exit the cephadm shell.

  6. As the root user, stop HAProxy through Pacemaker so that it can later point to the new Ceph RGW daemons:

    $ pcs resource unmanage haproxy-bundle
    $ pcs resource disable haproxy-bundle
    $ pcs resource manage haproxy-bundle
    
  7. Verify that the three RGW instances are up and running:

    $ cephadm shell -- ceph orch ps | grep rgw
    
  8. As the root user, re-enable HAProxy through Pacemaker:

    $ pcs resource enable haproxy-bundle
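
The spec file from step 3 can also be generated from variables, which avoids leaving the <172.17.3.0/24> placeholder in place by mistake. The following is a minimal sketch, not part of the documented workaround: write_rgw_spec and its argument order are hypothetical, and the host names and subnet in the usage comment are the example values from step 3.

```shell
# Sketch: write the rgw_spec file from step 3 for a given storage subnet.
# write_rgw_spec is a hypothetical helper, not a cephadm or director command.
# Usage: write_rgw_spec SUBNET OUTFILE HOST [HOST...]
write_rgw_spec() (
  # The function body runs in a subshell, so these variables do not leak.
  if [ "$#" -lt 3 ]; then
    echo "usage: write_rgw_spec SUBNET OUTFILE HOST [HOST...]" >&2
    exit 2
  fi
  subnet=$1
  outfile=$2
  shift 2

  # Accept only something shaped like an IPv4 CIDR. This also rejects an
  # unreplaced placeholder such as <172.17.3.0/24>.
  if ! printf '%s\n' "$subnet" |
      grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$'; then
    echo "error: '$subnet' is not an IPv4 CIDR such as 172.17.3.0/24" >&2
    exit 1
  fi

  {
    printf '%s\n' '---' 'service_type: rgw' 'service_id: rgw' \
      'service_name: rgw.rgw' 'placement:' '  hosts:'
    # One placement entry per remaining argument.
    for host in "$@"; do
      printf '  - %s\n' "$host"
    done
    printf '%s\n' 'networks:' "- $subnet" 'spec:' \
      '  rgw_frontend_port: 8080' '  rgw_realm: default' '  rgw_zone: default'
  } > "$outfile"
)

# Example call with the values used in step 3:
#   write_rgw_spec 172.17.3.0/24 rgw_spec controller-0 controller-1 controller-2
```

The resulting file has the same content as the heredoc in step 3 and can be passed to cephadm shell -m as shown in step 4.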
    

Root Cause

The HAProxy bundle cannot start through Pacemaker because it fails to bind to the RGW port (8080). The bind fails because RGW has not been redeployed on the storage network. See BZ 2224351.
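
To see which local addresses currently hold port 8080 on a Controller node, the listening sockets can be filtered by port. The following is a rough diagnostic sketch, not part of the documented workaround: listeners_on_port is a hypothetical helper that reads the output of ss -H -tln on standard input and assumes the local address is in the fourth column.

```shell
# Sketch: print the local addresses that have a TCP listener on a given port.
# listeners_on_port is a hypothetical helper; it expects `ss -H -tln` output
# on stdin, where field 4 is "Local Address:Port".
listeners_on_port() {
  awk -v port="$1" '{
    # The port is whatever follows the last colon (also works for [::]:8080).
    n = split($4, parts, ":")
    if (parts[n] == port) print $4
  }'
}

# On a Controller node, as the root user:
#   ss -H -tln | listeners_on_port 8080
```

If the helper prints an address that HAProxy also needs to bind to, that would be consistent with the bind failure described above.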

