Chapter 7. Investigating and Fixing HA Controller Resources

The pcs constraint show command displays any constraints on how services are launched. The output shows where each resource can be located, the order in which it starts, and whether it must be co-located with another resource. If you find a problem, fix it and then clean up the affected resources.

The following example shows a truncated output from pcs constraint show on a controller node:

$ sudo pcs constraint show
Location Constraints:
  Resource: galera-bundle
    Constraint: location-galera-bundle (resource-discovery=exclusive)
      Rule: score=0
        Expression: galera-role eq true
  [...]
  Resource: ip-192.168.24.15
    Constraint: location-ip-192.168.24.15 (resource-discovery=exclusive)
      Rule: score=0
        Expression: haproxy-role eq true
  [...]
  Resource: my-ipmilan-for-controller-0
    Disabled on: overcloud-controller-0 (score:-INFINITY)
  Resource: my-ipmilan-for-controller-1
    Disabled on: overcloud-controller-1 (score:-INFINITY)
  Resource: my-ipmilan-for-controller-2
    Disabled on: overcloud-controller-2 (score:-INFINITY)
Ordering Constraints:
  start ip-172.16.0.10 then start haproxy-bundle (kind:Optional)
  start ip-10.200.0.6 then start haproxy-bundle (kind:Optional)
  start ip-172.19.0.10 then start haproxy-bundle (kind:Optional)
  start ip-192.168.1.150 then start haproxy-bundle (kind:Optional)
  start ip-172.16.0.11 then start haproxy-bundle (kind:Optional)
  start ip-172.18.0.10 then start haproxy-bundle (kind:Optional)
Colocation Constraints:
  ip-172.16.0.10 with haproxy-bundle (score:INFINITY)
  ip-172.18.0.10 with haproxy-bundle (score:INFINITY)
  ip-10.200.0.6 with haproxy-bundle (score:INFINITY)
  ip-172.19.0.10 with haproxy-bundle (score:INFINITY)
  ip-172.16.0.11 with haproxy-bundle (score:INFINITY)
  ip-192.168.1.150 with haproxy-bundle (score:INFINITY)

This output displays three major constraint types:

Location Constraints

This section shows constraints that are related to where resources are assigned. The first constraint defines a rule that sets the galera-bundle resource to run on nodes with the galera-role attribute set to true. You can check the attributes of the nodes by using the pcs property show command:

$ sudo pcs property show
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: tripleo_cluster
 dc-version: 2.0.1-4.el8-0eb7991564
 have-watchdog: false
 redis_REPL_INFO: overcloud-controller-0
 stonith-enabled: false
Node Attributes:
 overcloud-controller-0: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@overcloud-controller-0
 overcloud-controller-1: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@overcloud-controller-1
 overcloud-controller-2: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@overcloud-controller-2

From this output, you can verify that the galera-role attribute is true for all the controllers. This means that the galera-bundle resource runs only on these nodes. The same concept applies to the other attributes associated with the other location constraints.
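
As a minimal sketch of this check, the following shell function filters pcs property show output for the nodes that have a given attribute set to true. The function name nodes_with_attr is hypothetical, and the parsing assumes the Node Attributes layout shown above, which might differ between pcs versions.

```shell
# Print the names of nodes whose attribute "$1" is set to true.
# Reads `pcs property show` output on stdin and assumes the layout above:
# a "Node Attributes:" header followed by indented "<node>: key=value ..." lines.
nodes_with_attr() {
    awk -v want="$1=true" '
        /^Node Attributes:/ { in_attrs = 1; next }   # start of the section
        in_attrs && /^[^ ]/ { in_attrs = 0 }         # an unindented line ends it
        in_attrs {
            node = $1
            sub(/:$/, "", node)                      # strip the trailing colon
            for (i = 2; i <= NF; i++)
                if ($i == want) { print node; break }
        }
    '
}

# Usage on a controller node:
#   sudo pcs property show | nodes_with_attr galera-role
```

If a controller is missing from the result, the galera-bundle location rule does not match that node.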

The second location constraint relates to the resource ip-192.168.24.15 and specifies that the IP resource runs only on nodes with the haproxy-role attribute set to true. This means that the cluster associates the IP address with the haproxy service, which is necessary to make the services reachable.

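The third group of location constraints shows that each ipmilan resource is disabled on the controller that it is named for. For example, my-ipmilan-for-controller-0 has a score of -INFINITY on overcloud-controller-0, so it never runs on that node.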

Ordering Constraints

This section shows the constraints that make the virtual IP address resources (IPaddr2) start before HAProxy. Ordering constraints apply only to IP address resources and HAProxy. All the other resources are managed by systemd, because each service, such as Compute, is expected to tolerate an interruption of a dependent service, such as Galera.

Colocation Constraints

This section shows which resources must be located together. All virtual IP addresses are linked to the haproxy-bundle resource.
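
The relationship between the ordering and colocation sections can be cross-checked with a short script. The following is a sketch only: the function name vips_missing_order is hypothetical, and it assumes the pcs constraint show layout from the example output in this chapter. It lists every resource that is colocated with haproxy-bundle but has no matching "start ... then start haproxy-bundle" ordering constraint.

```shell
# Print resources colocated with haproxy-bundle that lack an ordering
# constraint of the form "start <resource> then start haproxy-bundle".
# Reads `pcs constraint show` output on stdin.
vips_missing_order() {
    awk '
        /^Ordering Constraints:/   { section = "order"; next }
        /^Colocation Constraints:/ { section = "coloc"; next }
        /^[^ ]/                    { section = "" }   # any other section header
        section == "order" && $1 == "start" && $3 == "then" && $5 == "haproxy-bundle" {
            ordered[$2] = 1
        }
        section == "coloc" && $2 == "with" && $3 == "haproxy-bundle" {
            colocated[$1] = 1
        }
        END {
            for (r in colocated)
                if (!(r in ordered)) print r
        }
    ' | sort
}

# Usage on a controller node:
#   sudo pcs constraint show | vips_missing_order
```

For the example output in this chapter, the function prints nothing, because all six virtual IP addresses appear in both sections.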

7.1. Correcting Resource Problems on Controllers

The pcs status command lists failed actions for the resources that the cluster manages. Many different kinds of problems can occur. In general, you can approach problems in the following ways:

Controller problem

If health checks to a controller are failing, log in to the controller and check whether services can start without problems. Service startup problems can indicate a communication problem between controllers. Other indications of communication problems between controllers include:

  • A controller gets fenced disproportionately more often than other controllers.
  • A suspiciously large number of services are failing on a specific controller.

Individual resource problem

If services on a controller are generally working, but an individual resource is failing, try to identify the problem from the pcs status messages. If you need more information, log in to the controller where the resource is failing and try some of the steps below.

Apart from IPs and core bundle resources (Galera, RabbitMQ, and Redis), the only active/passive (A/P) resource managed by the cluster is openstack-cinder-volume. If this resource has an associated failed action, a good approach is to check its status with systemctl. After you identify the node on which the resource is failing (for example, overcloud-controller-0), check the status of the resource on that node:

[heat-admin@overcloud-controller-0 ~]$ sudo systemctl status openstack-cinder-volume
● openstack-cinder-volume.service - Cluster Controlled openstack-cinder-volume
   Loaded: loaded (/usr/lib/systemd/system/openstack-cinder-volume.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/openstack-cinder-volume.service.d
       	└─50-pacemaker.conf
   Active: active (running) since Tue 2016-11-22 09:25:53 UTC; 2 weeks 6 days ago
 Main PID: 383912 (cinder-volume)
   CGroup: /system.slice/openstack-cinder-volume.service
       	├─383912 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf --logfile /var/log/cinder/volume.log
       	└─383985 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf --logfile /var/log/cinder/volume.log


Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.798 383912 WARNING oslo_config.cfg [req-8f32db96-7ca2-4fc5-82ab-271993b28174 - - - -...e future.
Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.799 383912 WARNING oslo_config.cfg [req-8f32db96-7ca2-4fc5-82ab-271993b28174 - - - -...e future.
Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.926 383985 INFO cinder.coordination [-] Coordination backend started successfully.
Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.926 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...r (1.2.0)
Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.047 383985 WARNING oslo_config.cfg [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - - -...e future.
Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.048 383985 WARNING oslo_config.cfg [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - - -...e future.
Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.048 383985 WARNING oslo_config.cfg [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - - -...e future.
Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.063 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...essfully.
Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.111 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...r (1.2.0)
Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.146 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...essfully.
Hint: Some lines were ellipsized, use -l to show in full.
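
For a quicker check than reading the full status output, the following sketch wraps systemctl is-active, which prints the unit state and returns a non-zero exit code unless the unit is active. The function name check_unit and the choice of 50 journal lines are illustrative assumptions. Reading the journal might require root privileges.

```shell
# Report the state of a systemd unit. If the unit is not active,
# print its most recent journal entries and return a non-zero status.
check_unit() {
    unit="$1"
    # is-active prints the state (active, inactive, failed, ...);
    # "|| true" keeps the function going when the unit is not active.
    state=$(systemctl is-active "$unit") || true
    if [ "$state" = "active" ]; then
        echo "$unit: active"
    else
        echo "$unit: $state"
        # Show the last 50 journal lines for the unit to help find the cause.
        journalctl -u "$unit" -n 50 --no-pager
        return 1
    fi
}

# Usage on the controller where the resource is failing:
#   check_unit openstack-cinder-volume
```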

After you find and fix the problem with the failed resource, run the pcs resource cleanup command to reset the status and the fail count of the resource. For example, for the openstack-cinder-volume resource, run:

$ sudo pcs resource cleanup openstack-cinder-volume
  Resource: openstack-cinder-volume successfully cleaned up
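
To confirm that the cleanup took effect, you can follow it with pcs resource failcount show, which reports any fail counts that are still recorded for the resource. The wrapper below is a small sketch. The function name cleanup_and_verify is hypothetical, and the exact output of both commands depends on the pcs version.

```shell
# Clean up a resource, then display any fail counts that remain for it.
cleanup_and_verify() {
    resource="$1"
    # Reset the status and fail count; stop here if the cleanup itself fails.
    sudo pcs resource cleanup "$resource" || return 1
    # Report fail counts still recorded for the resource, if any.
    sudo pcs resource failcount show "$resource"
}

# Usage:
#   cleanup_and_verify openstack-cinder-volume
```

If fail counts reappear shortly after the cleanup, the underlying problem is probably still present.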