Chapter 8. Troubleshooting resource problems

If a resource fails, you must investigate the cause and location of the problem, fix the failed resource, and optionally clean up the resource. There are many possible causes of resource failures depending on your deployment, and you must investigate the resource to determine how to fix the problem.

For example, you can check the resource constraints to ensure that the resources are not interfering with each other and that the resources can connect to each other. You can also examine a Controller node that is fenced more often than the other Controller nodes to identify possible communication problems.

8.1. Viewing resource constraints

You can view constraints on how services are launched, including constraints related to where each resource runs, the order in which resources start, and whether a resource must be colocated with another resource.

View all resource constraints

On any Controller node, run the pcs constraint show command.

$ sudo pcs constraint show

The following example shows a truncated output from the pcs constraint show command on a Controller node:

Location Constraints:
  Resource: galera-bundle
    Constraint: location-galera-bundle (resource-discovery=exclusive)
      Rule: score=0
        Expression: galera-role eq true
  [...]
  Resource: ip-192.168.24.15
    Constraint: location-ip-192.168.24.15 (resource-discovery=exclusive)
      Rule: score=0
        Expression: haproxy-role eq true
  [...]
  Resource: my-ipmilan-for-controller-0
    Disabled on: overcloud-controller-0 (score:-INFINITY)
  Resource: my-ipmilan-for-controller-1
    Disabled on: overcloud-controller-1 (score:-INFINITY)
  Resource: my-ipmilan-for-controller-2
    Disabled on: overcloud-controller-2 (score:-INFINITY)
Ordering Constraints:
  start ip-172.16.0.10 then start haproxy-bundle (kind:Optional)
  start ip-10.200.0.6 then start haproxy-bundle (kind:Optional)
  start ip-172.19.0.10 then start haproxy-bundle (kind:Optional)
  start ip-192.168.1.150 then start haproxy-bundle (kind:Optional)
  start ip-172.16.0.11 then start haproxy-bundle (kind:Optional)
  start ip-172.18.0.10 then start haproxy-bundle (kind:Optional)
Colocation Constraints:
  ip-172.16.0.10 with haproxy-bundle (score:INFINITY)
  ip-172.18.0.10 with haproxy-bundle (score:INFINITY)
  ip-10.200.0.6 with haproxy-bundle (score:INFINITY)
  ip-172.19.0.10 with haproxy-bundle (score:INFINITY)
  ip-172.16.0.11 with haproxy-bundle (score:INFINITY)
  ip-192.168.1.150 with haproxy-bundle (score:INFINITY)

This output displays the following main constraint types:

Location Constraints

Lists the locations to which resources can be assigned:

  • The first constraint defines a rule that sets the galera-bundle resource to run on nodes with the galera-role attribute set to true.
  • The second location constraint specifies that the IP resource ip-192.168.24.15 runs only on nodes with the haproxy-role attribute set to true. This means that the cluster associates the IP address with the haproxy service, which is necessary to make the services reachable.
  • The third location constraint shows that the ipmilan resource is disabled on each of the Controller nodes.
Ordering Constraints

Lists the order in which resources can launch. This example shows a constraint that sets the virtual IP address resources (IPaddr2) to start before the HAProxy service.

Note

Ordering constraints only apply to IP address resources and to HAProxy. Systemd manages all other resources, because services such as Compute are expected to withstand an interruption of a dependent service, such as Galera.

Colocation Constraints

Lists which resources must be located together. All virtual IP addresses are linked to the haproxy-bundle resource.
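When the full constraint listing is long, you can extract one section at a time with standard text tools. The following is a minimal sketch that works on a saved copy of the listing; the file name constraints.txt and the shortened sample content are illustrative, not part of the product output:

```shell
# Work on a saved copy of the constraint listing. On a live Controller
# node you would capture it with: sudo pcs constraint show > constraints.txt
cat > constraints.txt <<'EOF'
Location Constraints:
  Resource: galera-bundle
    Constraint: location-galera-bundle (resource-discovery=exclusive)
Ordering Constraints:
  start ip-172.16.0.10 then start haproxy-bundle (kind:Optional)
  start ip-10.200.0.6 then start haproxy-bundle (kind:Optional)
Colocation Constraints:
  ip-172.16.0.10 with haproxy-bundle (score:INFINITY)
EOF

# Print only the "Ordering Constraints:" section: set a flag when that
# section header is seen and clear it at the next section header.
awk '/^Ordering Constraints:/{f=1} /^Colocation Constraints:/{f=0} f' constraints.txt
```

Recent pcs releases also provide per-type subcommands, such as pcs constraint order and pcs constraint colocation, that list a single constraint type directly on a live cluster.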

View Galera location constraints

On any Controller node, run the pcs property show command.

$ sudo pcs property show

Example output:

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: tripleo_cluster
 dc-version: 2.0.1-4.el8-0eb7991564
 have-watchdog: false
 redis_REPL_INFO: overcloud-controller-0
 stonith-enabled: false
Node Attributes:
 overcloud-controller-0: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@overcloud-controller-0
 overcloud-controller-1: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@overcloud-controller-1
 overcloud-controller-2: cinder-volume-role=true galera-role=true haproxy-role=true rabbitmq-role=true redis-role=true rmq-node-attr-last-known-rabbitmq=rabbit@overcloud-controller-2

In this output, you can verify that the galera-role attribute is true for all Controller nodes. This means that the galera-bundle resource runs only on these nodes. The same concept applies to the other attributes associated with the other location constraints.
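To spot a node whose attribute value would prevent a resource from running there, you can filter the Node Attributes lines. The following sketch uses a saved copy of the listing; the galera-role=false value on the third node is hypothetical, added only to show what a mismatch looks like:

```shell
# Saved Node Attributes lines; on a live node you would capture them
# with: sudo pcs property show > props.txt
# The galera-role=false value below is a hypothetical mismatch.
cat > props.txt <<'EOF'
 overcloud-controller-0: cinder-volume-role=true galera-role=true haproxy-role=true
 overcloud-controller-1: cinder-volume-role=true galera-role=true haproxy-role=true
 overcloud-controller-2: cinder-volume-role=true galera-role=false haproxy-role=true
EOF

# List nodes where galera-role is not set to true; the galera-bundle
# resource cannot run on any node printed here.
grep 'overcloud-controller' props.txt | grep -v 'galera-role=true'
```

An empty result means every node carries the attribute that the location constraint requires.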

8.2. Investigating Controller node resource problems

Depending on the type and location of the problem, there are different approaches you can take to investigate and fix the resource.

Investigating Controller node problems
If health checks to a Controller node are failing, this can indicate a communication problem between Controller nodes. To investigate, log in to the Controller node and check if the services can start correctly.
Investigating individual resource problems
If most services on a Controller are running correctly, you can run the pcs status command and check the output for information about a specific service failure. You can also log in to the Controller where the resource is failing and investigate the resource behavior on the Controller node.
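In a long pcs status report, the relevant information is usually in the failure section near the end. As a sketch, the following isolates that section from a saved copy of the output; the failure entry shown is a hypothetical example of what a failed recurring monitor looks like, not output from a real cluster:

```shell
# Saved excerpt of `pcs status` output; on a live Controller node you
# would capture it with: sudo pcs status > status.txt
# The failure entry below is a hypothetical example.
cat > status.txt <<'EOF'
Full List of Resources:
  * ip-192.168.24.15  (ocf::heartbeat:IPaddr2):  Started overcloud-controller-0

Failed Resource Actions:
  * openstack-cinder-volume_monitor_60000 on overcloud-controller-0 'not running' (7)
EOF

# Show only the failure section, which names the resource, the failed
# action (here a recurring monitor), the node, and the exit status.
awk '/^Failed Resource Actions:/{f=1} f' status.txt
```

The node name in that section tells you which Controller to log in to for the next step.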

Procedure

The following procedure shows how to investigate the openstack-cinder-volume resource.

  1. Locate and log in to the Controller node on which the resource is failing.
  2. Run the systemctl status command to show the resource status and recent log events:

    [heat-admin@overcloud-controller-0 ~]$ sudo systemctl status openstack-cinder-volume
    ● openstack-cinder-volume.service - Cluster Controlled openstack-cinder-volume
       Loaded: loaded (/usr/lib/systemd/system/openstack-cinder-volume.service; disabled; vendor preset: disabled)
      Drop-In: /run/systemd/system/openstack-cinder-volume.service.d
           	└─50-pacemaker.conf
       Active: active (running) since Tue 2016-11-22 09:25:53 UTC; 2 weeks 6 days ago
     Main PID: 383912 (cinder-volume)
       CGroup: /system.slice/openstack-cinder-volume.service
           	├─383912 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf --logfile /var/log/cinder/volume.log
           	└─383985 /usr/bin/python3 /usr/bin/cinder-volume --config-file /usr/share/cinder/cinder-dist.conf --config-file /etc/cinder/cinder.conf --logfile /var/log/cinder/volume.log
    
    
    Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.798 383912 WARNING oslo_config.cfg [req-8f32db96-7ca2-4fc5-82ab-271993b28174 - - - -...e future.
    Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.799 383912 WARNING oslo_config.cfg [req-8f32db96-7ca2-4fc5-82ab-271993b28174 - - - -...e future.
    Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.926 383985 INFO cinder.coordination [-] Coordination backend started successfully.
    Nov 22 09:25:55 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:55.926 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...r (1.2.0)
    Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.047 383985 WARNING oslo_config.cfg [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - - -...e future.
    Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.048 383985 WARNING oslo_config.cfg [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - - -...e future.
    Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.048 383985 WARNING oslo_config.cfg [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - - -...e future.
    Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.063 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...essfully.
    Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.111 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...r (1.2.0)
    Nov 22 09:25:56 overcloud-controller-0.localdomain cinder-volume[383912]: 2016-11-22 09:25:56.146 383985 INFO cinder.volume.manager [req-cb07b35c-af01-4c45-96f1-3d2bfc98ecb5 - - ...essfully.
    Hint: Some lines were ellipsized, use -l to show in full.
  3. Correct the failed resource based on the information from the output.
  4. Run the pcs resource cleanup command to reset the status and the fail count of the resource.

    $ sudo pcs resource cleanup openstack-cinder-volume
      Resource: openstack-cinder-volume successfully cleaned up
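If the ellipsized journal lines from systemctl status are not enough to diagnose the failure, you can pull the full log for the unit with journalctl and narrow it to warnings and errors. The live command is shown in the comment; the saved excerpt below, including its message text, is a hypothetical stand-in so that the filtering step can be demonstrated:

```shell
# On the live node, pull the full (untruncated) recent log for the unit:
#   sudo journalctl -u openstack-cinder-volume --since "1 hour ago" --no-pager
# The filtering step works the same on a saved excerpt; the message
# text below is hypothetical.
cat > volume-journal.txt <<'EOF'
Nov 22 09:25:55 overcloud-controller-0 cinder-volume[383912]: WARNING oslo_config.cfg [...] deprecated option
Nov 22 09:25:55 overcloud-controller-0 cinder-volume[383912]: INFO cinder.coordination [-] Coordination backend started successfully.
EOF

# Keep only WARNING and ERROR entries to reduce noise.
grep -E 'WARNING|ERROR' volume-journal.txt
```

The service also writes to /var/log/cinder/volume.log, as shown in the systemctl status output above, so the same filter can be applied to that file.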