PCSD Status detection is slow

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability or Resilient Storage Add-On
  • pcs package version pcs-0.9.143* or older

Issue

  • Command pcs status does not return PCSD Status
  • Command pcs status is very slow
  • Command pcs status takes too long to complete

Resolution

Update pcs to the fixed version listed below. With this version, the PCSD Status output is displayed only when the --full option is given. In addition, optimizations were made to pcs, such as parallelizing the pcsd status check. If the --full option is not provided, the command pcs status does not check the pcsd status on each cluster node.

# pcs status --full

Red Hat Enterprise Linux 7

  • The issue (bz1207405) has been resolved with errata RHSA-2016:2596 with the following package(s): pcs-0.9.152-10.el7 or later
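
As a rough aid for the version requirement above, the following sketch checks whether the installed pcs package is at least pcs-0.9.152-10.el7. The function name pcs_is_fixed is hypothetical, and the comparison uses sort -V, which only approximates RPM's own version ordering; treat the result as a hint rather than an authoritative check.

```shell
#!/bin/bash
# Fixed version-release taken from the errata named in this article.
FIXED="0.9.152-10.el7"

# Hypothetical helper: succeeds if the given version-release is >= FIXED.
# Assumption: sort -V is close enough to RPM version ordering for this purpose.
pcs_is_fixed() {
    local installed="$1"
    # If FIXED sorts first (or both are equal), installed is at least FIXED.
    [ "$(printf '%s\n%s\n' "$FIXED" "$installed" | sort -V | head -n1)" = "$FIXED" ]
}

# Query the installed package only where rpm and pcs are actually present.
if command -v rpm >/dev/null 2>&1 &&
   installed="$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' pcs 2>/dev/null)"; then
    if pcs_is_fixed "$installed"; then
        echo "pcs $installed includes the fix"
    else
        echo "pcs $installed predates the fix; consider: yum update pcs"
    fi
fi
```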

With pcs-0.9.152-10.el7 or later, the pcs status command reports the following:

# pcs status
Cluster name: cluster
Stack: corosync
Current DC: node1 partition with quorum
Last updated: Tue May 21 11:08:08 2019

Online: [ node1.example.com node2.example.com ]

Full list of resources:

 fence01    (stonith:fence_xvm):    Started node1.example.com
 fence02    (stonith:fence_xvm):    Started node2.example.com

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled

To also check the PCSD Status, include the --full option:

# pcs status --full
Cluster name: cluster
Stack: corosync
Current DC: node1 partition with quorum
Last updated: Tue May 21 11:08:08 2019

Online: [ node1.example.com node2.example.com ]

Full list of resources:

 fence01    (stonith:fence_xvm):    Started node1.example.com
 fence02    (stonith:fence_xvm):    Started node2.example.com

PCSD Status:
  node1.example.com: Online
  node2.example.com: Online

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled

Root Cause

This issue can occur in the following situations:

  • Immediately after a node goes down, the next pcs status will take very long to complete
  • During a node rejoin to the cluster
  • Network communication failure between the nodes

The issue happens because, in older pcs package versions, the command has no timeout parameter to abort the check. It waits until the other cluster nodes reply with their pcsd service status.
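
Where updating is not immediately possible, one way to bound the wait is to run the command under the coreutils timeout utility. This is only a workaround sketch and not part of the fix described in this article: the helper name run_with_limit and the 30-second limit are assumptions, and a stopped command produces no status output at all.

```shell
#!/bin/bash
# Arbitrary limit in seconds; an assumption, adjust to the environment.
LIMIT=30

# Hypothetical helper: run a command, stopping it after the given number of seconds.
run_with_limit() {
    local limit="$1"; shift
    local rc=0
    timeout "$limit" "$@" || rc=$?
    # GNU timeout exits with status 124 when it had to stop the command.
    if [ "$rc" -eq 124 ]; then
        echo "command exceeded ${limit}s and was stopped" >&2
    fi
    return "$rc"
}

# Only attempt the call on systems where pcs is installed.
if command -v pcs >/dev/null 2>&1; then
    run_with_limit "$LIMIT" pcs status || echo "pcs status exited with status $?" >&2
fi
```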

In addition, the pcs status command needed several optimizations, such as checking pcsd on the cluster nodes in parallel, to retrieve the PCSD status output faster.
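
To illustrate why a per-node time limit combined with parallel checks shortens the total wait, the sketch below probes the default pcsd port (2224) on several nodes at once. It is not how pcs implements the check: it only tests TCP reachability through bash's /dev/tcp redirection, and the function names and the 5-second limit are assumptions made for the example.

```shell
#!/bin/bash
PCSD_PORT=2224   # default pcsd port
LIMIT=5          # per-node limit in seconds; arbitrary assumption

# Hypothetical helper: report whether a TCP connection to pcsd's port succeeds in time.
probe_node() {
    local node="$1"
    if timeout "$LIMIT" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$node" "$PCSD_PORT" 2>/dev/null; then
        echo "$node: reachable"
    else
        echo "$node: unreachable"
    fi
}

# Start one probe per node in the background, then wait for all of them.
# The total wait is roughly LIMIT, not LIMIT multiplied by the number of nodes.
probe_all() {
    local node
    for node in "$@"; do
        probe_node "$node" &
    done
    wait
}

# Run only when node names are passed on the command line.
if [ "$#" -gt 0 ]; then
    probe_all "$@"
fi
```

Saved as a script, it could be called with the node names as arguments, for example node1.example.com node2.example.com. Because the probes run concurrently, the output lines may appear in any order.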

