In a Pacemaker cluster, a fenced node reboots twice in a row
Issue
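- A node was rebooted successfully once by fencing, but the action was marked as timed out, and the node was fenced again shortly afterward. As the logs below show, the fenced node did not leave the corosync membership until well after the fencing event started.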
Mar 19 22:13:30 fastvm-rhel-8-0-23 pacemaker-schedulerd[338749]: warning: Unexpected result (error) was recorded for monitor of dummy1 on node2 at Mar 19 22:13:30 2021
Mar 19 22:13:30 fastvm-rhel-8-0-23 pacemaker-schedulerd[338749]: warning: Cluster node node2 will be fenced: dummy1 failed there
Mar 19 22:13:30 fastvm-rhel-8-0-23 pacemaker-schedulerd[338749]: warning: Scheduling Node node2 for STONITH
...
Mar 19 22:13:30 fastvm-rhel-8-0-23 pacemaker-fenced[338746]: notice: Requesting that node1 perform 'reboot' action targeting node2
...
Mar 19 22:13:32 fastvm-rhel-8-0-23 pacemaker-fenced[338746]: notice: Operation 'reboot' [338790] (call 3 from pacemaker-controld.338750) targeting node2 using xvm2 returned 0 (OK)
...
Mar 19 22:13:37 fastvm-rhel-8-0-23 corosync[1729]: [KNET ] link: host: 2 link: 0 is down
Mar 19 22:13:37 fastvm-rhel-8-0-23 corosync[1729]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Mar 19 22:13:37 fastvm-rhel-8-0-23 corosync[1729]: [KNET ] host: host: 2 has no active links
Mar 19 22:13:43 fastvm-rhel-8-0-23 corosync[1729]: [TOTEM ] Token has not been received in 12750 ms
Mar 19 22:13:47 fastvm-rhel-8-0-23 corosync[1729]: [TOTEM ] A processor failed, forming new configuration.
Mar 19 22:13:49 fastvm-rhel-8-0-23 pcsd[1375]: INFO:tornado.access:200 GET /remote/get_configs?cluster_name=testcluster (192.168.22.24) 36.51ms
Mar 19 22:14:08 fastvm-rhel-8-0-23 corosync[1729]: [TOTEM ] A new membership (1.11620) was formed. Members left: 2
Mar 19 22:14:08 fastvm-rhel-8-0-23 corosync[1729]: [TOTEM ] Failed to receive the leave message. failed: 2
Mar 19 22:14:08 fastvm-rhel-8-0-23 corosync[1729]: [CPG ] downlist left_list: 1 received
Mar 19 22:14:08 fastvm-rhel-8-0-23 corosync[1729]: [QUORUM] Members[1]: 1
Mar 19 22:14:08 fastvm-rhel-8-0-23 corosync[1729]: [MAIN ] Completed service synchronization, ready to provide service.
...
Mar 19 22:14:08 fastvm-rhel-8-0-23 pacemaker-fenced[338746]: error: Operation 'reboot' targeting node2 by node1 for pacemaker-controld.338750@node1: Timer expired
Mar 19 22:14:08 fastvm-rhel-8-0-23 pacemaker-fenced[338746]: error: Already sent notifications for 'reboot' targeting node2 by node1 for client pacemaker-controld.338750@node1: OK
Mar 19 22:14:08 fastvm-rhel-8-0-23 pacemaker-controld[338750]: notice: Stonith operation 3/5:5:0:c1e18fcd-50b5-44fa-bae0-49da438e92d7: Timer expired (-62)
Mar 19 22:14:08 fastvm-rhel-8-0-23 pacemaker-controld[338750]: notice: Stonith operation 3 for node2 failed (Timer expired): aborting transition.
Mar 19 22:14:08 fastvm-rhel-8-0-23 pacemaker-controld[338750]: notice: Transition 5 aborted: Stonith failed
Mar 19 22:14:08 fastvm-rhel-8-0-23 pacemaker-controld[338750]: notice: Peer node2 was not terminated (reboot) by node1 on behalf of pacemaker-controld.338750: Timer expired
...
Mar 19 22:14:09 fastvm-rhel-8-0-23 pacemaker-schedulerd[338749]: warning: Cluster node node2 will be fenced: peer is no longer part of the cluster
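The timeline in the excerpt can be made explicit with a small script. The following is a minimal Python sketch, not a Red Hat tool. It takes a trimmed copy of the log lines above, finds the first timestamp of a few key events, and computes the gaps between them. The event labels and the substrings used to match them are hypothetical choices made for illustration. The year 2021 is an assumption taken from the `dummy1` monitor message, because syslog timestamps omit the year.

```python
from __future__ import annotations

import re
from datetime import datetime

# Trimmed copy of the log excerpt above (syslog timestamps carry no year).
LOG = """\
Mar 19 22:13:30 fastvm-rhel-8-0-23 pacemaker-fenced[338746]: notice: Requesting that node1 perform 'reboot' action targeting node2
Mar 19 22:13:32 fastvm-rhel-8-0-23 pacemaker-fenced[338746]: notice: Operation 'reboot' [338790] (call 3 from pacemaker-controld.338750) targeting node2 using xvm2 returned 0 (OK)
Mar 19 22:13:37 fastvm-rhel-8-0-23 corosync[1729]: [KNET ] link: host: 2 link: 0 is down
Mar 19 22:14:08 fastvm-rhel-8-0-23 corosync[1729]: [TOTEM ] A new membership (1.11620) was formed. Members left: 2
Mar 19 22:14:08 fastvm-rhel-8-0-23 pacemaker-fenced[338746]: error: Operation 'reboot' targeting node2 by node1 for pacemaker-controld.338750@node1: Timer expired
"""

# Hypothetical event labels mapped to a substring that identifies each line.
MARKERS = {
    "fence_requested": "perform 'reboot' action targeting",
    "agent_ok": "returned 0 (OK)",
    "link_down": "is down",
    "membership_changed": "Members left:",
    "fence_timed_out": "Timer expired",
}

# Leading "Mon DD HH:MM:SS" syslog timestamp.
TS_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})")


def first_timestamps(log: str, year: int = 2021) -> dict[str, datetime]:
    """Return the timestamp of the first line matching each marker."""
    found: dict[str, datetime] = {}
    for line in log.splitlines():
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        for label, needle in MARKERS.items():
            if label not in found and needle in line:
                found[label] = ts
    return found


ts = first_timestamps(LOG)
# Gap between the fence agent reporting success and corosync dropping node 2.
agent_to_membership = (ts["membership_changed"] - ts["agent_ok"]).total_seconds()
# Gap between the fencing request and pacemaker-fenced reporting "Timer expired".
request_to_timeout = (ts["fence_timed_out"] - ts["fence_requested"]).total_seconds()

print(f"agent OK -> membership change: {agent_to_membership:.0f} s")
print(f"fence request -> Timer expired: {request_to_timeout:.0f} s")
```

On this excerpt the script reports 36 seconds between the agent returning OK and corosync logging `Members left: 2`. It reports 38 seconds between the fencing request and `Timer expired`.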
Environment
- Red Hat Enterprise Linux 7 (with the High Availability Add-on)
- Red Hat Enterprise Linux 8 (with the High Availability Add-on)