fence_aws fence action fails with "Timed out waiting to power OFF" and "Unable to obtain correct plug status or plug is not available" when a node panics in a High Availability cluster

Solution In Progress - Updated -

Issue

  • When a Pacemaker cluster node on AWS hits a kernel panic, or is crashed deliberately by running echo c > /proc/sysrq-trigger, the fence action against that node fails with "Timed out waiting to power OFF". When the action is retried, it fails repeatedly with "Unable to obtain correct plug status or plug is not available".
  • Fencing the node manually with the pcs stonith fence command succeeds.
  • The issue can occur intermittently.
Apr  2 22:18:04 ip-10-0-0-17 corosync[14259]: [TOTEM ] A processor failed, forming new configuration.
Apr  2 22:18:05 ip-10-0-0-17 corosync[14259]: [TOTEM ] A new membership (10.0.0.17:177) was formed. Members left: 2
Apr  2 22:18:05 ip-10-0-0-17 corosync[14259]: [TOTEM ] Failed to receive the leave message. failed: 2
Apr  2 22:18:05 ip-10-0-0-17 corosync[14259]: [CPG   ] downlist left_list: 1 received
Apr  2 22:18:05 ip-10-0-0-17 corosync[14259]: [QUORUM] Members[1]: 1
Apr  2 22:18:05 ip-10-0-0-17 corosync[14259]: [MAIN  ] Completed service synchronization, ready to provide service.
Apr  2 22:18:05 ip-10-0-0-17 pacemakerd[14283]:  notice: Node node2 state is now lost
...
Apr  2 22:18:06 ip-10-0-0-17 crmd[14289]:  notice: Requesting fencing (reboot) of node node2
...
Apr  2 22:19:11 ip-10-0-0-17 fence_aws: Failed: Timed out waiting to power OFF
...
Apr  2 22:19:11 ip-10-0-0-17 stonith-ng[14285]:   error: Operation 'reboot' [18866] (call 42 from crmd.14289) for host 'node2' with device 'aws_fence' returned: -62 (Timer expired)
...
Apr  2 22:19:11 ip-10-0-0-17 crmd[14289]:  notice: Peer node2 was not terminated (reboot) by node1 on behalf of crmd.14289: Timer expired
...
Apr  2 22:19:11 ip-10-0-0-17 pengine[14288]: warning: Cluster node node2 will be fenced: peer is no longer part of the cluster
...
Apr  2 22:19:11 ip-10-0-0-17 crmd[14289]:  notice: Requesting fencing (reboot) of node node2
...
Apr  2 22:19:12 ip-10-0-0-17 fence_aws: Failed: Unable to obtain correct plug status or plug is not available
...
Apr  2 22:19:14 ip-10-0-0-17 fence_aws: Failed: Unable to obtain correct plug status or plug is not available
...
Apr  2 22:19:14 ip-10-0-0-17 stonith-ng[14285]:   error: Operation 'reboot' [19126] (call 43 from crmd.14289) for host 'node2' with device 'aws_fence' returned: -201 (Generic Pacemaker error)
Apr  2 22:19:14 ip-10-0-0-17 stonith-ng[14285]:  notice: Couldn't find anyone to fence (reboot) node2 with any device
Apr  2 22:19:14 ip-10-0-0-17 stonith-ng[14285]:   error: Operation reboot of node2 by <no-one> for crmd.14289@node1.5abdec11: No route to host
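
To check how often each of the two failure signatures shown in the log above appears on a node, the messages can be counted with a small shell function. This is only a minimal sketch: the function name count_fence_aws_failures is made up for illustration, and the log path in the usage comment (/var/log/messages) is an assumption about where syslog output lands on the node.

```shell
# Hypothetical helper (not part of fence_aws or pcs): counts the two
# fence_aws failure messages quoted in this article in a given log file.
# Usage (log path is an assumption): count_fence_aws_failures /var/log/messages
count_fence_aws_failures() {
    log_file="$1"
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the function usable under set -e.
    timeouts=$(grep -c 'fence_aws: Failed: Timed out waiting to power OFF' "$log_file" || true)
    plug_errors=$(grep -c 'fence_aws: Failed: Unable to obtain correct plug status or plug is not available' "$log_file" || true)
    echo "power_off_timeouts=$timeouts plug_status_errors=$plug_errors"
}
```

A single timeout followed by several plug status errors matches the pattern described under Issue: the first fence attempt times out, and the retries then fail repeatedly.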

Environment

  • Red Hat Enterprise Linux 7, 8, 9, 10 (with the High Availability Add-On)
  • Amazon Web Services (AWS) EC2 instances used as cluster nodes
