fence_scsi_check.pl watchdog script does a soft reboot instead of hard and hangs during shutdown in a RHEL 6 or 7 Resilient Storage cluster with GFS2
Issue
- The SCSI fencing watchdog does not hard reset a node
- The reboot that the watchdog performs is "soft", so the shutdown sequence can hang
- When a node is fenced by fence_scsi, it seems to get stuck on the way down and never reboots. There are hung task warnings on the console showing processes blocked waiting on GFS2.
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the Resilient Storage Add On
- GFS2
- Using SCSI Persistent Reservation Fencing (fence_scsi)
- Using the fence_scsi_check.pl watchdog script for fence_scsi to reboot a node when fenced
  - RHEL 7:
    - Using a fence-agents-scsi release prior to 4.0.11-27.el7_2.5, OR
    - Using fence-agents-scsi-4.0.11-27.el7_2.5 or later AND /etc/watchdog.d/fence_scsi_check is in place (as opposed to /etc/watchdog.d/fence_scsi_check_hardreboot)
  - RHEL 6:
    - Using a fence-agents release prior to 3.1.5-48.el6, OR
    - Using fence-agents-3.1.5-48.el6 or later AND /usr/share/cluster/fence_scsi_check.pl is linked or copied to /etc/watchdog.d (as opposed to /usr/share/cluster/fence_scsi_check_hardreboot.pl being linked or copied)
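To see which variant of the watchdog script a node is using, the contents of /etc/watchdog.d can be inspected. The following is a minimal Python sketch of that check, not a Red Hat supplied tool. The function name classify_watchdog_dir and its return labels are hypothetical, and it assumes the script was linked or copied under one of the file names listed above without being renamed.

```python
import os

# File names taken from the Environment list above. The assumption is that
# the script in /etc/watchdog.d keeps its original name (with or without .pl).
SOFT_NAMES = {"fence_scsi_check", "fence_scsi_check.pl"}
HARD_NAMES = {"fence_scsi_check_hardreboot", "fence_scsi_check_hardreboot.pl"}


def classify_watchdog_dir(path="/etc/watchdog.d"):
    """Report which fence_scsi watchdog script variant is present in path.

    Returns "soft", "hard", "both", or "none". This helper is illustrative
    only and does not check the installed package release.
    """
    try:
        entries = set(os.listdir(path))  # symlinks are listed as well
    except FileNotFoundError:
        return "none"

    has_soft = bool(entries & SOFT_NAMES)
    has_hard = bool(entries & HARD_NAMES)

    if has_soft and has_hard:
        return "both"
    if has_hard:
        return "hard"
    if has_soft:
        return "soft"
    return "none"


if __name__ == "__main__":
    print(classify_watchdog_dir())
```

A result of "soft" matches the conditions described in this Environment section. The installed package release would still need to be checked separately, for example with rpm -q fence-agents-scsi on RHEL 7 or rpm -q fence-agents on RHEL 6.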