fence_scsi_check.pl watchdog script does a soft reboot instead of hard and hangs during shutdown in a RHEL 6 or 7 Resilient Storage cluster with device-mapper-multipath
Issue
- After manually fencing a node that is actively running a resource group, the SCSI watchdog begins to initiate a reboot but fails to completely reboot the machine.
- When the watchdog reboots a node, it gets stuck shutting down. I see backtraces showing it waiting on device mapper or the file system.
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add On
- Using SCSI Persistent Reservation Fencing (fence_scsi)
- Using the fence_scsi_check.pl watchdog script for fence_scsi to reboot a node when fenced
  - RHEL 7:
    - Using a fence-agents-scsi release prior to 4.0.11-27.el7_2.5, OR
    - Using fence-agents-scsi-4.0.11-27.el7_2.5 or later AND /etc/watchdog.d/fence_scsi_check is in place (as opposed to /etc/watchdog.d/fence_scsi_check_hardreboot)
  - RHEL 6:
    - Using a fence-agents release prior to 3.1.5-48.el6, OR
    - Using fence-agents-3.1.5-48.el6 or later AND /usr/share/cluster/fence_scsi_check.pl is linked or copied to /etc/watchdog.d (as opposed to /usr/share/cluster/fence_scsi_check_hardreboot.pl being linked or copied)
- device-mapper-multipath
  - The settings for the device in question enable queueing (even if only temporary) when all paths have failed
    - Can be enabled via no_path_retry set to "queue" or a value greater than 0 in /etc/multipath.conf, or in the built-in device settings in multipathd (see /usr/share/doc/device-mapper-multipath-$vers/multipath.conf.defaults)
    - Can be enabled via features "1 queue_if_no_path" in /etc/multipath.conf or built-in device settings in multipathd if no_path_retry is not set
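The queueing conditions listed above can be expressed as a small decision function. The following is only a minimal sketch in Python, not part of fence_scsi or multipathd: the function name queueing_enabled and its two parameters are hypothetical, and it assumes you have already worked out the effective no_path_retry and features values for the device (from /etc/multipath.conf or the built-in device settings) and pass them in as strings.

```python
def queueing_enabled(no_path_retry=None, features=None):
    """Return True if the given settings queue I/O when all paths have failed.

    Hypothetical helper for illustration only.
    no_path_retry: effective value as a string or number, or None if unset.
    features: effective features string, or None if unset.
    """
    if no_path_retry is not None:
        value = str(no_path_retry).strip().strip('"').lower()
        if value == "queue":
            # Queue indefinitely while no paths are available.
            return True
        if value.isdigit():
            # A positive retry count queues temporarily; 0 does not queue.
            return int(value) > 0
        # Any other value (for example "fail") is treated as no queueing.
        return False
    if features is not None:
        # features is only consulted when no_path_retry is not set.
        return "queue_if_no_path" in features.replace('"', " ").split()
    return False


if __name__ == "__main__":
    print(queueing_enabled(no_path_retry="queue"))
    print(queueing_enabled(features="1 queue_if_no_path"))
    print(queueing_enabled(no_path_retry="0"))
```

The sketch checks no_path_retry first and consults features only when no_path_retry is unset, mirroring the order of the conditions above. It does not parse multipath.conf or resolve per-device overrides; that step is assumed to happen beforehand.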