File system corrupted on cluster-managed fs resource after it failed to stop and then nodes' cluster daemons were restarted without a reboot in RHEL 6
Issue
- My cluster service failed to stop an fs resource because it apparently was still in use, and following this I restarted rgmanager on the nodes, thinking this would clear it up. Afterwards, my file system was corrupted.
Mar 17 23:13:31 rgmanager Stopping service service:myService
Mar 17 23:13:46 rgmanager [fs] unmounting /home/users
Mar 17 23:13:46 rgmanager [fs] umount failed: 1
Mar 17 23:13:46 rgmanager [fs] Sending SIGTERM to processes on /home/users
Mar 17 23:13:51 rgmanager [fs] unmounting /home/users
Mar 17 23:13:51 rgmanager [fs] umount failed: 1
Mar 17 23:13:52 rgmanager [fs] Sending SIGKILL to processes on /home/users
Mar 17 23:13:57 rgmanager [fs] unmounting /home/users
Mar 17 23:13:57 rgmanager [fs] umount failed: 1
Mar 17 23:13:57 rgmanager [fs] Sending SIGKILL to processes on /home/users
Mar 17 23:13:57 rgmanager [fs] 'umount /home/users' failed, error=1
Mar 17 23:13:57 rgmanager stop on fs "fs-users" returned 1 (generic error)
Mar 17 23:13:58 rgmanager [lvm] Logical volume myVG/myLV failed to shutdown
Mar 17 23:13:58 rgmanager stop on lvm "myVG-myLV" returned 1 (generic error)
Mar 17 23:13:58 rgmanager #12: RG service:myService failed to stop; intervention required
Mar 17 23:13:58 rgmanager Service service:myService is failed
[...]
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204277
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204280
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204279
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204283
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_mb_free_metadata: Double free of blocks 4327 (4327 1)
Mar 18 04:16:41 node1 kernel: JBD: Spotted dirty metadata buffer (dev = dm-8, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Mar 18 04:16:45 node1 kernel: EXT4-fs error (device dm-8): mb_free_blocks: double-free of inode 0's block 563840(bit 6784 in group 17)
Mar 18 04:16:45 node1 kernel: EXT4-fs error (device dm-8): mb_free_blocks: double-free of inode 0's block 563841(bit 6785 in group 17)
Mar 18 04:16:45 node1 kernel: EXT4-fs error (device dm-8): mb_free_blocks: double-free of inode 0's block 563858(bit 6802 in group 17)
- Two nodes in my cluster had the file system mounted at the same time after the service failed to stop and the cluster daemons were then restarted on the nodes without a reboot
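As a rough aid for spotting this pattern in a log file, the following sketch scans log lines for the rgmanager "failed to stop; intervention required" message and for later EXT4-fs errors. The function name summarize_log and the two regular expressions are illustrative assumptions based only on the message formats in the excerpt above. They are not part of rgmanager or any Red Hat tool.

```python
import re

# Patterns are assumptions derived from the log excerpt in this article.
STOP_FAILED = re.compile(r"RG (service:\S+) failed to stop; intervention required")
EXT4_ERROR = re.compile(r"EXT4-fs error \(device ([^)]+)\)")


def summarize_log(lines):
    """Return (services that failed to stop, devices reporting EXT4-fs errors)."""
    failed_services = []
    error_devices = []
    for line in lines:
        match = STOP_FAILED.search(line)
        if match and match.group(1) not in failed_services:
            failed_services.append(match.group(1))
        match = EXT4_ERROR.search(line)
        if match and match.group(1) not in error_devices:
            error_devices.append(match.group(1))
    return failed_services, error_devices
```

Passing the lines of a log file to summarize_log returns the service names and device names in the order they first appear. A stop failure followed by EXT4-fs errors on the same node is only a hint that this issue may apply, not proof of it.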
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
- One or more <fs> resources in /etc/cluster/cluster.conf without self_fence="1" enabled
- rgmanager
- A situation in which the rgmanager service/daemon may be stopped and then started on all nodes after an <fs> resource has failed to stop
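To check whether a cluster matches the environment described above, a short script can list the <fs> resources in cluster.conf that do not set self_fence="1". This is a minimal sketch under stated assumptions: the function name fs_without_self_fence is hypothetical, only the literal value "1" is treated as enabled, and <fs> elements carrying a ref attribute are assumed to be references to a resource defined elsewhere and are skipped.

```python
import xml.etree.ElementTree as ET


def fs_without_self_fence(cluster_conf_xml):
    """Return names of <fs> resource definitions lacking self_fence="1".

    Assumption: only the literal value "1" counts as enabled here.
    """
    root = ET.fromstring(cluster_conf_xml)
    names = []
    for fs in root.iter("fs"):
        # Assumption: <fs ref="..."/> points at a definition elsewhere,
        # so it is skipped and only the definition itself is inspected.
        if "ref" in fs.attrib:
            continue
        if fs.get("self_fence") != "1":
            names.append(fs.get("name", "(unnamed)"))
    return names
```

To use it, read the contents of /etc/cluster/cluster.conf into a string and pass that string to fs_without_self_fence. Any names it returns are fs resources that fit the configuration listed in this Environment section.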