File system corrupted on cluster-managed fs resource after it failed to stop and then nodes' cluster daemons were restarted without a reboot in RHEL 6

Solution In Progress - Updated -

Issue

  • My cluster service failed to stop an fs resource, apparently because the file system was still in use. I then restarted rgmanager on the nodes, thinking this would clear the failure. Afterwards, the file system was corrupted.
Mar 17 23:13:31 rgmanager Stopping service service:myService
Mar 17 23:13:46 rgmanager [fs] unmounting /home/users
Mar 17 23:13:46 rgmanager [fs] umount failed: 1
Mar 17 23:13:46 rgmanager [fs] Sending SIGTERM to processes on /home/users
Mar 17 23:13:51 rgmanager [fs] unmounting /home/users
Mar 17 23:13:51 rgmanager [fs] umount failed: 1
Mar 17 23:13:52 rgmanager [fs] Sending SIGKILL to processes on /home/users
Mar 17 23:13:57 rgmanager [fs] unmounting /home/users
Mar 17 23:13:57 rgmanager [fs] umount failed: 1
Mar 17 23:13:57 rgmanager [fs] Sending SIGKILL to processes on /home/users
Mar 17 23:13:57 rgmanager [fs] 'umount /home/users' failed, error=1
Mar 17 23:13:57 rgmanager stop on fs "fs-users" returned 1 (generic error)
Mar 17 23:13:58 rgmanager [lvm] Logical volume myVG/myLV failed to shutdown
Mar 17 23:13:58 rgmanager stop on lvm "myVG-myLV" returned 1 (generic error)
Mar 17 23:13:58 rgmanager #12: RG service:myService failed to stop; intervention required
Mar 17 23:13:58 rgmanager Service service:myService is failed
[...]
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204277
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204280
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204279
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 204283
Mar 18 04:16:41 node1 kernel: EXT4-fs error (device dm-8): ext4_mb_free_metadata: Double free of blocks 4327 (4327 1)
Mar 18 04:16:41 node1 kernel: JBD: Spotted dirty metadata buffer (dev = dm-8, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Mar 18 04:16:45 node1 kernel: EXT4-fs error (device dm-8): mb_free_blocks: double-free of inode 0's block 563840(bit 6784 in group 17)
Mar 18 04:16:45 node1 kernel: EXT4-fs error (device dm-8): mb_free_blocks: double-free of inode 0's block 563841(bit 6785 in group 17)
Mar 18 04:16:45 node1 kernel: EXT4-fs error (device dm-8): mb_free_blocks: double-free of inode 0's block 563858(bit 6802 in group 17)
  • Two nodes in my cluster had the file system mounted after it failed to stop and the cluster daemons on the nodes were then restarted without a reboot.
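
To spot this condition before restarting any cluster daemons, the logs can be scanned for the failed-stop message shown in the excerpt above. The following is a minimal sketch, not a supported tool: the function name and the sample lines are illustrative, and the pattern is taken only from the message text in this article, so other rgmanager versions may word it differently.

```python
import re

# Pattern derived from the rgmanager message in the log excerpt above.
# Assumption: the wording is the same on the system being checked.
FAILED_STOP = re.compile(r"RG (service:\S+) failed to stop; intervention required")


def services_failed_to_stop(log_lines):
    """Return the services that logged a failed stop, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = FAILED_STOP.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample lines copied from the excerpt above.
sample = [
    'Mar 17 23:13:57 rgmanager stop on fs "fs-users" returned 1 (generic error)',
    "Mar 17 23:13:58 rgmanager #12: RG service:myService failed to stop; intervention required",
    "Mar 17 23:13:58 rgmanager Service service:myService is failed",
]

print(services_failed_to_stop(sample))
```

In practice the lines would come from the node's system log rather than a hard-coded list. Any service reported here has resources that may still be active on that node.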

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
  • One or more <fs> resources in /etc/cluster/cluster.conf without self_fence="1" enabled
  • rgmanager
    • A situation in which the rgmanager service/daemon may be stopped and then started on all nodes after an <fs> resource has failed to stop
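
To check whether a configuration matches the environment described above, the <fs> resources in cluster.conf can be listed along with whether self_fence="1" is set. The following is a rough sketch under stated assumptions: the function name and the sample configuration are hypothetical (the fs-data resource is invented for contrast), only the value "1" is treated as enabled because that is the form this article uses, and <fs ref="..."/> entries are skipped because they only point at a resource defined elsewhere.

```python
import xml.etree.ElementTree as ET


def fs_without_self_fence(cluster_conf_xml):
    """Return names of <fs> resource definitions that lack self_fence="1"."""
    root = ET.fromstring(cluster_conf_xml)
    unprotected = []
    for fs in root.iter("fs"):
        if fs.get("ref") is not None:
            # A reference to a resource defined elsewhere, not a definition.
            continue
        if fs.get("self_fence") != "1":
            unprotected.append(fs.get("name", "(unnamed)"))
    return unprotected


# Hypothetical configuration for illustration only.
SAMPLE_CONF = """
<cluster name="example" config_version="1">
  <rm>
    <resources>
      <fs name="fs-users" device="/dev/myVG/myLV" mountpoint="/home/users" fstype="ext4"/>
      <fs name="fs-data" device="/dev/myVG/dataLV" mountpoint="/data" fstype="ext4" self_fence="1"/>
    </resources>
    <service name="myService">
      <fs ref="fs-users"/>
    </service>
  </rm>
</cluster>
"""

print(fs_without_self_fence(SAMPLE_CONF))
```

On a real node the XML would be read from /etc/cluster/cluster.conf instead of the inline sample.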
