Panic in locks_remove_flock after GFS filesystem withdraw on Red Hat Enterprise Linux 4
Issue
When sosreport is run on a two-node cluster with the latest 4.7 EUS kernel, the GFS filesystem is withdrawn. The problem is not 100% reproducible, but it has occurred on multiple clusters:
GFS: fsid=cluster1:data1.1: fatal: invalid metadata block
GFS: fsid=cluster1:data1.1: bh = 111902488 (magic)
GFS: fsid=cluster1:data1.1: function = gfs_get_meta_buffer
GFS: fsid=cluster1:data1.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-80/largesmp/src/gfs/dio.c, line = 1110
GFS: fsid=cluster1:data1.1: time = 1267740839
GFS: fsid=cluster1:data1.1: about to withdraw from the cluster
GFS: fsid=cluster1:data1.1: waiting for outstanding I/O
GFS: fsid=cluster1:data1.1: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=cluster1:data1.1: withdrawn
This then leads to a panic due to outstanding locks:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at locks:1823
invalid operand: 0000 [1] SMP
CPU 6
Modules linked in: ip_vs hp_ilo(U) mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler nfsd exportfs netconsole i2c_dev i2c_core nfs lockd
nfs_acl sunrpc qioctlmod ib_srp ib_sdp ib_ipoib md5 ipv6 rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core
ide_dump cciss_dump scsi_dump diskdump zlib_deflate joydev button battery ac ohci_hcd ehci_hcd uhci_hcd ext3 jbd dm_round_robin dm_emc dm_multipath
lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) qla2400 usb_storage qla2xxx scsi_transport_fc cciss sg sd_mod scsi_mod dm_snapshot dm_mirror
dm_mod bonding(U) e1000
Pid: 19634, comm: oracle Not tainted 2.6.9-78.0.28.ELlargesmp
RIP: 0010:[<ffffffff80190eb7>] <ffffffff80190eb7>{locks_remove_flock+201}
RSP: 0018:000001085ba63c58 EFLAGS: 00010246
RAX: 000001045c39b9c0 RBX: 000001086634b5b8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000053a RDI: ffffffff80526880
RBP: 000001086634b4a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 000001045bab1680
R13: 000001086634b4a8 R14: 000001086b017cb8 R15: 000001044f084ec8
FS: 0000002a96f604e0(0000) GS:ffffffff80520700(0000) knlGS:00000000f57b6ba0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 0000000470302000 CR4: 00000000000006e0
Process oracle (pid: 19634, threadinfo 000001085ba62000, task 000001044f0847f0)
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Call Trace:<ffffffff8017c28e>{__fput+74} <ffffffff8017ae89>{filp_close+103}
<ffffffff8013a125>{put_files_struct+101} <ffffffff8013a99c>{do_exit+711}
<ffffffff80141c22>{recalc_sigpending+15} <ffffffff801421ff>{__dequeue_signal+458}
<ffffffff8013b4ce>{sys_exit_group+0} <ffffffff80144097>{get_signal_to_deliver+1084}
<ffffffff8010f75f>{do_signal+131} <ffffffff801409db>{del_timer+107}
<ffffffff80140a98>{del_singleshot_timer_sync+9} <ffffffff803186de>{schedule_timeout+411}
<ffffffff8011037f>{sysret_signal+28} <ffffffff8011066f>{ptregscall_common+103}
Code: 0f 0b 0c 5c 33 80 ff ff ff ff 1f 07 48 89 c3 48 8b 03 eb ba
RIP <ffffffff80190eb7>{locks_remove_flock+201} RSP <000001085ba63c58>
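To check whether a node logged the same withdraw sequence, the GFS messages shown above can be pulled out of a saved kernel log. The following is a minimal Python sketch, not a Red Hat tool: the function name summarize_withdraw and the dictionary layout are hypothetical, and treating the "time" field as Unix epoch seconds is an assumption.

```python
import re
from datetime import datetime, timezone

# Matches lines such as
# "GFS: fsid=cluster1:data1.1: function = gfs_get_meta_buffer",
# with or without a syslog prefix in front.
GFS_LINE = re.compile(r"GFS: fsid=(?P<fsid>\S+): (?P<msg>.*)")


def summarize_withdraw(lines):
    """Collect per-filesystem details from GFS withdraw messages."""
    events = {}
    for line in lines:
        match = GFS_LINE.search(line)
        if not match:
            continue
        info = events.setdefault(match.group("fsid"), {"withdrawn": False})
        msg = match.group("msg").strip()
        if msg == "withdrawn":
            info["withdrawn"] = True
        elif msg.startswith("fatal:"):
            info["fatal"] = msg[len("fatal:"):].strip()
        else:
            # Lines like "file = ..., line = 1110" carry key/value pairs.
            for part in msg.split(", "):
                key, sep, value = part.partition(" = ")
                if sep:
                    info[key.strip()] = value.strip()
    for info in events.values():
        # Assumption: "time" is seconds since the Unix epoch.
        if info.get("time", "").isdigit():
            stamp = datetime.fromtimestamp(int(info["time"]), tz=timezone.utc)
            info["time_utc"] = stamp.isoformat()
    return events


SAMPLE = """\
GFS: fsid=cluster1:data1.1: fatal: invalid metadata block
GFS: fsid=cluster1:data1.1: bh = 111902488 (magic)
GFS: fsid=cluster1:data1.1: function = gfs_get_meta_buffer
GFS: fsid=cluster1:data1.1: file = /builddir/build/BUILD/gfs-kernel-2.6.9-80/largesmp/src/gfs/dio.c, line = 1110
GFS: fsid=cluster1:data1.1: time = 1267740839
GFS: fsid=cluster1:data1.1: about to withdraw from the cluster
GFS: fsid=cluster1:data1.1: withdrawn
"""

if __name__ == "__main__":
    for fsid, info in summarize_withdraw(SAMPLE.splitlines()).items():
        print(fsid, info)
```

A filesystem reported with "withdrawn" set to True on a node that later hit the locks_remove_flock BUG matches the sequence described in this article. The sketch only reads log text and does not inspect the running cluster.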
Environment
- Red Hat Enterprise Linux 4 prior to 4.9
- Red Hat Cluster Suite
- At least 2 nodes in the cluster
- GFS-kernel package prior to GFS-kernel-2.6.9-87.5.el4
- GFS filesystem mounted on both nodes with lockproto=lock_dlm (clustered locking)
- The issue can be triggered by running sosreport if it causes a withdraw, but it is not the only cause.