Panic in locks_remove_flock after GFS filesystem withdraw on Red Hat Enterprise Linux 4

Solution Unverified - Updated -

Issue

When running sosreport on a two-node cluster with the latest 4.7 EUS kernel, the GFS filesystem may be withdrawn. The problem is not 100% reproducible, but it has occurred on multiple clusters:

GFS: fsid=cluster1:data1.1: fatal: invalid metadata block
GFS: fsid=cluster1:data1.1:   bh = 111902488 (magic)
GFS: fsid=cluster1:data1.1:   function = gfs_get_meta_buffer
GFS: fsid=cluster1:data1.1:   file = /builddir/build/BUILD/gfs-kernel-2.6.9-80/largesmp/src/gfs/dio.c, line = 1110
GFS: fsid=cluster1:data1.1:   time = 1267740839
GFS: fsid=cluster1:data1.1: about to withdraw from the cluster
GFS: fsid=cluster1:data1.1: waiting for outstanding I/O
GFS: fsid=cluster1:data1.1: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=cluster1:data1.1: withdrawn
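To check a node's kernel log for this condition, a minimal Python sketch can scan for the final `withdrawn` message shown above and collect the affected fsids. The regular expression is an assumption based only on the message format in this excerpt. It uses a search rather than an anchored match so that a syslog prefix before `GFS:` is tolerated, and it deliberately ignores the earlier "about to withdraw" and "telling LM to withdraw" lines.

```python
import re

# Final message GFS logs once a withdraw has completed (format assumed from
# the excerpt above); search() tolerates a syslog prefix before "GFS:".
WITHDRAWN_RE = re.compile(r"GFS: fsid=(?P<fsid>\S+): withdrawn\s*$")


def withdrawn_filesystems(log_lines):
    """Return fsids that logged a completed withdraw, in first-seen order."""
    fsids = []
    for line in log_lines:
        m = WITHDRAWN_RE.search(line)
        if m and m.group("fsid") not in fsids:
            fsids.append(m.group("fsid"))
    return fsids


log = [
    "GFS: fsid=cluster1:data1.1: about to withdraw from the cluster",
    "GFS: fsid=cluster1:data1.1: telling LM to withdraw",
    "lock_dlm: withdraw abandoned memory",
    "GFS: fsid=cluster1:data1.1: withdrawn",
]
print(withdrawn_filesystems(log))  # ['cluster1:data1.1']
```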

This then leads to a panic due to outstanding locks:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at locks:1823
invalid operand: 0000 [1] SMP 
CPU 6 
Modules linked in: ip_vs hp_ilo(U) mptctl mptbase ipmi_devintf ipmi_si ipmi_msghandler nfsd exportfs netconsole i2c_dev i2c_core nfs lockd
nfs_acl sunrpc qioctlmod ib_srp ib_sdp ib_ipoib md5 ipv6 rdma_ucm rdma_cm iw_cm ib_addr ib_umad ib_ucm ib_uverbs ib_cm ib_sa ib_mad ib_core
ide_dump cciss_dump scsi_dump diskdump zlib_deflate joydev button battery ac ohci_hcd ehci_hcd uhci_hcd ext3 jbd dm_round_robin dm_emc dm_multipath
lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) qla2400 usb_storage qla2xxx scsi_transport_fc cciss sg sd_mod scsi_mod dm_snapshot dm_mirror
dm_mod bonding(U) e1000
Pid: 19634, comm: oracle Not tainted 2.6.9-78.0.28.ELlargesmp
RIP: 0010:[<ffffffff80190eb7>] <ffffffff80190eb7>{locks_remove_flock+201}
RSP: 0018:000001085ba63c58  EFLAGS: 00010246
RAX: 000001045c39b9c0 RBX: 000001086634b5b8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000053a RDI: ffffffff80526880
RBP: 000001086634b4a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 000001045bab1680
R13: 000001086634b4a8 R14: 000001086b017cb8 R15: 000001044f084ec8
FS:  0000002a96f604e0(0000) GS:ffffffff80520700(0000) knlGS:00000000f57b6ba0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a9556c000 CR3: 0000000470302000 CR4: 00000000000006e0
Process oracle (pid: 19634, threadinfo 000001085ba62000, task 000001044f0847f0)
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
       0000000000000000 0000000000000000 0000000000000000 0000000000000000 
       0000000000000000 0000000000000000 
Call Trace:<ffffffff8017c28e>{__fput+74} <ffffffff8017ae89>{filp_close+103} 
           <ffffffff8013a125>{put_files_struct+101} <ffffffff8013a99c>{do_exit+711} 
           <ffffffff80141c22>{recalc_sigpending+15} <ffffffff801421ff>{__dequeue_signal+458} 
           <ffffffff8013b4ce>{sys_exit_group+0} <ffffffff80144097>{get_signal_to_deliver+1084} 
           <ffffffff8010f75f>{do_signal+131} <ffffffff801409db>{del_timer+107} 
           <ffffffff80140a98>{del_singleshot_timer_sync+9} <ffffffff803186de>{schedule_timeout+411} 
           <ffffffff8011037f>{sysret_signal+28} <ffffffff8011066f>{ptregscall_common+103} 

Code: 0f 0b 0c 5c 33 80 ff ff ff ff 1f 07 48 89 c3 48 8b 03 eb ba 
RIP <ffffffff80190eb7>{locks_remove_flock+201} RSP <000001085ba63c58>
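This panic can be recognised by the `Kernel BUG at` line together with a RIP inside `locks_remove_flock`. The following hypothetical helpers are only a heuristic sketch, assuming the x86_64 oops layout shown above. They are not an official signature and do not by themselves prove that a GFS withdraw was the root cause.

```python
import re

# "Kernel BUG at locks:1823" -> source tag and line number
BUG_RE = re.compile(r"Kernel BUG at (?P<file>[\w.]+):(?P<line>\d+)")

# First RIP line of an x86_64 oops (layout assumed from the trace above), e.g.
# "RIP: 0010:[<ffffffff80190eb7>] <ffffffff80190eb7>{locks_remove_flock+201}"
RIP_RE = re.compile(
    r"^RIP: [0-9a-f]{4}:\[<[0-9a-f]+>\] <[0-9a-f]+>\{(?P<func>\w+)\+(?P<off>\d+)\}"
)


def rip_symbol(oops_text):
    """Return (function, offset) from the first RIP line, or None if absent."""
    for line in oops_text.splitlines():
        m = RIP_RE.match(line.strip())
        if m:
            return m.group("func"), m.group("off")
    return None


def looks_like_flock_panic(oops_text):
    """Heuristic: a BUG() line plus a RIP inside locks_remove_flock."""
    if BUG_RE.search(oops_text) is None:
        return False
    sym = rip_symbol(oops_text)
    return sym is not None and sym[0] == "locks_remove_flock"


sample = """Kernel BUG at locks:1823
invalid operand: 0000 [1] SMP
RIP: 0010:[<ffffffff80190eb7>] <ffffffff80190eb7>{locks_remove_flock+201}"""
print(rip_symbol(sample))              # ('locks_remove_flock', '201')
print(looks_like_flock_panic(sample))  # True
```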

Environment

  • Red Hat Enterprise Linux 4 prior to 4.9
  • Red Hat Cluster Suite
    • At least 2 nodes in the cluster
    • GFS-kernel package prior to GFS-kernel-2.6.9-87.5.el4
  • GFS filesystem mounted on both nodes with lockproto=lock_dlm (clustered locking)
  • Running sosreport can trigger the issue if it causes a GFS withdraw, but sosreport is not the only possible trigger.
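To see whether a node falls within the affected GFS-kernel range, the installed package's version and release (for example as reported by `rpm -q`) can be compared with GFS-kernel-2.6.9-87.5.el4. The sketch below uses a simplified form of RPM's segment-by-segment version comparison. It ignores epochs and the `~`/`^` operators and does not check the "prior to 4.9" release condition, so treat its result as a quick check rather than a substitute for `rpm` itself. The function names are hypothetical.

```python
import re

# Fixed package from the Environment list: GFS-kernel-2.6.9-87.5.el4
FIXED_VERSION = ("2.6.9", "87.5.el4")


def _segments(s):
    # Split into runs of digits or letters; separators such as '.' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", s)


def rpm_segment_cmp(a, b):
    """Simplified rpmvercmp: returns -1, 0 or 1 (no epoch, ~ or ^ handling)."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit():
            return 1   # a numeric segment sorts newer than an alphabetic one
        elif y.isdigit():
            return -1
        elif x != y:
            return -1 if x < y else 1
    # All shared segments equal: the string with more segments is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def predates_fix(version, release):
    """True if GFS-kernel version-release is older than 2.6.9-87.5.el4."""
    cmp_v = rpm_segment_cmp(version, FIXED_VERSION[0])
    if cmp_v != 0:
        return cmp_v < 0
    return rpm_segment_cmp(release, FIXED_VERSION[1]) < 0


print(predates_fix("2.6.9", "80"))        # True
print(predates_fix("2.6.9", "87.5.el4"))  # False
```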
