[OCS 4.8] CephFS kernel crash: mds_dispatch ceph_handle_snap unable to handle kernel NULL
Issue
Nodes that mount CephFS suffer a kernel crash.
You can see messages like the following on the serial console or in the journal logs (a scan sketch follows the trace below).
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: CPU: 2 PID: 2922581 Comm: kworker/2:2 Tainted: G W --------- - - 4.18.0-305.19.1.el8_4.x86_64 #1
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/24/2021
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RIP: 0010:ihold+0x1b/0x20
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Code: 00 c3 0f 0b c7 47 48 ff ff ff ff c3 0f 1f 00 0f 1f 44 00 00 b8 01 00 00 00 f0 0f c1 87 58 01 00 00 83 c0 01 83 f8 01 7e 01 c3 <0f> 0b c3 66 90 0f 1f 44 00 00 31 f2 31 c0 83 e2 30 75 01 c3 bf 09
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RSP: 0018:ffffbc1780a7fc78 EFLAGS: 00010246
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RAX: 0000000000000001 RBX: ffff93a8ae9b2800 RCX: 0000000000000000
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RDX: 0000000000000001 RSI: ffff93a8ae9b2900 RDI: ffff93a1cf541590
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RBP: ffff93a1cf541590 R08: 0000000000000000 R09: ffff937322862d80
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: R10: ffff936793441070 R11: 00000000ffffffe0 R12: ffff93a8ae9b2a18
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: R13: ffff93a1cf541218 R14: ffffbc1780a7fce0 R15: ffff93a8ae9b2a08
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: FS: 0000000000000000(0000) GS:ffff937a3f480000(0000) knlGS:0000000000000000
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: CR2: 000000c001615000 CR3: 000000202aa10004 CR4: 00000000007706e0
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: PKRU: 55555554
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Call Trace:
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ceph_handle_snap+0x1ee/0x590 [ceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: mds_dispatch+0x176/0xbe0 [ceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ? calc_signature+0xdb/0x100 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ? ceph_x_check_message_signature+0x54/0xc0 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ceph_con_process_message+0x79/0x140 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ceph_con_v1_try_read+0x2ee/0x850 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ceph_con_workfn+0x333/0x690 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: process_one_work+0x1a7/0x360
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ? create_worker+0x1a0/0x1a0
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: worker_thread+0x30/0x390
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ? create_worker+0x1a0/0x1a0
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: kthread+0x116/0x130
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ? kthread_flush_work_fn+0x10/0x10
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ret_from_fork+0x1f/0x40
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ---[ end trace 3ba3eb96137ccf8e ]---
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ------------[ cut here ]------------
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: kernel BUG at fs/inode.c:1578!
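To confirm whether a node hit this signature, you can search the kernel messages for the functions shown in the call trace. The following is a minimal sketch, not part of this solution; the script name and the marker strings are assumptions taken from the trace above.

# find_ceph_snap_crash.py - hypothetical helper; pipe kernel messages into it:
#   journalctl -k --no-pager | python3 find_ceph_snap_crash.py
import sys

# Marker strings taken from the trace above; treat them as illustrative.
MARKERS = (
    "ceph_handle_snap",
    "mds_dispatch",
    "kernel BUG at fs/inode.c",
)

for line in sys.stdin:
    # Print any kernel message that mentions one of the crash markers.
    if any(marker in line for marker in MARKERS):
        print(line.rstrip())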
Environment
- For OpenShift: OCS 4.8, ODF 4.9-4.12
- For RHEL: kernel versions below 4.18.0-497 (a version-check sketch follows this list)
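As a quick check of whether a RHEL node runs a kernel below the 4.18.0-497 threshold listed above, the following minimal sketch compares the running kernel release against that version. It is an illustration only, not Red Hat tooling, and the simple numeric comparison of the release field is an assumption (it ignores full RPM version ordering).

import platform
import re

# 4.18.0-497, per the Environment section above.
THRESHOLD = (4, 18, 0, 497)

def kernel_tuple(release):
    """Parse e.g. '4.18.0-305.19.1.el8_4.x86_64' into (4, 18, 0, 305)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not m:
        raise ValueError("unrecognized kernel release: %s" % release)
    return tuple(int(x) for x in m.groups())

running = platform.release()
if kernel_tuple(running) < THRESHOLD:
    print("%s: below 4.18.0-497, potentially affected" % running)
else:
    print("%s: 4.18.0-497 or later" % running)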