[OCS 4.8] CephFS kernel crash: mds_dispatch ceph_handle_snap unable to handle kernel NULL

Solution In Progress

Issue

Nodes that mount CephFS can suffer a kernel crash.
The following output appears on the serial console or in the journal logs; a sketch for scanning the journal for this signature follows the trace.

May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: CPU: 2 PID: 2922581 Comm: kworker/2:2 Tainted: G        W        --------- -  - 4.18.0-305.19.1.el8_4.x86_64 #1
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 11/24/2021
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Workqueue: ceph-msgr ceph_con_workfn [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RIP: 0010:ihold+0x1b/0x20
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Code: 00 c3 0f 0b c7 47 48 ff ff ff ff c3 0f 1f 00 0f 1f 44 00 00 b8 01 00 00 00 f0 0f c1 87 58 01 00 00 83 c0 01 83 f8 01 7e 01 c3 <0f> 0b c3 66 90 0f 1f 44 00 00 31 f2 31 c0 83 e2 30 75 01 c3 bf 09
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RSP: 0018:ffffbc1780a7fc78 EFLAGS: 00010246
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RAX: 0000000000000001 RBX: ffff93a8ae9b2800 RCX: 0000000000000000
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RDX: 0000000000000001 RSI: ffff93a8ae9b2900 RDI: ffff93a1cf541590
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: RBP: ffff93a1cf541590 R08: 0000000000000000 R09: ffff937322862d80
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: R10: ffff936793441070 R11: 00000000ffffffe0 R12: ffff93a8ae9b2a18
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: R13: ffff93a1cf541218 R14: ffffbc1780a7fce0 R15: ffff93a8ae9b2a08
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: FS:  0000000000000000(0000) GS:ffff937a3f480000(0000) knlGS:0000000000000000
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: CR2: 000000c001615000 CR3: 000000202aa10004 CR4: 00000000007706e0
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: PKRU: 55555554
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: Call Trace:
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ceph_handle_snap+0x1ee/0x590 [ceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  mds_dispatch+0x176/0xbe0 [ceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ? calc_signature+0xdb/0x100 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ? ceph_x_check_message_signature+0x54/0xc0 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ceph_con_process_message+0x79/0x140 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ceph_con_v1_try_read+0x2ee/0x850 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ceph_con_workfn+0x333/0x690 [libceph]
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  process_one_work+0x1a7/0x360
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ? create_worker+0x1a0/0x1a0
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  worker_thread+0x30/0x390
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ? create_worker+0x1a0/0x1a0
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  kthread+0x116/0x130
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ? kthread_flush_work_fn+0x10/0x10
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel:  ret_from_fork+0x1f/0x40
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ---[ end trace 3ba3eb96137ccf8e ]---
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: ------------[ cut here ]------------
May 20 00:11:06 m2.xxx.aaa.bbb.ccc.ddd.eee.fff kernel: kernel BUG at fs/inode.c:1578!
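
To confirm whether a node has hit this signature, the following minimal Python sketch (an illustration added here, not part of the original solution; it assumes journalctl is available on the node) scans the kernel messages in the systemd journal for the two strings shown in the trace above:

  import subprocess

  # Read kernel messages from the systemd journal ("-k" = kernel ring only).
  out = subprocess.run(
      ["journalctl", "-k", "--no-pager"],
      capture_output=True, text=True, check=False,
  ).stdout

  # Both strings come straight from the trace in this article.
  signatures = ("ceph_handle_snap", "kernel BUG at fs/inode.c:1578")
  hits = [line for line in out.splitlines()
          if any(s in line for s in signatures)]

  for line in hits:
      print(line)
  print("crash signature found" if hits else "signature not found")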

Environment

  • For OpenShift: OCS 4.8, ODF 4.9 through 4.12
  • For RHEL: kernels older than 4.18.0-497 (a version-check sketch follows this list)
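
As a quick way to check whether a RHEL node falls in the affected range, here is a minimal sketch (an addition for illustration, not from the original article; it assumes the usual RHEL kernel release format such as 4.18.0-305.19.1.el8_4.x86_64 and compares only the leading version fields):

  import platform
  import re

  # First RHEL kernel release outside the affected range, per the list above.
  FIXED = (4, 18, 0, 497)

  def kernel_tuple(release):
      # "4.18.0-305.19.1.el8_4.x86_64" -> (4, 18, 0, 305)
      m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
      return tuple(int(x) for x in m.groups()) if m else None

  release = platform.release()
  version = kernel_tuple(release)
  if version is not None and version < FIXED:
      print(release, "-> below 4.18.0-497, potentially affected")
  else:
      print(release, "-> not matched as affected")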
