Kernel panic in netfs (RIP: netfs_rreq_unlock) when using the CephFS client on RHEL 9

Environment

Red Hat Enterprise Linux (RHEL) 9.0
Red Hat Enterprise Linux (RHEL) 9.1

Issue

The kernel crashes in the [netfs] module with the following trace. The process that triggers the crash appears to be an application worker thread (DatasetCleanUpW in this example) reading a file on a CephFS mount:

[433834.693062] BUG: kernel NULL pointer dereference, address: 0000000000000402
[433834.700131] #PF: supervisor read access in kernel mode
[433834.705366] #PF: error_code(0x0000) - not-present page
[433834.710602] PGD 96bbaf067 P4D 320580067 PUD 0
[433834.715149] Oops: 0000 [#1] PREEMPT SMP NOPTI
[433834.719605] CPU: 16 PID: 55696 Comm: DatasetCleanUpW Kdump: loaded Tainted: G        W  OE    --------- ---  5.14.0-70.22.1.el9_0.x86_64 #1
[433834.732225] Hardware name: Dell Inc. PowerEdge R650xs/019H6N, BIOS 1.6.5 04/15/2022
[433834.739978] RIP: 0010:netfs_rreq_unlock+0xef/0x380 [netfs]                           <--- Here
[433834.745573] Code: 44 00 00 e8 f3 a7 f4 e5 4c 89 fe 45 31 f6 48 8d 7c 24 10 e8 d3 d4 32 e6 48 89 c7 48 85 c0 0f 84 23 01 00 00 31 ed 49 8d 55 48 <48> 8b 0f 48 8b 47 20 48 2b 04 24 48 c1 e9 10 c1 e0 0c 83 e1 01 80
[433834.764453] RSP: 0018:ff213397676ffa90 EFLAGS: 00010246
[433834.769778] RAX: 0000000000000402 RBX: ff154881029c27e0 RCX: 0000000000000002
[433834.777009] RDX: ff154880c85ce1c8 RSI: ff154881ecc0d8f8 RDI: 0000000000000402
[433834.784246] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[433834.791484] R10: 00000000000001c1 R11: 000000000000014a R12: 0000000000000000
[433834.798721] R13: ff154880c85ce180 R14: 0000000000000000 R15: 0000000000000000
[433834.805958] FS:  00007f0344bbe640(0000) GS:ff15489fbf800000(0000) knlGS:0000000000000000
[433834.814148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[433834.819994] CR2: 0000000000000402 CR3: 00000001c6730006 CR4: 0000000000771ee0
[433834.827224] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[433834.834463] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[433834.841700] PKRU: 55555554
[433834.844505] Call Trace:
[433834.847050]  netfs_rreq_assess+0xa6/0x240 [netfs]                                    <--- Here
[433834.851851]  netfs_readpage+0x173/0x3b0 [netfs]
[433834.856479]  ? init_wait_var_entry+0x50/0x50
[433834.860855]  filemap_read_page+0x33/0xf0
[433834.864885]  filemap_get_pages+0x2f2/0x3f0
[433834.869080]  filemap_read+0xaa/0x320
[433834.872754]  ? do_filp_open+0xb2/0x150
[433834.876599]  ? rmqueue+0x3be/0xe10
[433834.880102]  ceph_read_iter+0x1fe/0x680 [ceph]
[433834.884656]  ? new_sync_read+0x115/0x1a0
[433834.888673]  new_sync_read+0x115/0x1a0
[433834.892521]  vfs_read+0xf3/0x180
[433834.895850]  ksys_read+0x5f/0xe0
[433834.899178]  do_syscall_64+0x38/0x90
[433834.902851]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[433834.908004] RIP: 0033:0x7f053a45b0bc
[433834.911684] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 79 cb f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 bf cb f5 ff 48
[433834.931087] RSP: 002b:00007f0344bbb350 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[433834.939284] RAX: ffffffffffffffda RBX: 00007f0344bbd438 RCX: 00007f053a45b0bc
[433834.947044] RDX: 0000000000001f40 RSI: 00007f0344bbb3e0 RDI: 000000000000000c
[433834.954809] RBP: 00007f0344bbb3a0 R08: 0000000000000000 R09: 0000000584c91aa0
[433834.962570] R10: 00000000000006ec R11: 0000000000000246 R12: 0000000000001f40
[433834.970321] R13: 00007f0344bbb3e0 R14: 000000000000000c R15: 00007f0310001a58
[433834.978066] Modules linked in: binfmt_misc mpt3sas raid_class scsi_transport_sas ceph libceph dns_resolver fscache netfs eset_rtp(OE) ip_tables bonding tls xt_state xt_conntrack nft_counter xt_LOG nf_log_syslog rfkill nft_compat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables dell_rbu nfnetlink vfat fat ipmi_ssif intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel dcdbas kvm irqbypass rapl intel_cstate dell_smbios dell_wmi_descriptor wmi_bmof pcspkr intel_uncore joydev mei_me acpi_ipmi isst_if_mbox_pci isst_if_mmio i2c_i801 ioatdma isst_if_common i2c_smbus mei intel_pch_thermal intel_pmt dca ipmi_si acpi_power_meter xfs libcrc32c dm_crypt sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul cec crc32_pclmul crc32c_intel ahci libahci drm i40e ghash_clmulni_intel libata megaraid_sas tg3 uas usb_storage i2c_algo_bit wmi
[433834.978118]  dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
[433835.077885] CR2: 0000000000000402
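
To check whether a saved kernel log shows the same signature, the trace can be matched mechanically. The following is a minimal Python sketch, not a Red Hat tool: the function names `find_netfs_oops` and `matches_this_issue` are hypothetical, and the only assumption is that the log contains a "BUG: kernel NULL pointer dereference" line followed by a kernel-mode "RIP: 0010:" line, as in the trace above.

```python
import re

# Signature taken from the trace above: a NULL pointer dereference whose
# faulting kernel-mode instruction is in netfs_rreq_unlock [netfs].
BUG_RE = re.compile(r"BUG: kernel NULL pointer dereference")
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def find_netfs_oops(log_text):
    """Return (symbol, module) from the first kernel-mode RIP line that
    follows a NULL-dereference BUG line, or None if there is no such pair."""
    seen_bug = False
    for line in log_text.splitlines():
        if BUG_RE.search(line):
            seen_bug = True
            continue
        if seen_bug:
            match = RIP_RE.search(line)
            if match:
                return match.group("symbol"), match.group("module")
    return None


def matches_this_issue(log_text):
    """True when the oops faults in netfs_rreq_unlock inside the netfs module."""
    return find_netfs_oops(log_text) == ("netfs_rreq_unlock", "netfs")
```

The text passed in could come from `journalctl -k` output or from the `vmcore-dmesg.txt` file written by kdump. A match only indicates the same crash location, so the Environment and Root Cause sections should still be compared against the affected system.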

Resolution

  • The issue is fixed in RHEL 9.2 (RHSA-2023:2458).
  • As of March 2023, a fixed kernel is also available for RHEL 9.1 (RHSA-2023:0951).
  • RHEL 9.1 systems: apply the RHEL 9.1 erratum listed under Artifacts in the Root Cause section to install the fixed kernel.
  • RHEL 9.0 systems: upgrade to RHEL 9.1 and then apply the same kernel fix.
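
The bullets above can be turned into a quick triage helper keyed on the kernel release string. This is an illustrative Python sketch under stated assumptions: the helper names `rhel_stream` and `advice` are hypothetical, the RHEL minor stream is assumed to be readable from the `.el9_N` tag in the release string (as in `5.14.0-70.22.1.el9_0.x86_64` from the trace), and no fixed kernel build number is hard-coded because this article does not state one.

```python
import re

# The kernel release in the trace carries an ".el9_0" tag; this sketch
# assumes other minor streams follow the same ".el<major>_<minor>" pattern.
EL_RE = re.compile(r"\.el(?P<major>\d+)_(?P<minor>\d+)")


def rhel_stream(kernel_release):
    """Return (major, minor) parsed from the release tag, or None if absent."""
    match = EL_RE.search(kernel_release)
    if match is None:
        return None
    return int(match.group("major")), int(match.group("minor"))


def advice(kernel_release):
    """Map a kernel release string to the guidance given in this article."""
    stream = rhel_stream(kernel_release)
    if stream is None:
        return "unknown: release string has no .elX_Y tag"
    if stream == (9, 0):
        return "affected: upgrade to RHEL 9.1 with the RHSA-2023:0951 kernel"
    if stream == (9, 1):
        return "affected unless the RHSA-2023:0951 kernel is installed"
    if stream[0] == 9 and stream[1] >= 2:
        return "RHEL 9.2 or later stream: this article reports the fix in 9.2"
    return "not covered by this article"
```

On a live system, the release string is what `uname -r` prints, which Python exposes as `platform.release()`. For a RHEL 9.1 kernel the helper cannot tell whether the erratum has been applied, so the installed kernel package still needs to be compared against the one listed in RHSA-2023:0951.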

Root Cause

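A defect in the netfs kernel module shipped with RHEL 9.0 and 9.1 causes a NULL pointer dereference in netfs_rreq_unlock() while a CephFS read is being completed.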

Artifacts:
RHEL 9.1 Bugzilla #2161418
RHEL 9.2 Bugzilla #2138981
RHEL 9.1 Errata: RHSA-2023:0951 - Security Advisory
RHEL 9.2 Errata: RHSA-2023:2458 - Security Advisory
