Kernel crashed due to use-after-free in libceph in RHEL

Solution Verified - Updated -

Issue

  • A Red Hat Ceph Storage client kernel crashes with a "general protection fault" caused by an invalid SLUB allocator freelist pointer, and the logs contain any of the following messages:

        [42571.910268] WARNING: CPU: 15 PID: 12481 at include/linux/kref.h:52 ceph_msg_get+0x73/0x80 [libceph]
    
        [8197436.527360] WARNING: CPU: 3 PID: 15288 at net/ceph/osd_client.c:493 request_reinit+0x16c/0x180 [libceph]
    
  • Note: The bug is a use-after-free error that is reported as a general protection fault

    • the crash message might differ across kernel versions
    • the "general protection fault" might not occur
    • libceph WARNING stack traces will always show up in the logs
  • Kernel crashed with logs:

    [10421938.967419] general protection fault: 0000 [#1] SMP NOPTI
    [10421938.967454] CPU: 0 PID: 3617727 Comm: kworker/0:3 Kdump: loaded Tainted: G        W I      --------- -  - 4.18.0-372.49.1.el8_6.x86_64 #1
    [10421938.967505] Hardware name:  /0WRPXK, BIOS 2.17.1 11/15/2022
    [10421938.967530] Workqueue: events handle_timeout [libceph]
    [10421938.967587] RIP: 0010:down_read_trylock+0x18/0x50
    [10421938.967614] Code: e9 8d d5 aa 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 48 b9 07 00 00 00 00 00 00 80 31 c0 48 8d 90 00 01 00 00 <f0> 48 0f b1 17 75 25 48 8b 57 20 65 48 8b 04 25 40 5c 01 00 83 e2
    [10421938.967684] RSP: 0018:ffffb919c085fdc0 EFLAGS: 00010246
    [10421938.967708] RAX: 0000000000000000 RBX: ffff8f5d19c59888 RCX: 8000000000000007
    [10421938.967739] RDX: 0000000000000100 RSI: ffff8f1426d6c7d8 RDI: 6b6b6b6b6b6b6b7b
    [10421938.967769] RBP: 6b6b6b6b6b6b6b6b R08: ffffb919c085fe50 R09: 000000036d2d8ddb
    [10421938.967797] R10: 0000000000000000 R11: ffffffff98c5b5c8 R12: 6b6b6b6b6b6b6b7b
    [10421938.968309] R13: ffff8f0b78922a80 R14: ffff8f5eb5159000 R15: ffff8f5d19c59888
    [10421938.968770] FS:  0000000000000000(0000) GS:ffff8f493f400000(0000) knlGS:0000000000000000
    [10421938.969255] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [10421938.969700] CR2: 00005649c1512898 CR3: 0000000197210004 CR4: 00000000007706b0
    [10421938.970158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [10421938.970604] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [10421938.971042] PKRU: 55555554
    [10421938.971464] Call Trace:
    [10421938.971888]  cancel_map_check+0x1f/0xa0 [libceph]
    [10421938.972360]  cancel_request+0x16/0x60 [libceph]
    [10421938.972830]  cancel_linger_request+0x1b/0x50 [libceph]
    [10421938.973309]  handle_timeout+0x3a6/0x5a0 [libceph]
    [10421938.973799]  process_one_work+0x1a7/0x360
    [10421938.974271]  ? create_worker+0x1a0/0x1a0
    [10421938.974756]  worker_thread+0x30/0x390
    [10421938.975243]  ? create_worker+0x1a0/0x1a0
    [10421938.975737]  kthread+0x10a/0x120
    [10421938.976237]  ? set_kthread_struct+0x50/0x50
    [10421938.976745]  ret_from_fork+0x1f/0x40
    [10421938.977259] Modules linked in: xt_DSCP iptable_mangle ceph xt_owner iavf vfio_pci vfio_virqfd vfio_iommu_type1 vfio xt_sctp iptable_filter macvlan loop veth nbd rbd libceph dns_resolver xt_REDIRECT xt_addrtype vhost_net tun vhost vhost_iotlb tap ipt_REJECT nf_reject_ipv4 xt_nat xt_CT nf_conntrack_netlink ip6t_MASQUERADE ipt_MASQUERADE xt_mark xt_conntrack xt_comment nft_compat nft_counter nft_chain_nat nf_tables 8021q garp mrp stp llc bonding geneve nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay ext4 mbcache jbd2 dell_smbios intel_rapl_msr wmi_bmof dell_wmi_descriptor intel_rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate intel_uncore pcspkr mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_i801 mei_me mei lpc_ich wmi ipmi_ssif acpi_ipmi ipmi_si acpi_power_meter sctp ip6_udp_tunnel udp_tunnel ip_tables
    [10421938.977337]  esp6 esp4 af_key xfrm_ipcomp xfrm_interface xfrm4_tunnel tunnel4 xfrm6_tunnel tunnel6 ip_gre ip_tunnel gre softdog xfs libcrc32c dm_multipath irdma ice ib_uverbs ib_core sd_mod t10_pi sg crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ahci igb i40e megaraid_sas libahci i2c_algo_bit libata dca dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler fuse
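
The register dump above holds values such as 6b6b6b6b6b6b6b6b and 6b6b6b6b6b6b6b7b. The byte 0x6b is the kernel's POISON_FREE pattern, which is written over freed slab objects when SLUB poisoning is enabled, so registers filled with it are a strong hint that freed memory was dereferenced. The following is a minimal sketch, not a supported tool, of how a saved console log could be scanned for both the libceph WARNING lines and poison-like register values. The function names, the regular expressions, and the 6-of-8-bytes threshold are assumptions made for illustration.

```python
import re

# POISON_FREE (0x6b) is the byte the kernel writes over freed slab objects
# when SLUB poisoning is enabled (see include/linux/poison.h).
POISON_FREE = 0x6B

# Matches "NAME: <16 hex digits>" pairs in an oops register dump,
# for example "RDI: 6b6b6b6b6b6b6b7b". Illustrative pattern, not exhaustive.
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")

# Matches WARNING lines whose reporting symbol belongs to the libceph module.
LIBCEPH_WARNING_RE = re.compile(r"WARNING: .* at (\S+) (\S+) \[libceph\]")


def looks_poisoned(value_hex, min_bytes=6):
    """Return True if at least min_bytes of the 8 bytes equal POISON_FREE.

    The threshold is a heuristic (assumption): a pointer computed as
    "poisoned base + small offset" keeps most of its 0x6b bytes.
    """
    raw = bytes.fromhex(value_hex)
    return sum(1 for b in raw if b == POISON_FREE) >= min_bytes


def scan_log(lines):
    """Collect libceph WARNING sites and registers holding poison-like values."""
    warnings = []
    poisoned = []
    for line in lines:
        match = LIBCEPH_WARNING_RE.search(line)
        if match:
            # (source location, symbol+offset) of the WARNING
            warnings.append((match.group(1), match.group(2)))
        for name, value in REGISTER_RE.findall(line):
            if looks_poisoned(value):
                poisoned.append((name, value))
    return warnings, poisoned
```

`scan_log` accepts any iterable of lines, so it can be given an open file object for a saved `dmesg` or console log. A non-empty result in both lists matches the pattern described in this issue, but it does not by itself confirm the root cause.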
    

Environment

  • Red Hat Enterprise Linux (RHEL)
    • 7
    • 8
    • 9
  • Red Hat Ceph Storage 4
  • High system load
  • Increased network latency
  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.12
