Kernel panic in function _nv030987rm or _nv025347rm in Red Hat Enterprise Linux
Issue
- A kernel panic occurred while the system was in use with Docker containers, after an NVIDIA driver and firmware upgrade had completed.
- The call trace is shown below.
[18009.106111] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
[18009.106922] IP: [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
[18009.108588] Oops: 0000 [#1] SMP
[18009.109317] Modules linked in: veth ib_ipoib ib_cm ib_core xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay(T) nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) iTCO_wdt iTCO_vendor_support vfat fat sb_edac edac_core intel_powerclamp coretemp intel_rapl ipmi_ssif iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd nvidia(POE) pcspkr joydev sg hpilo hpwdt ioatdma i2c_i801 shpchp lpc_ich
[18009.114491] ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core drm crct10dif_pclmul ixgbe crct10dif_common igb crc32c_intel i2c_algo_bit devlink i2c_core mdio ptp pps_core dca wmi dm_mirror dm_region_hash dm_log dm_mod
[18009.118429] CPU: 1 PID: 14007 Comm: nvidia-smi Tainted: P B W OE ------------ T 3.10.0-693.el7.x86_64 #1
[18009.119412] Hardware name: HP ProLiant XL270d Gen9/ProLiant XL270d Gen9, BIOS U25 04/25/2017
[18009.120470] task: ffff883de277dee0 ti: ffff883d19bfc000 task.ti: ffff883d19bfc000
[18009.121542] RIP: 0010:[<ffffffffc0f562da>] [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
[18009.122775] RSP: 0018:ffff883f7fa43d38 EFLAGS: 00010002
[18009.123863] RAX: 0000000000000000 RBX: ffff883f563b8008 RCX: ffff883f4bab5f0c
[18009.124940] RDX: 0000000000000020 RSI: 0000000000000000 RDI: ffff883f563b8008
[18009.126050] RBP: ffff883f4bab5e90 R08: 0000000000000020 R09: ffff883f4bab5ed8
[18009.127195] R10: ffffffffc0c16210 R11: ffff883f7fa43da8 R12: 0000000000000000
[18009.128236] R13: 0000000000000002 R14: ffff883f563b8008 R15: ffffffffc1a189e0
[18009.129311] FS: 00007f2778141740(0000) GS:ffff883f7fa40000(0000) knlGS:0000000000000000
[18009.130345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18009.131444] CR2: 0000000000000120 CR3: 0000007e093cd000 CR4: 00000000003407e0
[18009.132515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18009.133555] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[18009.134564] Stack:
[18009.135573] ffffffffc1a189e0 0000000000000004 ffffffffc1a18a38 0000000000000002
[18009.136556] ffff883f563b8008 ffffffffc0841c2f 0000000000000001 ffff887dedad1064
[18009.137530] ffff887f750b0008 ffff887def31c008 ffff883f77aa2c20 ffffffffc0841fd9
[18009.138538] Call Trace:
[18009.139490] <IRQ>
[18009.139499]
[18009.140466] [<ffffffffc0841c2f>] ? _nv008467rm+0x2ef/0x670 [nvidia]
[18009.141455] [<ffffffffc0841fd9>] ? _nv034175rm+0x29/0xf0 [nvidia]
[18009.142470] [<ffffffffc0899062>] ? _nv031123rm+0x62/0x1e0 [nvidia]
[18009.143422] [<ffffffffc07ef4a0>] ? __x86_indirect_thunk_r15+0xd/0xd [nvidia]
[18009.144315] [<ffffffffc0f50fc4>] ? rm_run_rc_callback+0x84/0xd0 [nvidia]
[18009.145222] [<ffffffffc07efcdc>] ? nvidia_rc_timer_callback+0x3c/0x60 [nvidia]
[18009.146097] [<ffffffffc07ef4aa>] ? nv_timer_callback_anon_data+0xa/0x10 [nvidia]
[18009.146928] [<ffffffff81097316>] ? call_timer_fn+0x36/0x110
[18009.147797] [<ffffffffc07ef4a0>] ? __x86_indirect_thunk_r15+0xd/0xd [nvidia]
[18009.148621] [<ffffffff8109982d>] ? run_timer_softirq+0x22d/0x310
[18009.149439] [<ffffffff81090b3f>] ? __do_softirq+0xef/0x280
[18009.150282] [<ffffffff816b6a5c>] ? call_softirq+0x1c/0x30
[18009.151099] [<ffffffff8102d3c5>] ? do_softirq+0x65/0xa0
[18009.151941] [<ffffffff81090ec5>] ? irq_exit+0x105/0x110
[18009.152711] [<ffffffff816b76c2>] ? smp_apic_timer_interrupt+0x42/0x50
[18009.153514] [<ffffffff816b5c1d>] ? apic_timer_interrupt+0x6d/0x80
[18009.169803] Code: 83 c5 80 48 85 ff 0f 84 f5 00 00 00 4c 8b a7 88 0d 00 00 31 c0 4d 85 e4 74 04 49 8b 04 24 80 bb ef 0c 00 00 00 0f 84 c4 00 00 00 <49> 8b 94 24 20 01 00 00 48 8b 52 28 8b 12 39 50 08 0f 84 ad 00
[18009.170922] RIP [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
[18009.171545] RSP <ffff883f7fa43d38>
[18009.172033] CR2: 0000000000000120
- Memory corruption is also reported for the kmalloc-512 slab cache, and the call trace shows that the nvidia_uvm module is involved.
[ 1987.816759] =============================================================================
[ 1987.816793] BUG kmalloc-512(0:486eb754281b11e63febe790c11fc27cbac87d87a2bc9e0a68cd13d58e0511fa) (Tainted: P OE ------------ T): Objects remaining in kmalloc-512(0:486eb754281b11e63febe790c11fc27cbac87d87a2bc9e0a68cd13d58e0511fa
[ 1987.816852] -----------------------------------------------------------------------------
[ 1987.816880] INFO: Slab 0xffffea00fd77e500 objects=32 used=1 fp=0xffff883f5df94000 flags=0x2fffff00004080
[ 1987.816908] CPU: 1 PID: 30194 Comm: nvc:[driver] Tainted: P B OE ------------ T 3.10.0-693.el7.x86_64 #1
[ 1987.816910] ffffea00fd77e500 00000000023f4faf ffff883f437cf928 ffffffff816a3d91
[ 1987.816912] ffff883f437cfa00 ffffffff811dbf44 0000000000000020 ffff883f437cfa10
[ 1987.816914] ffff883f437cf9c0 656a624ffffffffc 616d657220737463 6e6920676e696e69
[ 1987.816915] Call Trace:
[ 1987.816923] [<ffffffff816a3d91>] dump_stack+0x19/0x1b
[ 1987.816927] [<ffffffff811dbf44>] slab_err+0xb4/0xe0
[ 1987.816929] [<ffffffff811dd990>] ? kmem_cache_alloc_bulk+0x140/0x140
[ 1987.816931] [<ffffffff811e0613>] ? __kmalloc+0x1e3/0x230
[ 1987.816933] [<ffffffff811e1907>] ? kmem_cache_close+0x127/0x2e0
[ 1987.816935] [<ffffffff811e1929>] kmem_cache_close+0x149/0x2e0
[ 1987.816937] [<ffffffff811e1ad4>] __kmem_cache_shutdown+0x14/0x80
[ 1987.816940] [<ffffffff811a6874>] kmem_cache_destroy+0x44/0xf0
[ 1987.816943] [<ffffffff811f6009>] kmem_cache_destroy_memcg_children+0x89/0xb0
[ 1987.816945] [<ffffffff811a6849>] kmem_cache_destroy+0x19/0xf0
[ 1987.816967] [<ffffffffc5de62bc>] deinit_pma_address_batch_cache.isra.19+0x2c/0x40 [nvidia_uvm]
[ 1987.816976] [<ffffffffc5de91b8>] uvm_pmm_gpu_deinit+0x48/0x80 [nvidia_uvm]
[ 1987.816982] [<ffffffffc5dac6e6>] remove_gpu+0x226/0x2c0 [nvidia_uvm]
[ 1987.816988] [<ffffffffc5dac911>] uvm_gpu_release_locked+0x21/0x40 [nvidia_uvm]
[ 1987.816995] [<ffffffffc5db2098>] uvm_va_space_destroy+0x3c8/0x420 [nvidia_uvm]
[ 1987.816999] [<ffffffffc5da24a0>] uvm_release.isra.4+0x80/0xa0 [nvidia_uvm]
[ 1987.817020] [<ffffffffc5da2584>] uvm_release_entry+0x54/0xb0 [nvidia_uvm]
...
[ 1987.817067] INFO: Object 0xffff883f5df96e00 @offset=11776
- Another pattern:
[ 153.211296] nvidia 0000:65:00.0: irq 145 for MSI/MSI-X
[ 153.734708] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 153.734900] IP: [<ffffffffc0cb12f8>] _nv025347rm+0x8/0x40 [nvidia]
[ 153.734901] PGD 8000000780bc9067 PUD 7e0a8e067 PMD 0
[ 153.734902] Oops: 0002 [#1] SMP
[ 153.734928] Modules linked in: fuse ip6table_filter ip6_tables iptable_filter sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag snapapi26(POE) mpt2sas raid_class scsi_transport_sas mptctl mptbase team_mode_activebackup team dell_rbu skx_edac edac_core intel_powerclamp nvidia_drm(POE) coretemp snd_hda_codec_hdmi nvidia_modeset(POE) intel_rapl iosf_mbi kvm_intel kvm irqbypass snd_hda_intel snd_hda_codec nvidia(POE) crc32_pclmul ghash_clmulni_intel snd_hda_core aesni_intel lrw snd_hwdep gf128mul glue_helper ablk_helper cryptd joydev snd_seq snd_seq_device snd_pcm snd_timer dell_smbios sparse_keymap snd iTCO_wdt soundcore iTCO_vendor_support dcdbas ipmi_ssif pcspkr sg i2c_i801 shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler mei_me mei nfit libnvdimm acpi_power_meter
[ 153.734942] acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod sr_mod crc_t10dif crct10dif_generic cdrom mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul ttm crct10dif_common crc32c_intel ixgbe ahci drm libahci megaraid_sas libata tg3 mdio dca ptp i2c_core pps_core wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: edd]
[ 153.734944] CPU: 3 PID: 17741 Comm: nvidia-settings Tainted: P OE ------------ 3.10.0-693.11.6.el7.x86_64 #1
[ 153.734945] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 2.1.8 04/30/2019
[ 153.734946] task: ffff88085a77bf40 ti: ffff8807d63cc000 task.ti: ffff8807d63cc000
[ 153.735110] RIP: 0010:[<ffffffffc0cb12f8>] [<ffffffffc0cb12f8>] _nv025347rm+0x8/0x40 [nvidia]
[ 153.735111] RSP: 0018:ffff8807d63cf988 EFLAGS: 00010296
[ 153.735111] RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff88081f2a2e48
[ 153.735112] RDX: ffff8808595ea008 RSI: ffff8807dd6ce008 RDI: ffff880802218008
[ 153.735112] RBP: ffff88081f2a2e40 R08: ffffffffc0cb12f0 R09: ffff88081f2a29ec
[ 153.735113] R10: 000000000000454d R11: 0000000000000000 R12: ffff880802218008
[ 153.735113] R13: ffff880034984008 R14: ffff880802218008 R15: ffff8807cc7f0008
[ 153.735114] FS: 00007fa1b358a740(0000) GS:ffff880858cc0000(0000) knlGS:0000000000000000
[ 153.735115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 153.735116] CR2: 0000000000000000 CR3: 0000000854cac000 CR4: 00000000003607e0
[ 153.735116] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 153.735117] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 153.735117] Call Trace:
[ 153.735268] [<ffffffffc0d2277a>] ? _nv031562rm+0x7a/0xb0 [nvidia]
[ 153.735417] [<ffffffffc0d1965c>] ? _nv031902rm+0x6ec/0x2440 [nvidia]
[ 153.735568] [<ffffffffc0bf551b>] ? _nv021386rm+0xbb/0x1a0 [nvidia]
[ 153.735678] [<ffffffffc08574c7>] ? _nv021634rm+0x27/0x50 [nvidia]
[ 153.735752] [<ffffffffc0f35f30>] ? _nv000901rm+0x1200/0x1cc0 [nvidia]
[ 153.735829] [<ffffffffc0f2c785>] ? rm_init_adapter+0xd5/0xe0 [nvidia]
[ 153.735870] [<ffffffffc07ce871>] ? nv_open_device+0x281/0x800 [nvidia]
[ 153.735872] [<ffffffff811e1495>] ? kmem_cache_alloc+0x35/0x1e0
[ 153.735913] [<ffffffffc07cf0e7>] ? nvidia_open+0x2f7/0x540 [nvidia]
[ 153.735915] [<ffffffff812076bc>] ? cdev_get+0x2c/0x50
[ 153.735957] [<ffffffffc07cd382>] ? nvidia_frontend_open+0x52/0xb0 [nvidia]
[ 153.735958] [<ffffffff81207e22>] ? chrdev_open+0xb2/0x1b0
[ 153.735961] [<ffffffff81200427>] ? do_dentry_open+0x1a7/0x2e0
[ 153.735962] [<ffffffff812b409c>] ? security_inode_permission+0x1c/0x30
[ 153.735964] [<ffffffff81207d70>] ? cdev_put+0x30/0x30
[ 153.735966] [<ffffffff812005fa>] ? vfs_open+0x5a/0xb0
[ 153.735967] [<ffffffff8120e2e8>] ? may_open+0x68/0x110
[ 153.735968] [<ffffffff8121175d>] ? do_last+0x1ed/0x12c0
[ 153.735970] [<ffffffff812128f2>] ? path_openat+0xc2/0x490
[ 153.735971] [<ffffffff81214e8b>] ? do_filp_open+0x4b/0xb0
[ 153.735973] [<ffffffff8122213a>] ? __alloc_fd+0x8a/0x130
[ 153.735975] [<ffffffff812019c3>] ? do_sys_open+0xf3/0x1f0
[ 153.735976] [<ffffffff81201ade>] ? SyS_open+0x1e/0x20
[ 153.735978] [<ffffffff816b89fd>] ? system_call_fastpath+0x16/0x1b
[ 153.735993] Code: 0f 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10
[ 153.736145] RIP [<ffffffffc0cb12f8>] _nv025347rm+0x8/0x40 [nvidia]
[ 153.736146] RSP <ffff8807d63cf988>
[ 153.736146] CR2: 0000000000000000
- Another pattern:
[3552892.472685] BUG: unable to handle kernel paging request at 0000000000002340
[3552892.472745] IP: [<ffffffffc042af3a>] _nv035782rm+0x1a/0x2f0 [nvidia]
[3552892.473061] PGD 0
[3552892.473080] Oops: 0000 [#1] SMP
[3552892.473124] Modules linked in: binfmt_misc nvidia_uvm(OE) gsch(OE) redirfs(OE) bmhook(OE) acdc(POE) tmhook(OE) dsa_filter(POE) dsa_filter_hook(OE) vmw_vsock_vmci_transport vsock vfat fat iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul vmw_balloon glue_helper ablk_helper cryptd sg pcspkr joydev parport_pc parport vmw_vmci i2c_piix4 ip_tables xfs libcrc32c nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) ata_generic pata_acpi vmwgfx sd_mod drm_kms_helper crc_t10dif crct10dif_generic ttm syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ata_piix libahci drm crct10dif_pclmul libata crct10dif_common crc32c_intel nfit libnvdimm serio_raw vmxnet3 vmw_pvscsi drm_panel_orientation_quirks floppy dm_mirror dm_region_hash dm_log dm_mod fuse
[3552892.473680] CPU: 29 PID: 686 Comm: nv_queue Kdump: loaded Tainted: P OE ------------ 3.10.0-1160.21.1.el7.x86_64 #1
[3552892.473750] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.2008052155 08/05/2020
[3552892.473820] task: ffff8de04ac8d280 ti: ffff8de1503e8000 task.ti: ffff8de1503e8000
[3552892.473866] RIP: 0010:[<ffffffffc042af3a>] [<ffffffffc042af3a>] _nv035782rm+0x1a/0x2f0 [nvidia]
[3552892.474175] RSP: 0018:ffff8de1503ebd68 EFLAGS: 00010286
[3552892.474210] RAX: 0000000000000000 RBX: ffff8de144be3808 RCX: ffff8de1657b8fe8
[3552892.474253] RDX: ffff8de1657b8fe8 RSI: 0000000000000001 RDI: 0000000000000000
[3552892.474299] RBP: ffff8df591a92ea0 R08: 0000000000000020 R09: ffff8df591a92f78
[3552892.474343] R10: 0000000000000000 R11: 0000000000000400 R12: 0000000000000000
[3552892.474387] R13: 0000000000000000 R14: ffff8de144be3808 R15: ffff8df1dd9e3380
[3552892.474431] FS: 0000000000000000(0000) GS:ffff8df6a3d40000(0000) knlGS:0000000000000000
[3552892.474480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3552892.474516] CR2: 0000000000002340 CR3: 00000004a2010000 CR4: 00000000007607e0
[3552892.474669] PKRU: 00000000
[3552892.474692] Call Trace:
[3552892.474942] [<ffffffffc042b2af>] ? _nv011229rm+0x9f/0x100 [nvidia]
[3552892.475320] [<ffffffffc0da4e35>] ? rm_execute_work_item+0xc5/0x120 [nvidia]
[3552892.475560] [<ffffffffc040b7ea>] ? os_execute_work_item+0x4a/0x60 [nvidia]
[3552892.475799] [<ffffffffc040e871>] ? _main_loop+0x91/0x190 [nvidia]
[3552892.476039] [<ffffffffc040e7e0>] ? nvidia_modeset_resume+0x30/0x30 [nvidia]
[3552892.476089] [<ffffffffa4cc5da1>] ? kthread+0xd1/0xe0
[3552892.476142] [<ffffffffa4cc5cd0>] ? insert_kthread_work+0x40/0x40
[3552892.476186] [<ffffffffa5395ddd>] ? ret_from_fork_nospec_begin+0x7/0x21
[3552892.476229] [<ffffffffa4cc5cd0>] ? insert_kthread_work+0x40/0x40
[3552892.476267] Code: 18 c2 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 57 41 56 41 55 41 54 49 89 fd 53 48 81 ed 60 01 00 00 4c 8b 35 ce 8f f8 01 <4c> 8b bf 40 23 00 00 48 c7 45 08 00 00 00 00 49 8b 9e 40 02 00
[3552892.476501] RIP [<ffffffffc042af3a>] _nv035782rm+0x1a/0x2f0 [nvidia]
[3552892.476766] RSP <ffff8de1503ebd68>
[3552892.476789] CR2: 0000000000002340
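To compare a crash log against the patterns above, the faulting function and module can be pulled out of the "IP:" line of the oops. The following is a minimal Python sketch, not a Red Hat or NVIDIA tool. The names faulting_location and is_nvidia_rm_fault are hypothetical, and the check only tests the general shape seen in the traces above: a symbol of the form _nv<digits>rm in the nvidia module, whose numeric part differs from trace to trace.

```python
import re

# Matches oops lines such as:
#   IP: [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
# The leading \b keeps the pattern from matching inside "RIP:".
IP_LINE = re.compile(
    r"\bIP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>\S+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)

# Shape of the faulting symbols seen in the traces above (assumption:
# other affected builds follow the same naming shape).
NVIDIA_RM_SYMBOL = re.compile(r"_nv\d+rm")


def faulting_location(log_text):
    """Return (symbol, offset, module) from the first 'IP:' line, or None.

    module is None when the faulting address is in the core kernel.
    """
    match = IP_LINE.search(log_text)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        match.group("module"),
    )


def is_nvidia_rm_fault(log_text):
    """True if the oops faulted in an _nv<digits>rm symbol of the nvidia module."""
    location = faulting_location(log_text)
    if location is None:
        return False
    symbol, _offset, module = location
    return module == "nvidia" and NVIDIA_RM_SYMBOL.fullmatch(symbol) is not None
```

A log can be passed in as a single string, for example the contents of a saved dmesg file. A match only means that the crash has the same general shape as the traces above. It does not confirm the same root cause.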
Environment
- Red Hat Enterprise Linux (RHEL) 7.4 (3.10.0-693.el7.x86_64)
- Third-party NVIDIA driver
  - nvidia-driver-latest-440.95.01-1.el7.x86_64
  - 440.100
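Whether a host matches the environment listed above can be checked with a short script. The Python sketch below is illustrative only. It assumes the driver version can be read from /proc/driver/nvidia/version and that the file contains the text "Kernel Module" followed by the version number. The names parse_driver_version and matches_environment are hypothetical. The kernel release and driver versions are taken from this article.

```python
import platform
import re
from pathlib import Path

# Values taken from the Environment section of this article.
KERNEL_RELEASES = {"3.10.0-693.el7.x86_64"}
DRIVER_VERSIONS = {"440.95.01", "440.100"}

# Assumption: the loaded driver reports its version in this file.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Return the version that follows 'Kernel Module', or None if absent."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def matches_environment(kernel_release, driver_version):
    """True if both the kernel release and driver version are listed above."""
    return kernel_release in KERNEL_RELEASES and driver_version in DRIVER_VERSIONS


if __name__ == "__main__":
    kernel = platform.release()
    driver = None
    if NVIDIA_VERSION_FILE.exists():
        driver = parse_driver_version(NVIDIA_VERSION_FILE.read_text())
    print("kernel:", kernel)
    print("driver:", driver)
    print("matches listed environment:", matches_environment(kernel, driver))
```

A mismatch does not rule the issue out. Some of the traces above come from other kernel releases, such as 3.10.0-693.11.6.el7.x86_64 and 3.10.0-1160.21.1.el7.x86_64.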