Kernel panic in function _nv030987rm or _nv025347rm in Red Hat Enterprise Linux

Solution Verified

Issue

  • A kernel panic occurs while running workloads in a Docker container after an NVIDIA driver and firmware upgrade has completed.
  • The call trace is shown below.
[18009.106111] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
[18009.106922] IP: [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
[18009.108588] Oops: 0000 [#1] SMP 
[18009.109317] Modules linked in: veth ib_ipoib ib_cm ib_core xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay(T) nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) iTCO_wdt iTCO_vendor_support vfat fat sb_edac edac_core intel_powerclamp coretemp intel_rapl ipmi_ssif iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd nvidia(POE) pcspkr joydev sg hpilo hpwdt ioatdma i2c_i801 shpchp lpc_ich
[18009.114491]  ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx5_core drm crct10dif_pclmul ixgbe crct10dif_common igb crc32c_intel i2c_algo_bit devlink i2c_core  mdio ptp pps_core dca wmi dm_mirror dm_region_hash dm_log dm_mod
[18009.118429] CPU: 1 PID: 14007 Comm: nvidia-smi Tainted: P    B   W  OE  ------------ T 3.10.0-693.el7.x86_64 #1
[18009.119412] Hardware name: HP ProLiant XL270d Gen9/ProLiant XL270d Gen9, BIOS U25 04/25/2017
[18009.120470] task: ffff883de277dee0 ti: ffff883d19bfc000 task.ti: ffff883d19bfc000
[18009.121542] RIP: 0010:[<ffffffffc0f562da>]  [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
[18009.122775] RSP: 0018:ffff883f7fa43d38  EFLAGS: 00010002
[18009.123863] RAX: 0000000000000000 RBX: ffff883f563b8008 RCX: ffff883f4bab5f0c
[18009.124940] RDX: 0000000000000020 RSI: 0000000000000000 RDI: ffff883f563b8008
[18009.126050] RBP: ffff883f4bab5e90 R08: 0000000000000020 R09: ffff883f4bab5ed8
[18009.127195] R10: ffffffffc0c16210 R11: ffff883f7fa43da8 R12: 0000000000000000
[18009.128236] R13: 0000000000000002 R14: ffff883f563b8008 R15: ffffffffc1a189e0
[18009.129311] FS:  00007f2778141740(0000) GS:ffff883f7fa40000(0000) knlGS:0000000000000000
[18009.130345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18009.131444] CR2: 0000000000000120 CR3: 0000007e093cd000 CR4: 00000000003407e0
[18009.132515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18009.133555] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[18009.134564] Stack:
[18009.135573]  ffffffffc1a189e0 0000000000000004 ffffffffc1a18a38 0000000000000002
[18009.136556]  ffff883f563b8008 ffffffffc0841c2f 0000000000000001 ffff887dedad1064
[18009.137530]  ffff887f750b0008 ffff887def31c008 ffff883f77aa2c20 ffffffffc0841fd9
[18009.138538] Call Trace:
[18009.139490]  <IRQ> 
[18009.139499] 
[18009.140466]  [<ffffffffc0841c2f>] ? _nv008467rm+0x2ef/0x670 [nvidia]
[18009.141455]  [<ffffffffc0841fd9>] ? _nv034175rm+0x29/0xf0 [nvidia]
[18009.142470]  [<ffffffffc0899062>] ? _nv031123rm+0x62/0x1e0 [nvidia]
[18009.143422]  [<ffffffffc07ef4a0>] ? __x86_indirect_thunk_r15+0xd/0xd [nvidia]
[18009.144315]  [<ffffffffc0f50fc4>] ? rm_run_rc_callback+0x84/0xd0 [nvidia]
[18009.145222]  [<ffffffffc07efcdc>] ? nvidia_rc_timer_callback+0x3c/0x60 [nvidia]
[18009.146097]  [<ffffffffc07ef4aa>] ? nv_timer_callback_anon_data+0xa/0x10 [nvidia]
[18009.146928]  [<ffffffff81097316>] ? call_timer_fn+0x36/0x110
[18009.147797]  [<ffffffffc07ef4a0>] ? __x86_indirect_thunk_r15+0xd/0xd [nvidia]
[18009.148621]  [<ffffffff8109982d>] ? run_timer_softirq+0x22d/0x310
[18009.149439]  [<ffffffff81090b3f>] ? __do_softirq+0xef/0x280
[18009.150282]  [<ffffffff816b6a5c>] ? call_softirq+0x1c/0x30
[18009.151099]  [<ffffffff8102d3c5>] ? do_softirq+0x65/0xa0
[18009.151941]  [<ffffffff81090ec5>] ? irq_exit+0x105/0x110
[18009.152711]  [<ffffffff816b76c2>] ? smp_apic_timer_interrupt+0x42/0x50
[18009.153514]  [<ffffffff816b5c1d>] ? apic_timer_interrupt+0x6d/0x80
[18009.169803] Code: 83 c5 80 48 85 ff 0f 84 f5 00 00 00 4c 8b a7 88 0d 00 00 31 c0 4d 85 e4 74 04 49 8b 04 24 80 bb ef 0c 00 00 00 0f 84 c4 00 00 00 <49> 8b 94 24 20 01 00 00 48 8b 52 28 8b 12 39 50 08 0f 84 ad 00 
[18009.170922] RIP  [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
[18009.171545]  RSP <ffff883f7fa43d38>
[18009.172033] CR2: 0000000000000120
  • Memory corruption is also reported in the kmalloc-512 slab cache, with the nvidia_uvm module in the call trace.
[ 1987.816759] =============================================================================
[ 1987.816793] BUG kmalloc-512(0:486eb754281b11e63febe790c11fc27cbac87d87a2bc9e0a68cd13d58e0511fa) (Tainted: P           OE  ------------ T): Objects remaining in kmalloc-512(0:486eb754281b11e63febe790c11fc27cbac87d87a2bc9e0a68cd13d58e0511fa
[ 1987.816852] -----------------------------------------------------------------------------

[ 1987.816880] INFO: Slab 0xffffea00fd77e500 objects=32 used=1 fp=0xffff883f5df94000 flags=0x2fffff00004080
[ 1987.816908] CPU: 1 PID: 30194 Comm: nvc:[driver] Tainted: P    B      OE  ------------ T 3.10.0-693.el7.x86_64 #1
[ 1987.816910]  ffffea00fd77e500 00000000023f4faf ffff883f437cf928 ffffffff816a3d91
[ 1987.816912]  ffff883f437cfa00 ffffffff811dbf44 0000000000000020 ffff883f437cfa10
[ 1987.816914]  ffff883f437cf9c0 656a624ffffffffc 616d657220737463 6e6920676e696e69
[ 1987.816915] Call Trace:
[ 1987.816923]  [<ffffffff816a3d91>] dump_stack+0x19/0x1b
[ 1987.816927]  [<ffffffff811dbf44>] slab_err+0xb4/0xe0
[ 1987.816929]  [<ffffffff811dd990>] ? kmem_cache_alloc_bulk+0x140/0x140
[ 1987.816931]  [<ffffffff811e0613>] ? __kmalloc+0x1e3/0x230
[ 1987.816933]  [<ffffffff811e1907>] ? kmem_cache_close+0x127/0x2e0
[ 1987.816935]  [<ffffffff811e1929>] kmem_cache_close+0x149/0x2e0
[ 1987.816937]  [<ffffffff811e1ad4>] __kmem_cache_shutdown+0x14/0x80
[ 1987.816940]  [<ffffffff811a6874>] kmem_cache_destroy+0x44/0xf0
[ 1987.816943]  [<ffffffff811f6009>] kmem_cache_destroy_memcg_children+0x89/0xb0
[ 1987.816945]  [<ffffffff811a6849>] kmem_cache_destroy+0x19/0xf0
[ 1987.816967]  [<ffffffffc5de62bc>] deinit_pma_address_batch_cache.isra.19+0x2c/0x40 [nvidia_uvm]
[ 1987.816976]  [<ffffffffc5de91b8>] uvm_pmm_gpu_deinit+0x48/0x80 [nvidia_uvm]
[ 1987.816982]  [<ffffffffc5dac6e6>] remove_gpu+0x226/0x2c0 [nvidia_uvm]
[ 1987.816988]  [<ffffffffc5dac911>] uvm_gpu_release_locked+0x21/0x40 [nvidia_uvm]
[ 1987.816995]  [<ffffffffc5db2098>] uvm_va_space_destroy+0x3c8/0x420 [nvidia_uvm]
[ 1987.816999]  [<ffffffffc5da24a0>] uvm_release.isra.4+0x80/0xa0 [nvidia_uvm]
[ 1987.817020]  [<ffffffffc5da2584>] uvm_release_entry+0x54/0xb0 [nvidia_uvm]
...
[ 1987.817067] INFO: Object 0xffff883f5df96e00 @offset=11776
  • Another pattern, with the fault in _nv025347rm:
[  153.211296] nvidia 0000:65:00.0: irq 145 for MSI/MSI-X
[  153.734708] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  153.734900] IP: [<ffffffffc0cb12f8>] _nv025347rm+0x8/0x40 [nvidia]
[  153.734901] PGD 8000000780bc9067 PUD 7e0a8e067 PMD 0 
[  153.734902] Oops: 0002 [#1] SMP 
[  153.734928] Modules linked in: fuse ip6table_filter ip6_tables iptable_filter sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag snapapi26(POE) mpt2sas raid_class scsi_transport_sas mptctl mptbase team_mode_activebackup team dell_rbu skx_edac edac_core intel_powerclamp nvidia_drm(POE) coretemp snd_hda_codec_hdmi nvidia_modeset(POE) intel_rapl iosf_mbi kvm_intel kvm irqbypass snd_hda_intel snd_hda_codec nvidia(POE) crc32_pclmul ghash_clmulni_intel snd_hda_core aesni_intel lrw snd_hwdep gf128mul glue_helper ablk_helper cryptd joydev snd_seq snd_seq_device snd_pcm snd_timer dell_smbios sparse_keymap snd iTCO_wdt soundcore iTCO_vendor_support dcdbas ipmi_ssif pcspkr sg i2c_i801 shpchp lpc_ich ipmi_si ipmi_devintf ipmi_msghandler mei_me mei nfit libnvdimm acpi_power_meter
[  153.734942]  acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod sr_mod crc_t10dif crct10dif_generic cdrom mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul ttm crct10dif_common crc32c_intel ixgbe ahci drm libahci megaraid_sas libata tg3 mdio dca ptp i2c_core pps_core wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: edd]
[  153.734944] CPU: 3 PID: 17741 Comm: nvidia-settings Tainted: P           OE  ------------   3.10.0-693.11.6.el7.x86_64 #1
[  153.734945] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 2.1.8 04/30/2019
[  153.734946] task: ffff88085a77bf40 ti: ffff8807d63cc000 task.ti: ffff8807d63cc000
[  153.735110] RIP: 0010:[<ffffffffc0cb12f8>]  [<ffffffffc0cb12f8>] _nv025347rm+0x8/0x40 [nvidia]
[  153.735111] RSP: 0018:ffff8807d63cf988  EFLAGS: 00010296
[  153.735111] RAX: 0000000000000000 RBX: 000000000000001c RCX: ffff88081f2a2e48
[  153.735112] RDX: ffff8808595ea008 RSI: ffff8807dd6ce008 RDI: ffff880802218008
[  153.735112] RBP: ffff88081f2a2e40 R08: ffffffffc0cb12f0 R09: ffff88081f2a29ec
[  153.735113] R10: 000000000000454d R11: 0000000000000000 R12: ffff880802218008
[  153.735113] R13: ffff880034984008 R14: ffff880802218008 R15: ffff8807cc7f0008
[  153.735114] FS:  00007fa1b358a740(0000) GS:ffff880858cc0000(0000) knlGS:0000000000000000
[  153.735115] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  153.735116] CR2: 0000000000000000 CR3: 0000000854cac000 CR4: 00000000003607e0
[  153.735116] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  153.735117] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  153.735117] Call Trace:
[  153.735268]  [<ffffffffc0d2277a>] ? _nv031562rm+0x7a/0xb0 [nvidia]
[  153.735417]  [<ffffffffc0d1965c>] ? _nv031902rm+0x6ec/0x2440 [nvidia]
[  153.735568]  [<ffffffffc0bf551b>] ? _nv021386rm+0xbb/0x1a0 [nvidia]
[  153.735678]  [<ffffffffc08574c7>] ? _nv021634rm+0x27/0x50 [nvidia]
[  153.735752]  [<ffffffffc0f35f30>] ? _nv000901rm+0x1200/0x1cc0 [nvidia]
[  153.735829]  [<ffffffffc0f2c785>] ? rm_init_adapter+0xd5/0xe0 [nvidia]
[  153.735870]  [<ffffffffc07ce871>] ? nv_open_device+0x281/0x800 [nvidia]
[  153.735872]  [<ffffffff811e1495>] ? kmem_cache_alloc+0x35/0x1e0
[  153.735913]  [<ffffffffc07cf0e7>] ? nvidia_open+0x2f7/0x540 [nvidia]
[  153.735915]  [<ffffffff812076bc>] ? cdev_get+0x2c/0x50
[  153.735957]  [<ffffffffc07cd382>] ? nvidia_frontend_open+0x52/0xb0 [nvidia]
[  153.735958]  [<ffffffff81207e22>] ? chrdev_open+0xb2/0x1b0
[  153.735961]  [<ffffffff81200427>] ? do_dentry_open+0x1a7/0x2e0
[  153.735962]  [<ffffffff812b409c>] ? security_inode_permission+0x1c/0x30
[  153.735964]  [<ffffffff81207d70>] ? cdev_put+0x30/0x30
[  153.735966]  [<ffffffff812005fa>] ? vfs_open+0x5a/0xb0
[  153.735967]  [<ffffffff8120e2e8>] ? may_open+0x68/0x110
[  153.735968]  [<ffffffff8121175d>] ? do_last+0x1ed/0x12c0
[  153.735970]  [<ffffffff812128f2>] ? path_openat+0xc2/0x490
[  153.735971]  [<ffffffff81214e8b>] ? do_filp_open+0x4b/0xb0
[  153.735973]  [<ffffffff8122213a>] ? __alloc_fd+0x8a/0x130
[  153.735975]  [<ffffffff812019c3>] ? do_sys_open+0xf3/0x1f0
[  153.735976]  [<ffffffff81201ade>] ? SyS_open+0x1e/0x20
[  153.735978]  [<ffffffff816b89fd>] ? system_call_fastpath+0x16/0x1b
[  153.735993] Code: 0f 1f 00 41 8b 4d 08 41 39 0a 4c 89 d6 0f 82 5b fe ff ff e9 69 fe ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 42 48 <83> 00 01 c6 42 10 01 e8 5c f4 ff ff 85 c0 74 13 89 c2 be 00 10 
[  153.736145] RIP  [<ffffffffc0cb12f8>] _nv025347rm+0x8/0x40 [nvidia]
[  153.736146]  RSP <ffff8807d63cf988>
[  153.736146] CR2: 0000000000000000
  • Another pattern, with the fault in _nv035782rm:
[3552892.472685] BUG: unable to handle kernel paging request at 0000000000002340
[3552892.472745] IP: [<ffffffffc042af3a>] _nv035782rm+0x1a/0x2f0 [nvidia]
[3552892.473061] PGD 0 
[3552892.473080] Oops: 0000 [#1] SMP 
[3552892.473124] Modules linked in: binfmt_misc nvidia_uvm(OE) gsch(OE) redirfs(OE) bmhook(OE) acdc(POE) tmhook(OE) dsa_filter(POE) dsa_filter_hook(OE) vmw_vsock_vmci_transport vsock vfat fat iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul vmw_balloon glue_helper ablk_helper cryptd sg pcspkr joydev parport_pc parport vmw_vmci i2c_piix4 ip_tables xfs libcrc32c nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) ata_generic pata_acpi vmwgfx sd_mod drm_kms_helper crc_t10dif crct10dif_generic ttm syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ata_piix libahci drm crct10dif_pclmul libata crct10dif_common crc32c_intel nfit libnvdimm serio_raw vmxnet3 vmw_pvscsi drm_panel_orientation_quirks floppy dm_mirror dm_region_hash dm_log dm_mod fuse
[3552892.473680] CPU: 29 PID: 686 Comm: nv_queue Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.21.1.el7.x86_64 #1
[3552892.473750] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.2008052155 08/05/2020
[3552892.473820] task: ffff8de04ac8d280 ti: ffff8de1503e8000 task.ti: ffff8de1503e8000
[3552892.473866] RIP: 0010:[<ffffffffc042af3a>]  [<ffffffffc042af3a>] _nv035782rm+0x1a/0x2f0 [nvidia]
[3552892.474175] RSP: 0018:ffff8de1503ebd68  EFLAGS: 00010286
[3552892.474210] RAX: 0000000000000000 RBX: ffff8de144be3808 RCX: ffff8de1657b8fe8
[3552892.474253] RDX: ffff8de1657b8fe8 RSI: 0000000000000001 RDI: 0000000000000000
[3552892.474299] RBP: ffff8df591a92ea0 R08: 0000000000000020 R09: ffff8df591a92f78
[3552892.474343] R10: 0000000000000000 R11: 0000000000000400 R12: 0000000000000000
[3552892.474387] R13: 0000000000000000 R14: ffff8de144be3808 R15: ffff8df1dd9e3380
[3552892.474431] FS:  0000000000000000(0000) GS:ffff8df6a3d40000(0000) knlGS:0000000000000000
[3552892.474480] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3552892.474516] CR2: 0000000000002340 CR3: 00000004a2010000 CR4: 00000000007607e0
[3552892.474669] PKRU: 00000000
[3552892.474692] Call Trace:
[3552892.474942]  [<ffffffffc042b2af>] ? _nv011229rm+0x9f/0x100 [nvidia]
[3552892.475320]  [<ffffffffc0da4e35>] ? rm_execute_work_item+0xc5/0x120 [nvidia]
[3552892.475560]  [<ffffffffc040b7ea>] ? os_execute_work_item+0x4a/0x60 [nvidia]
[3552892.475799]  [<ffffffffc040e871>] ? _main_loop+0x91/0x190 [nvidia]
[3552892.476039]  [<ffffffffc040e7e0>] ? nvidia_modeset_resume+0x30/0x30 [nvidia]
[3552892.476089]  [<ffffffffa4cc5da1>] ? kthread+0xd1/0xe0
[3552892.476142]  [<ffffffffa4cc5cd0>] ? insert_kthread_work+0x40/0x40
[3552892.476186]  [<ffffffffa5395ddd>] ? ret_from_fork_nospec_begin+0x7/0x21
[3552892.476229]  [<ffffffffa4cc5cd0>] ? insert_kthread_work+0x40/0x40
[3552892.476267] Code: 18 c2 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 57 41 56 41 55 41 54 49 89 fd 53 48 81 ed 60 01 00 00 4c 8b 35 ce 8f f8 01 <4c> 8b bf 40 23 00 00 48 c7 45 08 00 00 00 00 49 8b 9e 40 02 00 
[3552892.476501] RIP  [<ffffffffc042af3a>] _nv035782rm+0x1a/0x2f0 [nvidia]
[3552892.476766]  RSP <ffff8de1503ebd68>
[3552892.476789] CR2: 0000000000002340
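
To check whether a captured oops matches one of the patterns above, the key fields can be pulled out of the log text. The following is a minimal Python sketch, not a Red Hat tool: the function name summarize_oops and the regular expressions are assumptions written against the log lines shown in this article, and other kernel versions may format these lines differently.

```python
import re

# Matches e.g. "IP: [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]".
# The leading \b keeps this from matching inside "RIP:".
IP_RE = re.compile(
    r"\bIP: \[<([0-9a-f]+)>\] (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)
# Matches e.g. "Comm: nvidia-smi Tainted:" or "Comm: nv_queue Kdump: loaded Tainted:".
COMM_RE = re.compile(r"Comm: (.+?) (?:Kdump: loaded )?(?:Tainted|Not tainted)")
# Matches e.g. "CR2: 0000000000000120" (the faulting address).
CR2_RE = re.compile(r"\bCR2: ([0-9a-f]{16})")
# Obfuscated symbol names of the form seen in this article, e.g. _nv030987rm.
NV_RM_RE = re.compile(r"_nv\d{6}rm")


def summarize_oops(log_text):
    """Return the faulting function, module, address and command, or None."""
    ip = IP_RE.search(log_text)
    if ip is None:
        return None  # no "IP:" line, so this is not an oops of this form
    comm = COMM_RE.search(log_text)
    cr2 = CR2_RE.search(log_text)
    function, module = ip.group(2), ip.group(3)
    return {
        "function": function,
        "module": module,
        "fault_address": int(cr2.group(1), 16) if cr2 else None,
        "comm": comm.group(1) if comm else None,
        # True when the fault is in an _nvNNNNNNrm symbol of the nvidia module.
        "nvidia_rm_symbol": module == "nvidia"
        and NV_RM_RE.fullmatch(function) is not None,
    }


# Lines copied from the first call trace in this article.
SAMPLE = """\
[18009.106111] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
[18009.106922] IP: [<ffffffffc0f562da>] _nv030987rm+0x3a/0x170 [nvidia]
[18009.118429] CPU: 1 PID: 14007 Comm: nvidia-smi Tainted: P    B   W  OE  ------------ T 3.10.0-693.el7.x86_64 #1
[18009.172033] CR2: 0000000000000120
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

The numeric part of an _nvNNNNNNrm symbol differs between the traces above, so the sketch matches the general symbol form rather than one specific name.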

Environment

  • Red Hat Enterprise Linux (RHEL) 7.4 (3.10.0-693.el7.x86_64)
  • Third-party NVIDIA driver: nvidia-driver-latest-440.95.01-1.el7.x86_64 / 440.100
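
Whether a third-party driver such as this one was loaded at the time of a panic can be read from the letters in parentheses after module names in the "Modules linked in:" line, for example nvidia(POE). The sketch below is a hedged Python illustration: the helper names flagged_modules and third_party_modules are hypothetical, and reading T as the RHEL technology-preview marker is an assumption.

```python
import re

# Per-module taint letters as printed after a module name in an oops.
# P, O and E are upstream kernel flags; T is assumed here to be the
# RHEL technology-preview marker.
FLAG_MEANINGS = {
    "P": "proprietary license",
    "O": "out-of-tree",
    "E": "unsigned",
    "T": "technology preview (assumed RHEL-specific)",
}

# Matches a module name followed by its flags, e.g. "nvidia_uvm(OE)".
MODULE_WITH_FLAGS = re.compile(r"(\w+)\(([A-Z]+)\)")


def flagged_modules(modules_line):
    """Map each flagged module name to the meanings of its flags."""
    return {
        name: [FLAG_MEANINGS.get(flag, "unknown flag " + flag) for flag in flags]
        for name, flags in MODULE_WITH_FLAGS.findall(modules_line)
    }


def third_party_modules(modules_line):
    """Return names of modules flagged proprietary (P) or out-of-tree (O)."""
    return [
        name
        for name, flags in MODULE_WITH_FLAGS.findall(modules_line)
        if "P" in flags or "O" in flags
    ]


if __name__ == "__main__":
    # Fragment of the module list from the first call trace in this article.
    line = "overlay(T) nfsv3 nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) pcspkr"
    print(third_party_modules(line))
```

Modules without a parenthesized suffix, such as nfsv3 or pcspkr in the fragment above, carry no flags and are skipped.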
