Page allocation failures noticed while using NVIDIA GPU-related commands
Issue
- Page allocation failures are noticed while using NVIDIA GPU-related commands.
- NVIDIA proprietary kernel modules (for example, nvidia_uvm) appear in the call traces of the page allocation failures.
- Messages similar to the following are logged:
Apr 14 23:02:00 localhost kernel: UVM GPU5 BH: page allocation failure: order:1, mode:0x2050d0
Apr 14 23:02:00 localhost kernel: CPU: 64 PID: 133632 Comm: UVM GPU5 BH Tainted: P OE ------------ 3.10.0-1062.18.1.el7.x86_64 #1
Apr 14 23:02:00 localhost kernel: Hardware name: Cray Inc. SYS-4029GP-TVRT/X11DGO-T, BIOS 3.3 03/11/2020
Apr 14 23:02:00 localhost kernel: Call Trace:
Apr 14 23:02:00 localhost kernel: [<ffffffff8d57b416>] dump_stack+0x19/0x1b
Apr 14 23:02:00 localhost kernel: [<ffffffff8cfc3fc0>] warn_alloc_failed+0x110/0x180
Apr 14 23:02:00 localhost kernel: [<ffffffff8d57698a>] __alloc_pages_slowpath+0x6bb/0x729
Apr 14 23:02:00 localhost kernel: [<ffffffff8cfc8636>] __alloc_pages_nodemask+0x436/0x450
Apr 14 23:02:00 localhost kernel: [<ffffffff8d016c58>] alloc_pages_current+0x98/0x110
Apr 14 23:02:00 localhost kernel: [<ffffffff8d024fed>] new_slab+0x44d/0x4e0
Apr 14 23:02:00 localhost kernel: [<ffffffff8d02544c>] ___slab_alloc+0x3cc/0x520
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8c4a>] ? alloc_internal+0x6a/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c1a51>] ? pick_chunk+0x51/0x70 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c1aa0>] ? try_claim_chunk+0x30/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8c4a>] ? alloc_internal+0x6a/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8d577dbc>] __slab_alloc+0x40/0x5c
Apr 14 23:02:00 localhost kernel: [<ffffffff8d026150>] __kmalloc+0x1c0/0x230
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8c4a>] alloc_internal+0x6a/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8e55>] __uvm_kvmalloc_zero+0x25/0x60 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16bda90>] allocate_directory_with_location+0x90/0x130 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16becc8>] uvm_page_tree_get_ptes_async+0x2a8/0x560 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a2604>] ? block_mark_memory_used+0x64/0x70 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a6cf2>] ? block_copy_resident_pages_between+0x952/0xfb0 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16bef92>] uvm_page_tree_get_ptes+0x12/0x30 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a2ce9>] block_alloc_ptes_with_retry+0x339/0x4a0 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a292b>] ? block_gpu_supports_2m.part.26+0x2b/0x40 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a326d>] block_alloc_ptes_new_state+0x2d/0x90 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16aca31>] uvm_va_block_map+0x421/0xf00 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c3c12>] ? uvm_tracker_add_tracker_safe+0x12/0x90 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a7868>] ? block_copy_resident_pages+0x418/0x950 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16cd552>] ? uvm_pmm_gpu_mark_root_chunk_used+0x12/0x20 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b14ea>] uvm_va_block_service_locked+0x77a/0xf90 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b6898>] service_batch_managed_faults_in_block_locked+0x778/0x980 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8cee9f4e>] ? pick_next_task_fair+0x61e/0x870
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b72d4>] service_fault_batch+0x424/0x660 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b8175>] uvm_gpu_service_replayable_faults+0x125/0xb10 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c270f>] ? thread_context_non_interrupt_add+0x10f/0x200 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16936d4>] replayable_faults_isr_bottom_half+0x44/0x60 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16937b4>] replayable_faults_isr_bottom_half_entry+0x54/0xb0 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8cecc303>] ? down_interruptible+0x33/0x60
Apr 14 23:02:00 localhost kernel: [<ffffffffc16818f1>] _main_loop+0x91/0x190 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc1681860>] ? nvstatusToString+0x50/0x50 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8cec6321>] kthread+0xd1/0xe0
Apr 14 23:02:00 localhost kernel: [<ffffffff8cec6250>] ? insert_kthread_work+0x40/0x40
Apr 14 23:02:00 localhost kernel: [<ffffffff8d58dd1d>] ret_from_fork_nospec_begin+0x7/0x21
Apr 14 23:02:00 localhost kernel: [<ffffffff8cec6250>] ? insert_kthread_work+0x40/0x40
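To check whether logged allocation failures match this pattern, the log can be scanned for "page allocation failure" lines and for the module tags in the call-trace frames that follow them. The Python sketch below is a minimal illustration, not a Red Hat or NVIDIA tool. The helper `scan`, its regular expressions, and the result keys are hypothetical; they are written against the message format shown above and may need adjusting for other kernel versions. It assumes 4 KiB base pages (as on x86_64) when converting the order to bytes, because an order-N request asks for 2^N physically contiguous pages. It leaves the `mode` GFP mask undecoded because the flag bit values differ between kernel versions.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical patterns written against the message format in this article.
# Matches e.g. "kernel: UVM GPU5 BH: page allocation failure: order:1, mode:0x2050d0"
FAILURE_RE = re.compile(
    r"kernel: (?P<comm>.+?): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)
# Matches a module tag at the end of a call-trace frame, e.g. "[nvidia_uvm]"
MODULE_RE = re.compile(r"\[(?P<module>nvidia\w*)\]\s*$")

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86_64


def scan(lines: Iterable[str]) -> List[Dict]:
    """Pair each allocation-failure message with the nvidia modules in its trace."""
    reports: List[Dict] = []
    for line in lines:
        failure = FAILURE_RE.search(line)
        if failure:
            order = int(failure.group("order"))
            reports.append({
                "comm": failure.group("comm"),
                "order": order,
                # an order-N request needs 2**N physically contiguous pages
                "bytes": (2 ** order) * PAGE_SIZE,
                "mode": failure.group("mode"),  # GFP mask, left undecoded
                "nvidia_modules": set(),
            })
            continue
        frame = MODULE_RE.search(line)
        if frame and reports:
            # attribute the module to the most recent failure seen so far
            reports[-1]["nvidia_modules"].add(frame.group("module"))
    return reports


sample = [
    "Apr 14 23:02:00 localhost kernel: UVM GPU5 BH: page allocation failure: order:1, mode:0x2050d0",
    "Apr 14 23:02:00 localhost kernel: [<ffffffff8d026150>] __kmalloc+0x1c0/0x230",
    "Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8c4a>] alloc_internal+0x6a/0x80 [nvidia_uvm]",
]

for report in scan(sample):
    print(report["comm"], report["order"], report["bytes"],
          report["mode"], sorted(report["nvidia_modules"]))
# prints: UVM GPU5 BH 1 8192 0x2050d0 ['nvidia_uvm']
```

Module tags are attributed to the most recent failure line, so kernel messages from other sources that are interleaved with a trace could be misattributed.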
Environment
- Red Hat Enterprise Linux
- NVIDIA GPU
- NVIDIA proprietary kernel modules (nvidia, nvidia_uvm, nvidia_modeset, nvidia_drm)