Page allocation failures noticed while running NVIDIA GPU-related commands

Issue

Page allocation failures are logged while running NVIDIA GPU-related commands.
The call traces of these failures include functions from the NVIDIA proprietary kernel modules.
Messages like the following appear in the logs:

Apr 14 23:02:00 localhost kernel: UVM GPU5 BH: page allocation failure: order:1, mode:0x2050d0
Apr 14 23:02:00 localhost kernel: CPU: 64 PID: 133632 Comm: UVM GPU5 BH Tainted: P           OE  ------------   3.10.0-1062.18.1.el7.x86_64 #1
Apr 14 23:02:00 localhost kernel: Hardware name: Cray Inc. SYS-4029GP-TVRT/X11DGO-T, BIOS 3.3 03/11/2020
Apr 14 23:02:00 localhost kernel: Call Trace:
Apr 14 23:02:00 localhost kernel: [<ffffffff8d57b416>] dump_stack+0x19/0x1b
Apr 14 23:02:00 localhost kernel: [<ffffffff8cfc3fc0>] warn_alloc_failed+0x110/0x180
Apr 14 23:02:00 localhost kernel: [<ffffffff8d57698a>] __alloc_pages_slowpath+0x6bb/0x729
Apr 14 23:02:00 localhost kernel: [<ffffffff8cfc8636>] __alloc_pages_nodemask+0x436/0x450
Apr 14 23:02:00 localhost kernel: [<ffffffff8d016c58>] alloc_pages_current+0x98/0x110
Apr 14 23:02:00 localhost kernel: [<ffffffff8d024fed>] new_slab+0x44d/0x4e0
Apr 14 23:02:00 localhost kernel: [<ffffffff8d02544c>] ___slab_alloc+0x3cc/0x520
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8c4a>] ? alloc_internal+0x6a/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c1a51>] ? pick_chunk+0x51/0x70 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c1aa0>] ? try_claim_chunk+0x30/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8c4a>] ? alloc_internal+0x6a/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8d577dbc>] __slab_alloc+0x40/0x5c
Apr 14 23:02:00 localhost kernel: [<ffffffff8d026150>] __kmalloc+0x1c0/0x230
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8c4a>] alloc_internal+0x6a/0x80 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c8e55>] __uvm_kvmalloc_zero+0x25/0x60 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16bda90>] allocate_directory_with_location+0x90/0x130 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16becc8>] uvm_page_tree_get_ptes_async+0x2a8/0x560 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a2604>] ? block_mark_memory_used+0x64/0x70 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a6cf2>] ? block_copy_resident_pages_between+0x952/0xfb0 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16bef92>] uvm_page_tree_get_ptes+0x12/0x30 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a2ce9>] block_alloc_ptes_with_retry+0x339/0x4a0 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a292b>] ? block_gpu_supports_2m.part.26+0x2b/0x40 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a326d>] block_alloc_ptes_new_state+0x2d/0x90 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16aca31>] uvm_va_block_map+0x421/0xf00 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c3c12>] ? uvm_tracker_add_tracker_safe+0x12/0x90 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16a7868>] ? block_copy_resident_pages+0x418/0x950 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16cd552>] ? uvm_pmm_gpu_mark_root_chunk_used+0x12/0x20 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b14ea>] uvm_va_block_service_locked+0x77a/0xf90 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b6898>] service_batch_managed_faults_in_block_locked+0x778/0x980 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8cee9f4e>] ? pick_next_task_fair+0x61e/0x870
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b72d4>] service_fault_batch+0x424/0x660 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16b8175>] uvm_gpu_service_replayable_faults+0x125/0xb10 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16c270f>] ? thread_context_non_interrupt_add+0x10f/0x200 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16936d4>] replayable_faults_isr_bottom_half+0x44/0x60 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc16937b4>] replayable_faults_isr_bottom_half_entry+0x54/0xb0 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8cecc303>] ? down_interruptible+0x33/0x60
Apr 14 23:02:00 localhost kernel: [<ffffffffc16818f1>] _main_loop+0x91/0x190 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffffc1681860>] ? nvstatusToString+0x50/0x50 [nvidia_uvm]
Apr 14 23:02:00 localhost kernel: [<ffffffff8cec6321>] kthread+0xd1/0xe0
Apr 14 23:02:00 localhost kernel: [<ffffffff8cec6250>] ? insert_kthread_work+0x40/0x40
Apr 14 23:02:00 localhost kernel: [<ffffffff8d58dd1d>] ret_from_fork_nospec_begin+0x7/0x21
Apr 14 23:02:00 localhost kernel: [<ffffffff8cec6250>] ? insert_kthread_work+0x40/0x40
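
The trace above can be summarized programmatically. The following is a minimal Python sketch, not a Red Hat or NVIDIA tool: it extracts the task name, order, and mode from the "page allocation failure" line and counts which kernel modules appear in the call trace frames. The function names and regular expressions are illustrative assumptions based only on the log format shown above, and the 4 KiB page size is assumed as the usual x86_64 default. The mode value is reported as a raw number and is not decoded into individual flags.

```python
import re
from collections import Counter

# Assumption: 4 KiB base pages, the usual default on x86_64.
PAGE_SIZE = 4096

# Matches e.g. "kernel: UVM GPU5 BH: page allocation failure: order:1, mode:0x2050d0"
HEADER_RE = re.compile(
    r"kernel: (?P<comm>.+?): page allocation failure: "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

# Matches the trailing "[module]" tag on a call trace frame,
# e.g. "alloc_internal+0x6a/0x80 [nvidia_uvm]"
FRAME_MODULE_RE = re.compile(
    r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]\s*$"
)


def parse_failure(line):
    """Return the fields of a page allocation failure line, or None."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    return {
        "comm": match.group("comm"),
        "order": order,
        "mode": int(match.group("mode"), 16),
        # An order-N request asks for 2**N physically contiguous pages.
        "bytes": PAGE_SIZE << order,
    }


def modules_in_trace(lines):
    """Count call trace frames per kernel module (frames without a tag are skipped)."""
    counts = Counter()
    for line in lines:
        match = FRAME_MODULE_RE.search(line)
        if match:
            counts[match.group("module")] += 1
    return dict(counts)
```

For the trace shown here, order:1 corresponds to a request for two contiguous pages (8192 bytes under the page size assumption), and every module-tagged frame belongs to nvidia_uvm.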

Environment

Red Hat Enterprise Linux
NVIDIA GPU
NVIDIA proprietary kernel modules (nvidia, nvidia_uvm, nvidia_modeset, nvidia_drm)
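
To confirm that a system matches this environment, one option is to check which of the modules listed above are currently loaded. The sketch below is an illustration only: it reads text in the /proc/modules format, where the module name is the first whitespace-separated field on each line, and reports the listed modules it finds. The function name loaded_nvidia_modules is a hypothetical helper, not part of any Red Hat or NVIDIA tooling.

```python
# Module names taken from the Environment list in this article.
NVIDIA_MODULES = ("nvidia", "nvidia_uvm", "nvidia_modeset", "nvidia_drm")


def loaded_nvidia_modules(proc_modules_text):
    """Return the listed NVIDIA modules present in /proc/modules-style text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    # Preserve the order used in NVIDIA_MODULES for stable output.
    return [name for name in NVIDIA_MODULES if name in loaded]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            print(loaded_nvidia_modules(handle.read()))
    except FileNotFoundError:
        # /proc/modules exists only on Linux systems.
        print("/proc/modules not found")
```

An empty list means none of the listed modules are loaded, in which case this article is unlikely to apply.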
