Server crashed after "NVRM: GPU at 0000:15:00.0 has fallen off the bus" errors.

Solution Unverified - Updated -

Issue

  • The server crashed after logging the following messages:
May 30 06:47:12 example kernel: NVRM: GPU at 0000:08:00: GPU-2486e72d-f934-7239-8160-d651ffad3d74
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000000
May 30 06:47:12 example kernel: NMI: IOCK error (debug interrupt?)
May 30 06:47:12 example kernel: CPU 0
..
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000003
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 45, Ch 00000001, engmsk 00000100
May 30 06:47:12 example kernel: NVRM: os_pci_init_handle: invalid context!
..
May 30 06:47:12 example NVRM: GPU at 0000:15:00.0 has fallen off the bus.
Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details
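
When triaging a log like the one above, it can help to pull out just the NVRM Xid events and the "fallen off the bus" message, so that the sequence of GPU errors leading up to the panic is easier to see. The following is a minimal sketch in Python, not a Red Hat or NVIDIA tool: the function name `scan_nvrm_events` and the regular expressions are illustrative assumptions, derived only from the message formats shown in this article.

```python
import re

# Assumed format, taken from the log excerpt in this article:
#   "NVRM: Xid (0000:08:00): 58, Edc 00000000"
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+)(?:, (?P<detail>.*))?"
)
# Assumed format: "NVRM: GPU at 0000:15:00.0 has fallen off the bus."
FALLEN_RE = re.compile(
    r"NVRM: GPU at (?P<pci>[0-9a-fA-F:.]+) has fallen off the bus"
)


def scan_nvrm_events(lines):
    """Return (kind, pci_address, xid_code, detail) tuples in log order.

    Hypothetical helper: kind is "xid" or "fallen_off_bus"; xid_code is
    None for "fallen_off_bus" events.
    """
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            detail = (match.group("detail") or "").strip()
            events.append(("xid", match.group("pci"), int(match.group("xid")), detail))
            continue
        match = FALLEN_RE.search(line)
        if match:
            events.append(("fallen_off_bus", match.group("pci"), None, ""))
    return events


# Sample input copied from the log excerpt in this article.
SAMPLE = [
    "May 30 06:47:12 example kernel: NVRM: GPU at 0000:08:00: GPU-2486e72d-f934-7239-8160-d651ffad3d74",
    "May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000000",
    "May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 45, Ch 00000001, engmsk 00000100",
    "May 30 06:47:12 example NVRM: GPU at 0000:15:00.0 has fallen off the bus.",
]

if __name__ == "__main__":
    for event in scan_nvrm_events(SAMPLE):
        print(event)
```

To scan a real system log, the same function could be given an open file object, for example `scan_nvrm_events(open("/var/log/messages"))`, assuming kernel messages are written to that path on the affected host.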

Environment

  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 5.6
  • nvidia kernel module
