Server crashed after "NVRM: GPU at 0000:15:00.0 has fallen off the bus" errors
Issue
- The server crashed after the following messages were logged:
May 30 06:47:12 example kernel: NVRM: GPU at 0000:08:00: GPU-2486e72d-f934-7239-8160-d651ffad3d74
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000000
May 30 06:47:12 example kernel: NMI: IOCK error (debug interrupt?)
May 30 06:47:12 example kernel: CPU 0
..
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000003
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 45, Ch 00000001, engmsk 00000100
May 30 06:47:12 example kernel: NVRM: os_pci_init_handle: invalid context!
..
May 30 06:47:12 example NVRM: GPU at 0000:15:00.0 has fallen off the bus.
Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details
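
When triaging a crash like this, it can help to tally which Xid codes were reported for which PCI address, and which GPUs were reported as having fallen off the bus. The following is a minimal sketch in Python, assuming the log lines follow the format shown above. The names `summarize_nvrm`, `XID_RE`, `OFF_BUS_RE`, and `SAMPLE_LOG` are illustrative and are not part of any NVIDIA or Red Hat tool.

```python
import re
from collections import Counter

# Assumed format, taken from the excerpt above:
#   "... kernel: NVRM: Xid (0000:08:00): 58, Edc 00000000"
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>[0-9A-Fa-f:.]+)\): (?P<xid>\d+)")

# Assumed format, taken from the excerpt above:
#   "... NVRM: GPU at 0000:15:00.0 has fallen off the bus."
OFF_BUS_RE = re.compile(
    r"NVRM: GPU at (?P<pci>[0-9A-Fa-f:.]+) has fallen off the bus"
)

# Sample input copied from the messages in this article.
SAMPLE_LOG = """\
May 30 06:47:12 example kernel: NVRM: GPU at 0000:08:00: GPU-2486e72d-f934-7239-8160-d651ffad3d74
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000000
May 30 06:47:12 example kernel: NMI: IOCK error (debug interrupt?)
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000003
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 45, Ch 00000001, engmsk 00000100
May 30 06:47:12 example NVRM: GPU at 0000:15:00.0 has fallen off the bus.
"""


def summarize_nvrm(lines):
    """Return (xid_counts, off_bus) for an iterable of log lines.

    xid_counts maps (pci_address, xid_number) to the number of occurrences.
    off_bus lists the PCI addresses reported as fallen off the bus, in order.
    """
    xid_counts = Counter()
    off_bus = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            xid_counts[(match.group("pci"), int(match.group("xid")))] += 1
            continue
        match = OFF_BUS_RE.search(line)
        if match:
            off_bus.append(match.group("pci"))
    return xid_counts, off_bus


if __name__ == "__main__":
    counts, lost = summarize_nvrm(SAMPLE_LOG.splitlines())
    for (pci, xid), n in sorted(counts.items()):
        print(f"GPU {pci}: Xid {xid} seen {n} time(s)")
    for pci in lost:
        print(f"GPU {pci}: fallen off the bus")
```

To run the same summary against a real log, pass an open file object (for example the system's kernel log file) to `summarize_nvrm` in place of `SAMPLE_LOG.splitlines()`. The sketch only counts events; it does not interpret what each Xid code means.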
Environment
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 5.6
- nvidia kernel module