Server crashed after "NVRM: GPU at 0000:15:00.0 has fallen off the bus" errors.
Issue
- The server crashed after logging the following messages:
May 30 06:47:12 example kernel: NVRM: GPU at 0000:08:00: GPU-2486e72d-f934-7239-8160-d651ffad3d74
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000000
May 30 06:47:12 example kernel: NMI: IOCK error (debug interrupt?)
May 30 06:47:12 example kernel: CPU 0
..
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000003
May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 45, Ch 00000001, engmsk 00000100
May 30 06:47:12 example kernel: NVRM: os_pci_init_handle: invalid context!
..
May 30 06:47:12 example NVRM: GPU at 0000:15:00.0 has fallen off the bus.
Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details
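When triaging a crash like this, it can help to pull every NVRM Xid event and "fallen off the bus" report out of the system log before the panic. The following is a minimal sketch, not part of the original solution: it assumes Python 3 and that the log lines follow the exact formats shown above. The names `scan_nvrm` and `SAMPLE` are hypothetical, and the code only counts events; it does not interpret what each Xid code means.

```python
import re
from collections import Counter

# Assumed format, taken from the log above:
#   NVRM: Xid (0000:08:00): 58, Edc 00000000
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+)")
# Assumed format, taken from the log above:
#   NVRM: GPU at 0000:15:00.0 has fallen off the bus.
OFF_BUS_RE = re.compile(r"NVRM: GPU at (?P<pci>[0-9a-fA-F:.]+) has fallen off the bus")

# Lines copied from the log in this article (hypothetical test input).
SAMPLE = [
    "May 30 06:47:12 example kernel: NVRM: GPU at 0000:08:00: GPU-2486e72d-f934-7239-8160-d651ffad3d74",
    "May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000000",
    "May 30 06:47:12 example kernel: NMI: IOCK error (debug interrupt?)",
    "May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 58, Edc 00000003",
    "May 30 06:47:12 example kernel: NVRM: Xid (0000:08:00): 45, Ch 00000001, engmsk 00000100",
    "May 30 06:47:12 example NVRM: GPU at 0000:15:00.0 has fallen off the bus.",
]


def scan_nvrm(lines):
    """Count Xid codes per PCI address and list GPUs reported as off the bus."""
    xids = Counter()  # key: (pci_address, xid_code) -> number of occurrences
    off_bus = []      # PCI addresses in the order they were reported
    for line in lines:
        m = XID_RE.search(line)
        if m:
            xids[(m.group("pci"), int(m.group("xid")))] += 1
            continue
        m = OFF_BUS_RE.search(line)
        if m:
            off_bus.append(m.group("pci"))
    return xids, off_bus


if __name__ == "__main__":
    xids, off_bus = scan_nvrm(SAMPLE)
    for (pci, code), count in sorted(xids.items()):
        print(f"{pci} Xid {code}: {count} occurrence(s)")
    print("GPUs reported off the bus:", off_bus)
```

On a live system the same function could be fed an open file object, for example the lines of `/var/log/messages`, since it only iterates over the lines it is given.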
Environment
- Red Hat Enterprise Linux 5.6
- NVIDIA kernel module (nvidia)
