Kernel panic in unm_nic_poll_controller() function of unsigned module (nx_nic)
Environment
- Red Hat Enterprise Linux 5
- kernel-2.6.18-194.17.4.el5
- Unsigned (U) module: nx_nic
Issue
- Kernel panic with the following call trace:
nx_nic[eth0]: Device is DOWN. Fail count[8]
nx_nic[eth0]: Firmware hang detected. Severity code=0 Peg number=0 Error code=0 Return address=0
nx_nic: Flash Version: Firmware[4.0.579], BIOS[2.1.0]
nx_nic: No memory on card. Load Cut through.
Unable to handle kernel NULL pointer dereference<1>Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
[<ffffffff8822bd77>] :nx_nic:unm_nic_poll_controller+0x29/0x42
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /class/firmware/0000:04:00.0/loading
CPU 5
Modules linked in: mptctl mptbase netconsole nls_utf8 cifs nfs fscache nfs_acl lockd sunrpc bnx2i(U) libiscsi2 cnic(U) uio scsi_transport_iscsi2 scsi_transport_iscsi loop dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom sg lpfc nx_nic(U) hpilo serio_raw scsi_transport_fc shpchp pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2821, comm: firmware_helper Tainted: G 2.6.18-194.17.4.el5 #1
RIP: 0010:[<ffffffff8822bd77>] [<ffffffff8822bd77>] :nx_nic:unm_nic_poll_controller+0x29/0x42
RSP: 0018:ffff810838fdf8f8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff81083fe50000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000016 RDI: 000000000000003b
RBP: ffffffff88583ae0 R08: 00000000a479de10 R09: 0000022000000000
R10: 0000000000000046 R11: 0000000000000006 R12: ffff81083fe50500
R13: ffff81069c065580 R14: 0000000000000033 R15: ffff81015fcfc7a0
FS: 00002b73e10156e0(0000) GS:ffff81105c20b1c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000017b9dea000 CR4: 00000000000006e0
Process firmware_helper (pid: 2821, threadinfo ffff810838fde000, task ffff81015fcfc7a0)
Stack: 0000000000000000 ffff81083fe50000 ffffffff88583ae0 ffffffff80239fe9
000000000000002c ffffffff8023a458 0000000000000000 0000000000000030
0000000000000000 ffff81083fe50000 ffffffff88583ae0 ffff81105c3f1280
Call Trace:
[<ffffffff80239fe9>] netpoll_poll+0x3a/0x365
[<ffffffff8023a458>] find_skb+0x41/0xec
[<ffffffff8023a3fd>] netpoll_send_skb+0xe9/0x103
[<ffffffff885830d0>] :netconsole:write_msg+0x40/0x58
[<ffffffff80091c0f>] __call_console_drivers+0x5b/0x69
[<ffffffff800172a9>] release_console_sem+0x189/0x20e
[<ffffffff800923a8>] vprintk+0x2b2/0x317
[<ffffffff8009245f>] printk+0x52/0xbd
[<ffffffff800a687e>] search_module_extables+0x81/0x8d
[<ffffffff8006505a>] oops_begin+0x5e/0x65
[<ffffffff80066d87>] do_page_fault+0x6fd/0x874
[<ffffffff80076521>] do_flush_tlb_all+0x0/0x6a
[<ffffffff8005dde9>] error_exit+0x0/0x84
[<ffffffff80076521>] do_flush_tlb_all+0x0/0x6a
Code: 48 8b 70 18 41 ff 94 24 e0 3a 00 00 8b 7b 38 41 58 5b 41 5c
RIP [<ffffffff8822bd77>] :nx_nic:unm_nic_poll_controller+0x29/0x42
RSP <ffff810838fdf8f8>
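In the "Modules linked in:" line of the oops, RHEL 5 appends (U) to every module that lacks a Red Hat signature. In this trace that applies to bnx2i, cnic and nx_nic. The sketch below pulls those names out of such a line. The helper name unsigned_modules is hypothetical, and the only assumption is the "name(U)" token format visible in the trace above.

```python
PREFIX = "Modules linked in:"


def unsigned_modules(line: str) -> list[str]:
    """Return the names of modules flagged '(U)' (unsigned) in an oops module list."""
    if line.startswith(PREFIX):
        line = line[len(PREFIX):]
    result = []
    for token in line.split():
        # Unsigned modules are printed as "name(U)"; signed ones as a bare "name"
        if token.endswith("(U)"):
            result.append(token[: -len("(U)")])
    return result


sample = "Modules linked in: netconsole bnx2i(U) cnic(U) lpfc nx_nic(U) ext3"
print(unsigned_modules(sample))  # ['bnx2i', 'cnic', 'nx_nic']
```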
Resolution
- Since Red Hat does not have the source code of the unsigned (U) kernel module nx_nic, engage the vendor of the nx_nic module for further analysis.
- Alternatively, switch to the netxen_nic module, which is shipped by Red Hat, and update the NIC firmware to the latest version.
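When switching drivers, it helps to confirm which driver is actually bound to the interface; ethtool -i eth0 reports the driver name and firmware version. The sketch below reads the same driver name from sysfs, assuming the usual /sys/class/net/<iface>/device/driver symlink. The helper name bound_driver is hypothetical. After the switch it should report netxen_nic rather than nx_nic.

```python
import os
from typing import Optional


def bound_driver(iface: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the kernel driver bound to a network interface, or None if unknown.

    Assumes the sysfs layout <root>/class/net/<iface>/device/driver, where the
    'driver' symlink points at .../bus/<bus>/drivers/<driver-name>.
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        return None  # virtual interface, no such interface, or no driver bound
    # The last path component of the symlink target is the driver name
    return os.path.basename(os.readlink(link))
```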
Root Cause
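- The kernel panicked on a NULL pointer dereference in the unm_nic_poll_controller() function of the unsigned (U) kernel module nx_nic. The faulting instruction (Code: 48 8b 70 18, i.e. mov 0x18(%rax),%rsi) read from offset 0x18 of the pointer held in RAX, which was NULL (RAX: 0000000000000000, CR2: 0000000000000018). The call trace shows that unm_nic_poll_controller() was entered through netpoll_poll() while netconsole (write_msg) was sending kernel messages out through an nx_nic interface.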
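The link between the Code: bytes and the faulting address can be checked by hand. Assuming, as on 2.6.18 x86_64 kernels, that the Code: line starts at the faulting RIP, its first four bytes (48 8b 70 18) are the faulting instruction. The minimal decoder below handles only that one instruction form (REX.W MOV r64, [base + disp8]) and is not a general disassembler; for real analysis, crash's dis -r command or objdump should be used. The name decode_load_disp8 is hypothetical.

```python
# x86-64 general-purpose registers indexed by their 3-bit ModRM encoding
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_load_disp8(code: bytes) -> tuple[str, str, int]:
    """Decode 'REX.W 8B /r' with a [base + disp8] operand.

    Returns (destination register, base register, displacement). Only this
    single instruction form is handled; anything else raises ValueError.
    """
    if len(code) < 4 or code[0] != 0x48 or code[1] != 0x8B:
        raise ValueError("not a 64-bit MOV r64, r/m64 (REX.W 8B)")
    modrm, disp = code[2], code[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [base + disp8] without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


# First bytes of the oops "Code:" line and RAX from the register dump
dest, base, disp = decode_load_disp8(bytes.fromhex("488b7018"))
regs = {"rax": 0x0000000000000000}
fault_addr = regs[base] + disp
# prints: mov 0x18(%rax),%rsi -> reads 0x0000000000000018 (matches CR2)
print(f"mov {disp:#x}(%{base}),%{dest} -> reads {fault_addr:#018x}")
```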
Diagnostic Steps
System Information:
crash> sys| grep -e RELEASE -e MEMORY -e PANIC
RELEASE: 2.6.18-194.17.4.el5
MEMORY: 126.2 GB
PANIC: "Oops: 0000 [1] SMP " (check log for details)
Kernel Ring Buffer:
crash> log
nx_nic[eth0]: Device is DOWN. Fail count[8]
nx_nic[eth0]: Firmware hang detected. Severity code=0 Peg number=0 Error code=0 Return address=0
nx_nic: Flash Version: Firmware[4.0.579], BIOS[2.1.0]
nx_nic: No memory on card. Load Cut through.
Unable to handle kernel NULL pointer dereference<1>Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
[<ffffffff8822bd77>] :nx_nic:unm_nic_poll_controller+0x29/0x42
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /class/firmware/0000:04:00.0/loading
CPU 5
Modules linked in: mptctl mptbase netconsole nls_utf8 cifs nfs fscache nfs_acl lockd sunrpc bnx2i(U) libiscsi2 cnic(U) uio scsi_transport_iscsi2 scsi_transport_iscsi loop dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom sg lpfc nx_nic(U) hpilo serio_raw scsi_transport_fc shpchp pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2821, comm: firmware_helper Tainted: G 2.6.18-194.17.4.el5 #1
RIP: 0010:[<ffffffff8822bd77>] [<ffffffff8822bd77>] :nx_nic:unm_nic_poll_controller+0x29/0x42
RSP: 0018:ffff810838fdf8f8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff81083fe50000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000016 RDI: 000000000000003b
RBP: ffffffff88583ae0 R08: 00000000a479de10 R09: 0000022000000000
R10: 0000000000000046 R11: 0000000000000006 R12: ffff81083fe50500
R13: ffff81069c065580 R14: 0000000000000033 R15: ffff81015fcfc7a0
FS: 00002b73e10156e0(0000) GS:ffff81105c20b1c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000017b9dea000 CR4: 00000000000006e0
Process firmware_helper (pid: 2821, threadinfo ffff810838fde000, task ffff81015fcfc7a0)
Stack: 0000000000000000 ffff81083fe50000 ffffffff88583ae0 ffffffff80239fe9
000000000000002c ffffffff8023a458 0000000000000000 0000000000000030
0000000000000000 ffff81083fe50000 ffffffff88583ae0 ffff81105c3f1280
Call Trace:
[<ffffffff80239fe9>] netpoll_poll+0x3a/0x365
[<ffffffff8023a458>] find_skb+0x41/0xec
[<ffffffff8023a3fd>] netpoll_send_skb+0xe9/0x103
[<ffffffff885830d0>] :netconsole:write_msg+0x40/0x58
[<ffffffff80091c0f>] __call_console_drivers+0x5b/0x69
[<ffffffff800172a9>] release_console_sem+0x189/0x20e
[<ffffffff800923a8>] vprintk+0x2b2/0x317
[<ffffffff8009245f>] printk+0x52/0xbd
[<ffffffff800a687e>] search_module_extables+0x81/0x8d
[<ffffffff8006505a>] oops_begin+0x5e/0x65
[<ffffffff80066d87>] do_page_fault+0x6fd/0x874
[<ffffffff80076521>] do_flush_tlb_all+0x0/0x6a
[<ffffffff8005dde9>] error_exit+0x0/0x84
[<ffffffff80076521>] do_flush_tlb_all+0x0/0x6a
Code: 48 8b 70 18 41 ff 94 24 e0 3a 00 00 8b 7b 38 41 58 5b 41 5c
RIP [<ffffffff8822bd77>] :nx_nic:unm_nic_poll_controller+0x29/0x42
RSP <ffff810838fdf8f8>
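Each frame in the oops above is printed as [:module:]function+offset/size, where offset is how far into the function the address lies and size is the function's total length. A small parser makes it easy to separate frames in third-party modules from core kernel frames. Applied to this log, it places the faulting RIP 0x29 bytes into the 0x42-byte unm_nic_poll_controller() in nx_nic, reached from netconsole's write_msg() via netpoll_poll(). The names OopsSymbol and parse_oops_symbol are hypothetical; this is a sketch, not a crash utility command.

```python
import re
from typing import NamedTuple, Optional


class OopsSymbol(NamedTuple):
    module: Optional[str]  # None when the symbol is in the core kernel
    function: str
    offset: int  # bytes from the start of the function
    size: int  # total length of the function in bytes


# Matches "[:module:]function+0xOFFSET/0xSIZE" as printed in RIP and call-trace lines
SYM_RE = re.compile(
    r"(?::(?P<mod>\w+):)?(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_oops_symbol(line: str) -> Optional[OopsSymbol]:
    """Extract module, function, offset and size from one oops line, if present."""
    m = SYM_RE.search(line)
    if m is None:
        return None
    return OopsSymbol(m.group("mod"), m.group("func"),
                      int(m.group("off"), 16), int(m.group("size"), 16))


trace = [
    "RIP [<ffffffff8822bd77>] :nx_nic:unm_nic_poll_controller+0x29/0x42",
    "[<ffffffff80239fe9>] netpoll_poll+0x3a/0x365",
    "[<ffffffff885830d0>] :netconsole:write_msg+0x40/0x58",
]
for entry in trace:
    sym = parse_oops_symbol(entry)
    print(sym.module or "(kernel)", sym.function, f"{sym.offset:#x}/{sym.size:#x}")
```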
- Details of the unsigned (U) module nx_nic:
crash> mod |grep -e NAME -e nx_nic
MODULE NAME SIZE OBJECT FILE
ffffffff8825cd80 nx_nic 254444 (not loaded) [CONFIG_KALLSYMS]
crash> px ((struct module *)0xffffffff8825cd80)->name
$1 = "nx_nic\000\000\000\000\000\000\000\000\000\000\000"
crash> px ((struct module *)0xffffffff8825cd80)->version
$2 = 0xffff81083e574ac0 "4.0.534"
crash> px ((struct module *)0xffffffff8825cd80)->srcversion
$3 = 0xffff81083e574b40 "F8D1CACF756C0AB7CC016A0"
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
