Kernel panic in vm_normal_page.


Environment

  • Red Hat Enterprise Linux 5
  • kernel-2.6.18-194.32.1.el5
  • kernel-2.6.18-308.8.2.el5
  • kernel-2.6.18-348.4.1.el5

Issue

  • The kernel panics with one of the following stack traces:
Kernel BUG at mm/memory.c:425
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 1 
Modules linked in: nfs nfs_acl netconsole lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi vsock(U) vmci(U) vmxnet3(U) vmmemctl(U) pvscsi(U) acpiphp dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac lp ide_cd tpm_tis i2c_piix4 tpm floppy i2c_core cdrom parport_pc tpm_bios parport sg pcspkr serio_raw shpchp dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 7745, comm: java Tainted: G     ---- 2.6.18-308.8.2.el5 #1
RIP: 0010:[<ffffffff8000c8ca>]  [<ffffffff8000c8ca>] vm_normal_page+0x4e/0x83
RSP: 0018:ffff81011c185c20  EFLAGS: 00210202
RAX: 00003b7b7b7b7b7b RBX: ffff81000900ba00 RCX: 0000000000076f6f
RDX: 7b7b7b7b7b7b7b7b RSI: 00000003b7b7b7b7 RDI: ffff81013bb3b6b8
RBP: ffff8101040422a0 R08: ffff810104743901 R09: ffff810000017600
R10: 00000000420bbe18 R11: 0000000000000202 R12: 8000000125c0c067
R13: 00000000dd60e000 R14: ffff810125c03070 R15: ffff810138760140
FS:  0000000000000000(0000) GS:ffff8101047437c0(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000345e298f50 CR3: 00000001230c8000 CR4: 00000000000006a0
Process java (pid: 7745, threadinfo ffff81011c184000, task ffff8100120ab860)
Stack:  ffffffff80007ac1 0000000000000000 ffff81011c185d08 ffffffffffffffff
 0000000000000000 ffff81013bb3b6b8 ffff81011c185d10 0000000000030000
 0000000000000000 0000000100000000 ffff810138760140 00000000de587000
Call Trace:
 [<ffffffff80007ac1>] unmap_vmas+0x3ea/0x909
 [<ffffffff80039e46>] exit_mmap+0x87/0x104
 [<ffffffff8003bfd5>] mmput+0x30/0x82
 [<ffffffff800158b4>] do_exit+0x2e7/0x931
 [<ffffffff80048e4a>] cpuset_exit+0x0/0x88
 [<ffffffff8002b370>] get_signal_to_deliver+0x465/0x494
 [<ffffffff8005a94e>] do_notify_resume+0x9c/0x7af
 [<ffffffff80023a02>] __user_walk_fd+0x41/0x4c
 [<ffffffff80028973>] vfs_stat_fd+0x1b/0x4a
 [<ffffffff800bab72>] audit_syscall_exit+0x329/0x344
 [<ffffffff8005d32e>] int_signal+0x12/0x17


Code: 0f 0b 68 43 03 2c 80 c2 a9 01 eb fe 48 8b 10 48 6b c6 38 48 
RIP  [<ffffffff8000c8ca>] vm_normal_page+0x4e/0x83
 RSP <ffff81011c185c20>
  • Or:
Kernel BUG at mm/memory.c:425
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:80/0000:80:00.0/0000:81:00.0/host5/rport-5:0-2/target5:0:0/5:0:0:283/timeout
CPU 16 
Modules linked in: nfs nfs_acl mptctl mptbase autofs4 lockd sunrpc bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom netxen_nic shpchp i7core_edac bnx2 tpm_tis sg hpilo 8021q edac_mc tpm pcspkr tpm_bios serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage lpfc scsi_transport_fc ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 14036, comm: testrrDBNetwork Not tainted 2.6.18-348.4.1.el5 #1
RIP: 0010:[<ffffffff8000c8de>]  [<ffffffff8000c8de>] vm_normal_page+0x4e/0x83
RSP: 0018:ffff81062177fde0  EFLAGS: 00010206
RAX: 00003fffffffb045 RBX: ffff81109c120400 RCX: 000000000007ffff
RDX: ffffffffffffb045 RSI: 00000003fffffffb RDI: ffff810f10ff1b88
RBP: ffff810896cd8290 R08: 0000000000000000 R09: ffff811080001600
R10: 0000000000000000 R11: 0000000000000000 R12: 8000000eff29e045
R13: 0000000032e02000 R14: ffff810485928010 R15: ffff8106b0270500
FS:  0000000000000000(0000) GS:ffff81109c36ba40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000000000000
CR2: 00002b98235a94a8 CR3: 0000000000000000 CR4: 00000000000006a0
Process testrrDBNetwork (pid: 14036, threadinfo ffff81062177e000, task ffff81085ae39080)
Stack:  ffffffff80007ac1 0000000000000000 ffff81062177fec8 ffffffffffffffff
 0000000000000000 ffff810f10ff1b88 ffff81062177fed0 0000000000000000
 0000000000000000 0000000000000000 ffff8106b0270500 0000000000000000
Call Trace:
 [<ffffffff80007ac1>] unmap_vmas+0x3ea/0x909
 [<ffffffff8003a44c>] exit_mmap+0x87/0x104
 [<ffffffff8003c5d7>] mmput+0x30/0x82
 [<ffffffff800158df>] do_exit+0x2e7/0x931
 [<ffffffff80049487>] cpuset_exit+0x0/0x88
 [<ffffffff8005d29e>] tracesys+0xd5/0xdf


Code: 0f 0b 68 53 27 2c 80 c2 a9 01 eb fe 48 8b 10 48 6b c6 38 48
RIP  [<ffffffff8000c8de>] vm_normal_page+0x4e/0x83
 RSP <ffff81062177fde0>

Resolution

  • Schedule a maintenance window and run a complete hardware health check on the machine, paying particular attention to memory.
  • If the issue recurs and no faulty hardware is found, install the kernel-debuginfo package and capture a complete vmcore the next time the panic occurs.

Root Cause

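  • The server appears to have crashed while extracting the page frame number (PFN) from an invalid page table entry (PTE). The BUG_ON(!pfn_valid(pte_pfn(pte))) check at mm/memory.c:425 fired in vm_normal_page() while the exiting process's mappings were being torn down via exit_mmap() and unmap_vmas().
  • This kind of corruption is usually caused by hardware (memory) faults.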
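
The register dumps in the Issue section appear consistent with this. The following is a minimal sketch, not the kernel's own code. It assumes 4 KiB pages and a 46-bit physical address mask for this x86_64 kernel (neither value is stated in this article), and it assumes that RDX holds the raw PTE and RSI the derived PFN. The helper name pte_pfn mirrors the kernel macro, but the Python version is illustrative only.

```python
PAGE_SHIFT = 12            # assumption: 4 KiB pages
PHYSICAL_MASK_SHIFT = 46   # assumption: 46-bit physical address mask

PHYSICAL_MASK = (1 << PHYSICAL_MASK_SHIFT) - 1


def pte_pfn(pte):
    """Return the page frame number encoded in a raw PTE value."""
    return (pte & PHYSICAL_MASK) >> PAGE_SHIFT


# (RDX, RSI) pairs copied from the two register dumps in the Issue section.
traces = {
    "2.6.18-308.8.2.el5": (0x7b7b7b7b7b7b7b7b, 0x3b7b7b7b7),
    "2.6.18-348.4.1.el5": (0xffffffffffffb045, 0x3fffffffb),
}

for kernel, (pte, rsi) in traces.items():
    pfn = pte_pfn(pte)
    phys_tib = (pfn << PAGE_SHIFT) / 2**40
    print(f"{kernel}: pte={pte:#x} pfn={pfn:#x} "
          f"matches_rsi={pfn == rsi} phys~{phys_tib:.1f} TiB")
```

Under these assumptions both PFNs correspond to physical addresses in the tens of TiB, which pfn_valid() would be expected to reject. The repeated 0x7b byte pattern in the first PTE also looks more like overwritten memory than a real page table entry.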

Diagnostic Steps

  • Capture a vmcore at the time of the crash and examine it with the crash utility, as shown under Stack Traces below.

Stack Traces

crash> log
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/memory.c:425
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 1 
Modules linked in: nfs nfs_acl netconsole lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi vsock(U) vmci(U) vmxnet3(U) vmmemctl(U) pvscsi(U) acpiphp dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac lp ide_cd tpm_tis i2c_piix4 tpm floppy i2c_core cdrom parport_pc tpm_bios parport sg pcspkr serio_raw shpchp dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 7745, comm: java Tainted: G     ---- 2.6.18-308.8.2.el5 #1
RIP: 0010:[<ffffffff8000c8ca>]  [<ffffffff8000c8ca>] vm_normal_page+0x4e/0x83
RSP: 0018:ffff81011c185c20  EFLAGS: 00210202
RAX: 00003b7b7b7b7b7b RBX: ffff81000900ba00 RCX: 0000000000076f6f
RDX: 7b7b7b7b7b7b7b7b RSI: 00000003b7b7b7b7 RDI: ffff81013bb3b6b8
RBP: ffff8101040422a0 R08: ffff810104743901 R09: ffff810000017600
R10: 00000000420bbe18 R11: 0000000000000202 R12: 8000000125c0c067
R13: 00000000dd60e000 R14: ffff810125c03070 R15: ffff810138760140
FS:  0000000000000000(0000) GS:ffff8101047437c0(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000345e298f50 CR3: 00000001230c8000 CR4: 00000000000006a0
Process java (pid: 7745, threadinfo ffff81011c184000, task ffff8100120ab860)
Stack:  ffffffff80007ac1 0000000000000000 ffff81011c185d08 ffffffffffffffff
 0000000000000000 ffff81013bb3b6b8 ffff81011c185d10 0000000000030000
 0000000000000000 0000000100000000 ffff810138760140 00000000de587000
Call Trace:
 [<ffffffff80007ac1>] unmap_vmas+0x3ea/0x909
 [<ffffffff80039e46>] exit_mmap+0x87/0x104
 [<ffffffff8003bfd5>] mmput+0x30/0x82
 [<ffffffff800158b4>] do_exit+0x2e7/0x931
 [<ffffffff80048e4a>] cpuset_exit+0x0/0x88
 [<ffffffff8002b370>] get_signal_to_deliver+0x465/0x494
 [<ffffffff8005a94e>] do_notify_resume+0x9c/0x7af
 [<ffffffff80023a02>] __user_walk_fd+0x41/0x4c
 [<ffffffff80028973>] vfs_stat_fd+0x1b/0x4a
 [<ffffffff800bab72>] audit_syscall_exit+0x329/0x344
 [<ffffffff8005d32e>] int_signal+0x12/0x17


Code: 0f 0b 68 43 03 2c 80 c2 a9 01 eb fe 48 8b 10 48 6b c6 38 48 
RIP  [<ffffffff8000c8ca>] vm_normal_page+0x4e/0x83
 RSP <ffff81011c185c20>
crash> dis -rl ffffffff8000c8ca |tail -n 5
0xffffffff8000c8c3 <vm_normal_page+71>: je     0xffffffff8000c8ca <vm_normal_page+78>
/usr/src/debug/kernel-2.6.18/linux-2.6.18-308.8.2.el5.x86_64/mm/memory.c: 425
0xffffffff8000c8c5 <vm_normal_page+73>: testb  $0x1,(%rax)
0xffffffff8000c8c8 <vm_normal_page+76>: jne    0xffffffff8000c8d6 <vm_normal_page+90>
0xffffffff8000c8ca <vm_normal_page+78>: ud2    
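
The oops reports the faulting location as vm_normal_page+0x4e/0x83 (hexadecimal offset and function size), while crash prints decimal offsets such as vm_normal_page+78. The sketch below shows how the two notations line up. The helper name parse_symbol_offset is hypothetical and is not part of the crash utility.

```python
import re


def parse_symbol_offset(text):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size) with integer values."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text.strip())
    if m is None:
        raise ValueError(f"unrecognized symbol reference: {text!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


rip = 0xffffffff8000c8ca  # RIP from the first oops
symbol, offset, size = parse_symbol_offset("vm_normal_page+0x4e/0x83")

# 0x4e is 78 decimal, the offset crash shows next to the ud2 instruction.
print(f"{symbol}+{offset}, function size {size} bytes, "
      f"function start {rip - offset:#x}")
```

Subtracting the offset from RIP gives the start address of the function, which is useful when comparing disassembly across kernel builds where the absolute addresses differ (for example 0xffffffff8000c8ca versus 0xffffffff8000c8de in the two traces).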

Kernel Source Code

 0418 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 0419               pte_t pte)
 0420 {
 0421   unsigned long pfn;
 0422 
 0423   if (HAVE_PTE_SPECIAL) {
 0424       if (likely(!pte_special(pte))) {
 0425           BUG_ON(!pfn_valid(pte_pfn(pte)));  <<<-----
 0426           return pte_page(pte);
 0427       }
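
The source line number can also be cross-checked against the Code: bytes of the oops without debuginfo. The sketch below assumes that BUG() on this x86_64 2.6.18 kernel expands to ud2, then pushq of the file name address, then ret with the line number as its immediate. That layout is an assumption based on the upstream headers of that era, not something this article states. The function name decode_bug_frame is hypothetical.

```python
import struct


def decode_bug_frame(code_hex):
    """Decode an assumed 'ud2; pushq $file; ret $line' sequence from oops Code: bytes."""
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    # Little-endian, packed: 2 bytes ud2, 1 byte push opcode, signed 32-bit
    # immediate, 1 byte ret opcode, unsigned 16-bit immediate.
    ud2, push, filename, ret, line = struct.unpack_from("<2sBiBH", raw)
    if ud2 != b"\x0f\x0b" or push != 0x68 or ret != 0xC2:
        raise ValueError("bytes do not look like the assumed BUG() frame")
    # The 32-bit immediate is sign-extended to a 64-bit kernel address.
    return filename & 0xFFFFFFFFFFFFFFFF, line


code = "0f 0b 68 43 03 2c 80 c2 a9 01 eb fe 48 8b 10 48 6b c6 38 48"
file_addr, line = decode_bug_frame(code)
print(f"file name string at {file_addr:#x}, line {line}")
```

For both traces the 16-bit immediate is 0x01a9, which is 425 decimal and agrees with "Kernel BUG at mm/memory.c:425". The pushed address presumably points at the file name string in the kernel image.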

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
