Dom0 Xen kernel crashes in the netloop module on Red Hat Enterprise Linux 5.7 or later
Issue
The dom0 kernel of a Xen host may crash in the netloop module with the following (or a similar) stack trace:
Unable to handle kernel paging request at ffff8800238a05c0
RIP: [<ffffffff8873c208>] :netloop:loopback_start_xmit+0x123/0x2ea
PGD 105d067 PUD 105e067 PMD 117b067 PTE 0
[...]
Call Trace:
<IRQ> [<ffffffff80424b6d>] dev_hard_start_xmit+0x1b7/0x28a
[<ffffffff80230b81>] dev_queue_xmit+0x31f/0x3ef
[<ffffffff80233073>] ip_output+0x29a/0x2dd
[<ffffffff80235783>] ip_queue_xmit+0x42c/0x486
[<ffffffff8021d32d>] __mod_timer+0xff/0x10e
[<ffffffff802d269e>] __kmalloc+0x8f/0x9f
[<ffffffff80222ad2>] tcp_transmit_skb+0x646/0x67e
[<ffffffff802341b7>] __tcp_push_pending_frames+0x75d/0x849
[<ffffffff8021c647>] tcp_rcv_established+0x818/0x8bd
[<ffffffff8023cc5a>] tcp_v4_do_rcv+0x2a/0x2fa
[<ffffffff8022c1ce>] local_bh_enable+0x9/0x9c
[<ffffffff8878e164>] :ip_conntrack:ip_confirm+0x33/0x39
[<ffffffff80227cbe>] tcp_v4_rcv+0xa23/0xa77
[<ffffffff8044076c>] ip_local_deliver_finish+0x0/0x1eb
[<ffffffff80258542>] nf_hook_slow+0x58/0xbc
[<ffffffff8044076c>] ip_local_deliver_finish+0x0/0x1eb
[<ffffffff8023597c>] ip_local_deliver+0x19f/0x265
[<ffffffff80236cdc>] ip_rcv+0x539/0x57c
[<ffffffff80221590>] netif_receive_skb+0x495/0x4c4
[<ffffffff802319a4>] process_backlog+0x9b/0x104
[<ffffffff8020d0a1>] net_rx_action+0xb4/0x1c6
[<ffffffff80212f06>] __do_softirq+0x8d/0x13b
[<ffffffff8025fda4>] call_softirq+0x1c/0x278
[<ffffffff8026db69>] do_softirq+0x31/0x90
[<ffffffff8025f8d6>] do_hypervisor_callback+0x1e/0x2c
<EOI> [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
[<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
[<ffffffff8026efa8>] raw_safe_halt+0x87/0xab
[<ffffffff8026c553>] xen_idle+0x38/0x4a
[<ffffffff8024ac15>] cpu_idle+0x97/0xba
[<ffffffff80758b11>] start_kernel+0x21f/0x224
[<ffffffff807581e5>] _sinittext+0x1e5/0x1eb
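To check whether a saved console or kernel log shows the same failure, you can search it for the faulting RIP line. The sketch below is a hypothetical helper (`matches_netloop_crash` is not a Red Hat tool). Because addresses and offsets vary between kernel builds and the trace may only be "similar", it matches the `:netloop:loopback_start_xmit` symbol in the RIP line and treats the hex values generically.

```python
import re

# Matches RIP lines such as:
#   RIP: [<ffffffff8873c208>] :netloop:loopback_start_xmit+0x123/0x2ea
# The address and the +offset/size values are matched generically because
# they differ between kernel builds.
RIP_NETLOOP = re.compile(
    r"RIP:\s*\[<[0-9a-fA-F]+>\]\s*:netloop:loopback_start_xmit"
    r"\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def matches_netloop_crash(log_text: str) -> bool:
    """Return True if the log contains an oops whose RIP is in netloop's loopback_start_xmit."""
    return RIP_NETLOOP.search(log_text) is not None
```

A frame that only appears in the call trace (without the `RIP:` prefix) is deliberately not counted, because the signature described here is a fault *inside* `loopback_start_xmit`. To scan a file, read it with `open(path, errors="replace").read()` and pass the text to `matches_netloop_crash`.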
Environment
- Red Hat Enterprise Linux 5.7 or a more recent release in dom0.
- The paravirtualized block devices (vbds) of a Xen guest are backed by image files on an NFS mount in dom0 and attached with the "tap:aio" scheme.
- The Xen guest generates heavy network traffic and heavy vbd I/O at the same time (a networked database server, for example).
- A low NFS timeout ("timeo" mount parameter) can exacerbate the problem.
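To see which NFS mounts in dom0 could be affected and what `timeo` they use, one option is to parse `/proc/mounts`. This article does not say what counts as a "low" timeout, so the sketch below lists the values and leaves any threshold to the caller. `nfs_mount_timeouts` and `low_timeout_mounts` are hypothetical helper names; the sketch assumes the kernel lists the `timeo` option in `/proc/mounts` and that you run it with Python 3 (RHEL 5 itself ships Python 2.4, so you may need to run it against a copy of the dom0 `/proc/mounts`). Per nfs(5), `timeo` is expressed in tenths of a second.

```python
from typing import List, Optional, Tuple


def nfs_mount_timeouts(mounts_text: str) -> List[Tuple[str, Optional[int]]]:
    """Return (mount point, timeo) for every NFS mount in /proc/mounts-style text.

    timeo is in tenths of a second; None means the option was not listed.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS mounts are relevant here
        timeo = None
        for opt in options.split(","):
            if opt.startswith("timeo="):
                timeo = int(opt[len("timeo="):])
        result.append((mount_point, timeo))
    return result


def low_timeout_mounts(mounts_text: str, threshold: int) -> List[str]:
    """Mount points whose timeo is below a caller-chosen threshold (tenths of a second)."""
    return [
        mount_point
        for mount_point, timeo in nfs_mount_timeouts(mounts_text)
        if timeo is not None and timeo < threshold
    ]
```

For example, `nfs_mount_timeouts(open("/proc/mounts").read())` lists every NFS mount with its timeout, which you can then compare against the mount that holds the guest's tap:aio image files.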