System page fault in queue_process() and recursive lock in netpoll_queue()

Solution In Progress

Issue

  • The system dereferenced a NULL pointer in the queue_process() routine, which caused a page fault. The fault-handling code then called netconsole to report the error. Because queue_lock was already held when the page fault occurred, the attempt to acquire it again in netpoll_queue() was a recursive lock attempt. Because queue_lock is a spin lock, the CPU spun in a loop from which it could never return, and the NMI watchdog eventually issued an NMI that caused an outage.
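
The failure pattern described above can be illustrated with a small userspace sketch. This is not the kernel code: every demo_ name below is hypothetical, the lock is a simple C11 test-and-set flag rather than the kernel's spinlock_t, and the spin is bounded so that the example terminates where a real spin lock would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_NO_OWNER   (-1)
#define DEMO_MAX_SPINS  1000000L

/* Hypothetical userspace stand-in for a non-recursive spin lock. */
struct demo_spinlock {
    atomic_flag locked;
    int owner;              /* id of the current holder, DEMO_NO_OWNER if free */
};

/* Stand-in for the queue_lock named in the article. */
static struct demo_spinlock queue_lock = { ATOMIC_FLAG_INIT, DEMO_NO_OWNER };

/*
 * Spin until the lock is taken or the spin budget runs out.
 * Returns false when the budget is exhausted; a real spin lock has no
 * budget and would keep spinning at this point.
 */
static bool demo_spin_lock_bounded(struct demo_spinlock *l, int self)
{
    for (long i = 0; i < DEMO_MAX_SPINS; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked,
                                               memory_order_acquire)) {
            l->owner = self;
            return true;
        }
    }
    return false;
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
    assert(l->owner != DEMO_NO_OWNER);  /* only a held lock may be released */
    l->owner = DEMO_NO_OWNER;
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Models the logging path: it needs queue_lock to queue a message. */
static bool demo_netpoll_queue(int self)
{
    if (!demo_spin_lock_bounded(&queue_lock, self))
        return false;       /* the kernel would hang here instead */
    /* ... a message would be queued here ... */
    demo_spin_unlock(&queue_lock);
    return true;
}

/*
 * Models the faulting path: the lock is already held when the simulated
 * fault happens, and the error report re-enters the logging path on the
 * same context. Returns whether the report could be queued.
 */
static bool demo_queue_process_with_fault(int self)
{
    bool reported;

    if (!demo_spin_lock_bounded(&queue_lock, self))
        return false;
    /* Simulated fault while holding queue_lock: report it via the log path. */
    reported = demo_netpoll_queue(self);
    demo_spin_unlock(&queue_lock);
    return reported;
}
```

Calling demo_netpoll_queue() on its own succeeds, while demo_queue_process_with_fault() returns false because the nested acquisition can never succeed while the same context still holds the lock. In the kernel there is no spin budget, so the CPU would stay in that loop until the NMI watchdog fires.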

Environment

  • Red Hat Enterprise Linux 5.8
    • 2.6.18-308.11.1.el5
  • Netconsole
