Server hangs with a deadlock due to race condition between rotate_reclaimable_page() and __page_cache_release()

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 5
  • kernel < 2.6.18-308.el5

Issue

  • Server goes into an unresponsive state as the CPU waits on a spinlock.

Resolution

  • Download and apply the following errata release: RHSA-2012:0150

  • The errata release resolves a bug being tracked at the following Bugzilla: #699459

Root Cause

  • This deadlock is due to lock inversion between rotate_reclaimable_page() and __page_cache_release. An example of one of these deadlocks can be found below:
PID: 8515   TASK: ffff8105e5f60820  CPU: 3   COMMAND: "vxiod"
 #3 [ffff8105e56eda80] __write_lock_failed at ffffffff80062181
 #4 [ffff8105e56eda80] _write_lock_irqsave at ffffffff80064ba3        <--- wants the mapping->tree_lock
 #5 [ffff8105e56eda88] test_clear_page_writeback at ffffffff8001aea0
 #6 [ffff8105e56edab8] rotate_reclaimable_page at ffffffff80010a0e    <--- acquired the zone->lru_lock here
...

PID: 12435  TASK: ffff81030c507820  CPU: 13  COMMAND: "nfsd"
 #3 [ffff81030c4697f8] .text.lock.spinlock at ffffffff80064c08 (via _spin_lock_irqsave)
 #4 [ffff81030c4697f8] __page_cache_release at ffffffff800c9e13     <--- wants the zone->lru_lock here
 #5 [ffff81030c469818] find_lock_page at ffffffff80013b80           <--- acquired the mapping->tree_lock here

Diagnostic Steps

NMI Watchdog detected LOCKUP on CPU 1
CPU 1 
Modules linked in: nfs fscache vxodm(PFU) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc dmpaa(PU) vxspec(PFU) vxio(PFU) vxdmp(PU) bonding ipv6 xfrm_nalgo crypto_api vxportal(PFU) fdd(PFU) vxfs(PU) dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport hpilo serio_raw shpchp bnx2x sg pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc scsi_transport_fc cciss aacraid ata_piix libata mptspi scsi_transport_spi mptsas mptscsih scsi_transport_sas sd_mod scsi_mod mptbase ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 8463, comm: vxiod Tainted: PF     2.6.18-194.26.1.el5 #1
RIP: 0010:[<ffffffff80064c08>]  [<ffffffff80064c08>] .text.lock.spinlock+0xe/0x30
RSP: 0018:ffff8105e565bbe8  EFLAGS: 00000086
RAX: 0000000000000246 RBX: ffff81031df7e4a8 RCX: ffff8105e47906c0
RDX: 0000000000000000 RSI: 0000000000000080 RDI: ffff8105c8f81660
RBP: 0000000000000001 R08: ffffffffffffffff R09: 0000000000000080
R10: 0000000000000010 R11: 0000000000000000 R12: ffff8105c7c4f968
R13: 0000000000000000 R14: ffff8105c8f858c8 R15: 00000000fa000000
FS:  00002aecfe38a240(0000) GS:ffff81031fc28ac0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000077143cb4 CR3: 00000005ba1ca000 CR4: 00000000000006e0
Process vxiod (pid: 8463, threadinfo ffff8105e565a000, task ffff8105e5fd60c0)
Stack:  ffffffff887e2e37 0000000000000202 000000000c7036b5 0000000000000001
 ffff8101c955aec0 0000000000000000 ffffffff8881b1b3 ffff81031fbcc000
 ffffffff881c4525 ffff81032a9d7a68 ffff81031fbcc380 ffff81031fbcc000
Call Trace:
 [<ffffffff887e2e37>] :vxio:vxvm_end_request+0x2c/0x62
 [<ffffffff8881b1b3>] :vxio:volkiodone+0x1e9/0x22e
 [<ffffffff881c4525>] :lpfc:lpfc_sli_ringtxcmpl_put+0x1d/0x75
 [<ffffffff801508e3>] __next_cpu+0x19/0x28
 [<ffffffff8008bbf0>] find_busiest_group+0x20d/0x621
 [<ffffffff8008ca47>] enqueue_task+0x41/0x56
 [<ffffffff8008da77>] move_tasks+0x22c/0x2ec
 [<ffffffff8883f154>] :vxio:volsiodone+0x421/0x469
 [<ffffffff88994ce8>] :vxio:vol_rv_write2_done+0x1c0/0x1d1
 [<ffffffff887e2a28>] :vxio:vxvm_getrbuf+0x3b/0x82
 [<ffffffff888190bc>] :vxio:volkcontext_process+0x75/0x136
 [<ffffffff88997baa>] :vxio:vol_rv_write2_start+0x3fd/0x40e
 [<ffffffff8881881b>] :vxio:voliod_iohandle+0x37/0x88
 [<ffffffff88818a57>] :vxio:voliod_loop+0x1eb/0x58f
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800e5b4a>] bdev_destroy_inode+0x0/0x1b
 [<ffffffff8881886c>] :vxio:voliod_loop+0x0/0x58f
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
  • The system panicked due to an NMI because CPU 1 had interrupts disabled for too long. In this case it was trying to acquire the request_queue->queue_lock spinlock.

  • Of the 24 CPUs in this system, 15 are spinning on spinlocks. The one that holds the request_queue->queue_lock is:

PID: 8515   TASK: ffff8105e5f60820  CPU: 3   COMMAND: "vxiod"
 #0 [ffff81031fcf5f20] crash_nmi_callback at ffffffff8007af3e
 #1 [ffff81031fcf5f40] do_nmi at ffffffff8006588a
 #2 [ffff81031fcf5f50] nmi at ffffffff80064eef
    [exception RIP: __write_lock_failed+9]
    RIP: ffffffff80062181  RSP: ffff8105e56eda80  RFLAGS: 00000087
    RAX: 0000000000000086  RBX: ffff810107a103f0  RCX: ffff8105c7c4f968
    RDX: ffff810088bcf680  RSI: ffff810109aeabe0  RDI: ffff810256ab6ea0
    RBP: ffff810256ab6e88   R8: ffff810256ab6bf0   R9: ffff81010af4d8c0
    R10: 0000000000000030  R11: ffffffff80045000  R12: 0000000000000002
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000006000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #3 [ffff8105e56eda80] __write_lock_failed at ffffffff80062181
 #4 [ffff8105e56eda80] _write_lock_irqsave at ffffffff80064ba3        <--- wants the mapping->tree_lock
 #5 [ffff8105e56eda88] test_clear_page_writeback at ffffffff8001aea0
 #6 [ffff8105e56edab8] rotate_reclaimable_page at ffffffff80010a0e    <--- acquired the zone->lru_lock here
 #7 [ffff8105e56edad8] end_page_writeback at ffffffff8003a084
 #8 [ffff8105e56edae8] vx_async_pvn_pldone at ffffffff8860233e [vxfs]
 #9 [ffff8105e56edb18] vx_bc_biodone at ffffffff884cae64 [vxfs]
#10 [ffff8105e56edb58] vx_end_io_bp at ffffffff88606db3 [vxfs]
#11 [ffff8105e56edb78] vx_end_io at ffffffff88606e39 [vxfs]
#12 [ffff8105e56edb88] __end_that_request_first at ffffffff8002cd17
#13 [ffff8105e56edbe8] vxvm_end_request at ffffffff887e2e48 [vxio]    <--- acquired the request_queue->queue_lock here
#14 [ffff8105e56edc18] volkiodone at ffffffff8881b1b3 [vxio]
#15 [ffff8105e56edd98] vol_rv_write2_done at ffffffff88994ce8 [vxio]
#16 [ffff8105e56eddf8] volkcontext_process at ffffffff888190bc [vxio]
#17 [ffff8105e56ede38] vol_rv_write2_start at ffffffff88997baa [vxio]
#18 [ffff8105e56ede88] voliod_iohandle at ffffffff8881881b [vxio]
#19 [ffff8105e56edeb8] voliod_loop at ffffffff88818a57 [vxio]
#20 [ffff8105e56edf48] kernel_thread at ffffffff8005dfb1
  • This process is trying acquire the mapping->tree_lock spinlock which is held by this process:
PID: 12435  TASK: ffff81030c507820  CPU: 13  COMMAND: "nfsd"
 #0 [ffff81031ff52f20] crash_nmi_callback at ffffffff8007af3e
 #1 [ffff81031ff52f40] do_nmi at ffffffff8006588a
 #2 [ffff81031ff52f50] nmi at ffffffff80064eef
    [exception RIP: .text.lock.spinlock+14]
    RIP: ffffffff80064c08  RSP: ffff81030c4697f8  RFLAGS: 00000086
    RAX: 0000000000000002  RBX: ffff81010471cda8  RCX: 0000000000000034
    RDX: 0000000000000000  RSI: ffff81010471cda8  RDI: ffff81000002bf80
    RBP: ffff81000002b600   R8: ffff81000002b600   R9: ffff810009037e00
    R10: ffffffff804742d8  R11: 0000000000000202  R12: ffff810256ab6e88
    R13: 0000000000025ad6  R14: 0000000000025ad6  R15: ffff810256ab6e88
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #3 [ffff81030c4697f8] .text.lock.spinlock at ffffffff80064c08 (via _spin_lock_irqsave)
 #4 [ffff81030c4697f8] __page_cache_release at ffffffff800c9e13     <--- wants the zone->lru_lock here
 #5 [ffff81030c469818] find_lock_page at ffffffff80013b80           <--- acquired the mapping->tree_lock here
 #6 [ffff81030c469848] find_or_create_page at ffffffff80025c06
 #7 [ffff81030c469888] vx_segmap_getmap at ffffffff88601d62 [vxfs]
 #8 [ffff81030c469948] vx_write1_fast at ffffffff88623820 [vxfs]
 #9 [ffff81030c469a18] vx_write_common_fast at ffffffff8862485f [vxfs]
#10 [ffff81030c469a48] vx_write_common at ffffffff88624bf6 [vxfs]
#11 [ffff81030c469b38] vx_vop_write at ffffffff885d3adc [vxfs]
#12 [ffff81030c469b58] vx_writev at ffffffff885d79c0 [vxfs]
#13 [ffff81030c469c08] do_readv_writev at ffffffff800e079c
#14 [ffff81030c469ce8] nfsd_vfs_write at ffffffff88b9b711 [nfsd]
#15 [ffff81030c469d98] nfsd_write at ffffffff88b9c027 [nfsd]
#16 [ffff81030c469df8] nfsd3_proc_write at ffffffff88ba2b16 [nfsd]
#17 [ffff81030c469e48] nfsd_dispatch at ffffffff88b981db [nfsd]
#18 [ffff81030c469e78] svc_process at ffffffff88ab1651 [sunrpc]
#19 [ffff81030c469ee8] nfsd at ffffffff88b98746 [nfsd]
#20 [ffff81030c469f48] kernel_thread at ffffffff8005dfb1
  • This process is trying to acquire the zone->lru_lock which is held by CPU 3 creating a deadlock.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments