System hangs under load due to blocked tasks and high CPU load

Solution In Progress

Issue

When the application is started on a system with Veritas Storage Foundation installed, the server may hang or show a high load average and high CPU activity. Many of the blocked processes are ps, pgrep, and other simple commands. The stacks of the blocked tasks show that most are sleeping in __down_read or __down_write, that is, waiting to acquire a read-write semaphore. Example stack traces are shown below. Crash analysis shows that all of the blocked tasks (more than 8,000) are sleeping on the mm_struct.mmap_sem belonging to the application's thread group. This detail is critical because it distinguishes this issue from others with similar symptoms.

crash> bt ffff812fff92a0c0
PID: 17770  TASK: ffff812fff92a0c0  CPU: 16  COMMAND: "pmdtm"
 #0 [ffff812fa8de7e38] schedule at ffffffff80062fa0
 #1 [ffff812fa8de7f10] __down_write_nested at ffffffff800645f3
 #2 [ffff812fa8de7f50] sys_brk at ffffffff80017488
 #3 [ffff812fa8de7f80] tracesys at ffffffff8005d29e (via system_call)
    RIP: 00000033038cd6da  RSP: 00002ace183e4c38  RFLAGS: 00000287
    RAX: ffffffffffffffda  RBX: ffffffff8005d29e  RCX: ffffffffffffffff
    RDX: 0000000000021720  RSI: 000000000eb58000  RDI: 000000000eb58000
    RBP: 0000000000000001   R8: 0000000000000004   R9: 0000000000000003
    R10: 005200450056005f  R11: 0000000000000287  R12: 0000003303b549e0
    R13: 0000000000022000  R14: 0000000000022000  R15: 000000000eb36000
    ORIG_RAX: 000000000000000c  CS: 0033  SS: 002b

crash> bt ffff816024ecb080
PID: 23013  TASK: ffff816024ecb080  CPU: 10  COMMAND: "perfd"
 #0 [ffff81601999bcc8] schedule at ffffffff80062fa0
 #1 [ffff81601999bda0] __down_read at ffffffff8006468c
 #2 [ffff81601999bde0] __access_process_vm at ffffffff800c74e0
 #3 [ffff81601999be50] access_process_vm at ffffffff800c765b
 #4 [ffff81601999be90] proc_pid_cmdline at ffffffff8010d970
 #5 [ffff81601999bed0] proc_info_read at ffffffff8010e0b6
 #6 [ffff81601999bf10] vfs_read at ffffffff8000b732
 #7 [ffff81601999bf40] sys_read at ffffffff80011db5
 #8 [ffff81601999bf80] tracesys at ffffffff8005d29e (via system_call)
    RIP: 00000033038c71fb  RSP: 00007fffab53af40  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: ffffffff8005d29e  RCX: ffffffffffffffff
    RDX: 0000000000000fff  RSI: 00007fffab53af90  RDI: 0000000000000008
    RBP: 0000000000000000   R8: 00002ba578aadf80   R9: 000000330391ad60
    R10: 0000000000000008  R11: 0000000000000202  R12: 0000000000000008
    R13: 0000000000000000  R14: 00007fffab53bf90  R15: 000000017898101c
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

Environment

o Red Hat Enterprise Linux (all versions)
o Veritas Storage Foundation
