2TB system spinning in compaction_alloc(order=9)

Solution Unverified - Updated -

Issue

  • The system stalls repeatedly, and many threads show the following backtrace:
PID: 70840  TASK: ffff88bf9e730ae0  CPU: 128  COMMAND: "java"
 #0 [ffff8a00e1687e90] crash_nmi_callback at ffffffff8102d2c6
 #1 [ffff8a00e1687ea0] notifier_call_chain at ffffffff81513455
 #2 [ffff8a00e1687ee0] atomic_notifier_call_chain at ffffffff815134ba
 #3 [ffff8a00e1687ef0] notify_die at ffffffff8109cc1e
 #4 [ffff8a00e1687f20] do_nmi at ffffffff8151111b
 #5 [ffff8a00e1687f50] nmi at ffffffff815109e0
    [exception RIP: _spin_lock_irqsave+0x2f]
    RIP: ffffffff8151013f  RSP: ffff8881f7dcf7a8  RFLAGS: 00000093
    RAX: 0000000000002259  RBX: ffff88000002a598  RCX: 0000000000002256
    RDX: 0000000000000246  RSI: 0000000022592252  RDI: ffff88000002a598
    RBP: ffff8881f7dcf7a8   R8: ffffea00326d5000   R9: 0000000000000000
    R10: ffff88002c417f40  R11: 0000000000000000  R12: ffff8881f7dcf860
    R13: 0000000000e68600  R14: ffffea00326d5000  R15: 000000000016cda9
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff8881f7dcf7a8] _spin_lock_irqsave at ffffffff8151013f
 #7 [ffff8881f7dcf7b0] compact_checklock_irqsave at ffffffff811623e2
 #8 [ffff8881f7dcf7e0] compaction_alloc at ffffffff81162712
 #9 [ffff8881f7dcf8a0] migrate_pages at ffffffff8116d063
#10 [ffff8881f7dcf950] compact_zone at ffffffff811630a1
#11 [ffff8881f7dcfa10] compact_zone_order at ffffffff811636ac
#12 [ffff8881f7dcfac0] try_to_compact_pages at ffffffff811637e1
#13 [ffff8881f7dcfb30] __alloc_pages_direct_compact at ffffffff8112b9ca
#14 [ffff8881f7dcfba0] __alloc_pages_nodemask at ffffffff8112c02b
#15 [ffff8881f7dcfce0] alloc_pages_vma at ffffffff81160a5a
#16 [ffff8881f7dcfd30] do_huge_pmd_anonymous_page at ffffffff8117b5d5
#17 [ffff8881f7dcfd90] handle_mm_fault at ffffffff81144440
#18 [ffff8881f7dcfe00] __do_page_fault at ffffffff810474c9
#19 [ffff8881f7dcff20] do_page_fault at ffffffff8151339e
#20 [ffff8881f7dcff50] page_fault at ffffffff81510755
    RIP: 00007ff73855efb1  RSP: 00007ff71e975870  RFLAGS: 00010203
    RAX: 00007ff72fcf0000  RBX: 0000000000000011  RCX: 00007ff73855efad
    RDX: 0000000000000006  RSI: 00007ff72fcf0000  RDI: 00007ff7389708f0
    RBP: 00007ff71e975870   R8: 00007ff72f594d70   R9: 0000000000218e89
    R10: 00007ff738a49c20  R11: 0000000000000006  R12: 00007ff72fcf0000
    R13: 00007ff72f594d70  R14: 00000000415613e8  R15: 00000000415613e0
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

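The backtrace suggests that transparent huge page (THP) defragmentation is causing heavy contention on the LRU lock: an anonymous huge-page fault (do_huge_pmd_anonymous_page) falls into direct compaction (__alloc_pages_direct_compact) and spins in _spin_lock_irqsave, called from compaction_alloc.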
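
To check how widespread the contention is, it can help to count how many tasks in the vmcore are sitting in the same code path. The following is a minimal sketch, not part of the original analysis: it assumes the output of crash's `foreach bt` command has been saved as text and that task headers and frames are formatted as in the trace above. The helper name `tasks_in_function` and the `SAMPLE` excerpt are illustrative only.

```python
import re
from collections import Counter

# Header line that crash prints before each task's backtrace.
TASK_RE = re.compile(
    r'^PID:\s*(\d+)\s+TASK:\s*\S+\s+CPU:\s*\d+\s+COMMAND:\s*"([^"]*)"'
)
# Stack frame line, e.g. " #8 [ffff8881f7dcf7e0] compaction_alloc at ffffffff81162712".
FRAME_RE = re.compile(r'^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def tasks_in_function(bt_text, func):
    """Return (pid, command) for every task whose backtrace contains func."""
    hits = []
    current = None      # (pid, command) of the task being parsed
    recorded = False    # True once the current task has been added to hits
    for line in bt_text.splitlines():
        header = TASK_RE.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            recorded = False
            continue
        frame = FRAME_RE.match(line)
        if frame and current and not recorded and frame.group(1) == func:
            hits.append(current)
            recorded = True
    return hits


# Excerpt of the trace shown in this article, used as sample input.
SAMPLE = '''\
PID: 70840  TASK: ffff88bf9e730ae0  CPU: 128  COMMAND: "java"
 #6 [ffff8881f7dcf7a8] _spin_lock_irqsave at ffffffff8151013f
 #7 [ffff8881f7dcf7b0] compact_checklock_irqsave at ffffffff811623e2
 #8 [ffff8881f7dcf7e0] compaction_alloc at ffffffff81162712
#10 [ffff8881f7dcf950] compact_zone at ffffffff811630a1
'''

if __name__ == "__main__":
    stuck = tasks_in_function(SAMPLE, "compaction_alloc")
    # Summarize the matching tasks by command name.
    for command, count in Counter(cmd for _, cmd in stuck).items():
        print(f"{command}: {count} task(s) in compaction_alloc")
```

A large number of tasks reported in compaction_alloc, relative to the total task count, would be consistent with the lock contention described above; a small number would suggest looking elsewhere.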

Environment

  • Red Hat Enterprise Linux (RHEL) 6.4
