2TB system spinning in compaction_alloc(order=9)
Issue
- A system that stalls repeatedly, with many threads showing the following backtrace:
PID: 70840 TASK: ffff88bf9e730ae0 CPU: 128 COMMAND: "java"
#0 [ffff8a00e1687e90] crash_nmi_callback at ffffffff8102d2c6
#1 [ffff8a00e1687ea0] notifier_call_chain at ffffffff81513455
#2 [ffff8a00e1687ee0] atomic_notifier_call_chain at ffffffff815134ba
#3 [ffff8a00e1687ef0] notify_die at ffffffff8109cc1e
#4 [ffff8a00e1687f20] do_nmi at ffffffff8151111b
#5 [ffff8a00e1687f50] nmi at ffffffff815109e0
[exception RIP: _spin_lock_irqsave+0x2f]
RIP: ffffffff8151013f RSP: ffff8881f7dcf7a8 RFLAGS: 00000093
RAX: 0000000000002259 RBX: ffff88000002a598 RCX: 0000000000002256
RDX: 0000000000000246 RSI: 0000000022592252 RDI: ffff88000002a598
RBP: ffff8881f7dcf7a8 R8: ffffea00326d5000 R9: 0000000000000000
R10: ffff88002c417f40 R11: 0000000000000000 R12: ffff8881f7dcf860
R13: 0000000000e68600 R14: ffffea00326d5000 R15: 000000000016cda9
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff8881f7dcf7a8] _spin_lock_irqsave at ffffffff8151013f
#7 [ffff8881f7dcf7b0] compact_checklock_irqsave at ffffffff811623e2
#8 [ffff8881f7dcf7e0] compaction_alloc at ffffffff81162712
#9 [ffff8881f7dcf8a0] migrate_pages at ffffffff8116d063
#10 [ffff8881f7dcf950] compact_zone at ffffffff811630a1
#11 [ffff8881f7dcfa10] compact_zone_order at ffffffff811636ac
#12 [ffff8881f7dcfac0] try_to_compact_pages at ffffffff811637e1
#13 [ffff8881f7dcfb30] __alloc_pages_direct_compact at ffffffff8112b9ca
#14 [ffff8881f7dcfba0] __alloc_pages_nodemask at ffffffff8112c02b
#15 [ffff8881f7dcfce0] alloc_pages_vma at ffffffff81160a5a
#16 [ffff8881f7dcfd30] do_huge_pmd_anonymous_page at ffffffff8117b5d5
#17 [ffff8881f7dcfd90] handle_mm_fault at ffffffff81144440
#18 [ffff8881f7dcfe00] __do_page_fault at ffffffff810474c9
#19 [ffff8881f7dcff20] do_page_fault at ffffffff8151339e
#20 [ffff8881f7dcff50] page_fault at ffffffff81510755
RIP: 00007ff73855efb1 RSP: 00007ff71e975870 RFLAGS: 00010203
RAX: 00007ff72fcf0000 RBX: 0000000000000011 RCX: 00007ff73855efad
RDX: 0000000000000006 RSI: 00007ff72fcf0000 RDI: 00007ff7389708f0
RBP: 00007ff71e975870 R8: 00007ff72f594d70 R9: 0000000000218e89
R10: 00007ff738a49c20 R11: 0000000000000006 R12: 00007ff72fcf0000
R13: 00007ff72f594d70 R14: 00000000415613e8 R15: 00000000415613e0
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
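It appears that THP defragmentation (direct compaction triggered by transparent huge page faults) is causing heavy contention on the per-zone spinlock acquired in compact_checklock_irqsave().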
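The order=9 in the title is a buddy-allocator order. An order-N request asks for 2^N physically contiguous base pages, so with 4 KiB pages an order-9 request is one 2 MiB transparent huge page. That is the size do_huge_pmd_anonymous_page asks for in the trace above, and it is why the fault falls back to compaction when no free 2 MiB block is available. The Python sketch below spells out that arithmetic and reads the currently active THP defrag mode. It is a minimal illustration, not a Red Hat tool. The 4 KiB page size and the two sysfs paths are assumptions about the target system: the redhat_transparent_hugepage directory on RHEL 6 and the transparent_hugepage directory on upstream kernels.

```python
"""Sketch: relate the order=9 request in the trace to THP size and report the
THP defrag mode. PAGE_SIZE and the sysfs paths are assumptions, not taken
from the original report."""
from pathlib import Path
from typing import Optional

PAGE_SIZE = 4096  # assumed x86_64 base page size, in bytes

# Assumed locations of the THP "defrag" knob: RHEL 6 uses the
# redhat_-prefixed directory, upstream kernels use the plain one.
DEFRAG_PATHS = (
    Path("/sys/kernel/mm/redhat_transparent_hugepage/defrag"),
    Path("/sys/kernel/mm/transparent_hugepage/defrag"),
)


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """An order-N buddy allocation is 2**N contiguous base pages."""
    return (1 << order) * page_size


def active_setting(text: str) -> Optional[str]:
    """Return the bracketed (active) choice from text like '[always] madvise never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_defrag_mode() -> Optional[str]:
    """Return the defrag mode from the first readable knob, or None if none exists."""
    for path in DEFRAG_PATHS:
        try:
            return active_setting(path.read_text())
        except OSError:  # missing file or no permission: try the next path
            continue
    return None


if __name__ == "__main__":
    size = order_to_bytes(9)
    print(f"order=9 -> {size} bytes ({size // (1024 * 1024)} MiB)")
    print(f"THP defrag mode: {read_defrag_mode()}")
```

Run on the affected host, the first line prints `order=9 -> 2097152 bytes (2 MiB)`. The second line prints the bracketed defrag setting, or `None` if neither assumed path exists.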
Environment
- Red Hat Enterprise Linux (RHEL) 6.4
