2TB system spinning in compaction_alloc(order=9)
Issue
- The system stalls repeatedly, and many threads show the following backtrace:
PID: 70840 TASK: ffff88bf9e730ae0 CPU: 128 COMMAND: "java"
#0 [ffff8a00e1687e90] crash_nmi_callback at ffffffff8102d2c6
#1 [ffff8a00e1687ea0] notifier_call_chain at ffffffff81513455
#2 [ffff8a00e1687ee0] atomic_notifier_call_chain at ffffffff815134ba
#3 [ffff8a00e1687ef0] notify_die at ffffffff8109cc1e
#4 [ffff8a00e1687f20] do_nmi at ffffffff8151111b
#5 [ffff8a00e1687f50] nmi at ffffffff815109e0
[exception RIP: _spin_lock_irqsave+0x2f]
RIP: ffffffff8151013f RSP: ffff8881f7dcf7a8 RFLAGS: 00000093
RAX: 0000000000002259 RBX: ffff88000002a598 RCX: 0000000000002256
RDX: 0000000000000246 RSI: 0000000022592252 RDI: ffff88000002a598
RBP: ffff8881f7dcf7a8 R8: ffffea00326d5000 R9: 0000000000000000
R10: ffff88002c417f40 R11: 0000000000000000 R12: ffff8881f7dcf860
R13: 0000000000e68600 R14: ffffea00326d5000 R15: 000000000016cda9
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff8881f7dcf7a8] _spin_lock_irqsave at ffffffff8151013f
#7 [ffff8881f7dcf7b0] compact_checklock_irqsave at ffffffff811623e2
#8 [ffff8881f7dcf7e0] compaction_alloc at ffffffff81162712
#9 [ffff8881f7dcf8a0] migrate_pages at ffffffff8116d063
#10 [ffff8881f7dcf950] compact_zone at ffffffff811630a1
#11 [ffff8881f7dcfa10] compact_zone_order at ffffffff811636ac
#12 [ffff8881f7dcfac0] try_to_compact_pages at ffffffff811637e1
#13 [ffff8881f7dcfb30] __alloc_pages_direct_compact at ffffffff8112b9ca
#14 [ffff8881f7dcfba0] __alloc_pages_nodemask at ffffffff8112c02b
#15 [ffff8881f7dcfce0] alloc_pages_vma at ffffffff81160a5a
#16 [ffff8881f7dcfd30] do_huge_pmd_anonymous_page at ffffffff8117b5d5
#17 [ffff8881f7dcfd90] handle_mm_fault at ffffffff81144440
#18 [ffff8881f7dcfe00] __do_page_fault at ffffffff810474c9
#19 [ffff8881f7dcff20] do_page_fault at ffffffff8151339e
#20 [ffff8881f7dcff50] page_fault at ffffffff81510755
RIP: 00007ff73855efb1 RSP: 00007ff71e975870 RFLAGS: 00010203
RAX: 00007ff72fcf0000 RBX: 0000000000000011 RCX: 00007ff73855efad
RDX: 0000000000000006 RSI: 00007ff72fcf0000 RDI: 00007ff7389708f0
RBP: 00007ff71e975870 R8: 00007ff72f594d70 R9: 0000000000218e89
R10: 00007ff738a49c20 R11: 0000000000000006 R12: 00007ff72fcf0000
R13: 00007ff72f594d70 R14: 00000000415613e8 R15: 00000000415613e0
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
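The trace shows a page fault allocating an anonymous transparent huge page (THP), which falls into direct compaction and spins on a spinlock in compaction_alloc(). It appears that THP defragmentation is causing heavy contention on the LRU lock.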
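To confirm that THP and its defragmentation are active on an affected system, the current settings can be read from sysfs. The following is a minimal sketch, not an official tool: it assumes the RHEL 6 location /sys/kernel/mm/redhat_transparent_hugepage (falling back to the upstream /sys/kernel/mm/transparent_hugepage), and the helper names active_choice and thp_settings are hypothetical. It only reports the settings and changes nothing.

```python
import re
from pathlib import Path

# Candidate sysfs directories. RHEL 6 kernels use the redhat_ prefix;
# upstream kernels do not. Which one exists depends on the kernel (assumption).
THP_DIRS = (
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
    Path("/sys/kernel/mm/transparent_hugepage"),
)


def active_choice(text):
    """Return the bracketed (active) option from a sysfs string such as
    'always madvise [never]', or None if nothing is bracketed."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def thp_settings():
    """Return the active 'enabled' and 'defrag' values from the first THP
    directory found, or an empty dict if THP is not exposed via sysfs."""
    for base in THP_DIRS:
        if base.is_dir():
            settings = {}
            for name in ("enabled", "defrag"):
                path = base / name
                if path.is_file():
                    settings[name] = active_choice(path.read_text())
            return settings
    return {}


if __name__ == "__main__":
    print(thp_settings())
```

If both values report always, page faults on anonymous memory may enter direct compaction to build a huge page, which matches the path seen in the backtrace.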
Environment
- Red Hat Enterprise Linux (RHEL) 6.4