Kernel panics on a soft lockup while approximately 160 GB of memory is missing from one memory zone
Issue
- The kernel panics when a soft lockup occurs. At the time of the crash, approximately 160 GB of memory is unexpectedly missing from one of the memory zones. The panic message and call trace are as follows:
Kernel panic - not syncing: softlockup: hung tasks
Pid: 54702, comm: java Tainted: G --L------------ 2.6.32-754.35.1.el6.x86_64 #1
Call Trace:
<IRQ> [<ffffffff8155a2af>] ? panic+0xa7/0x18b
[<ffffffff810f8b53>] ? watchdog_timer_fn+0x223/0x230
[<ffffffff810ff44a>] ? __rcu_process_callbacks+0x25a/0x350
[<ffffffff810f8930>] ? watchdog_timer_fn+0x0/0x230
[<ffffffff810b0a90>] ? __run_hrtimer+0x90/0x1e0
[<ffffffff810b0e3e>] ? hrtimer_interrupt+0xee/0x270
[<ffffffff81039f33>] ? local_apic_timer_interrupt+0x43/0x70
[<ffffffff8103f964>] ? native_apic_msr_eoi_write+0x14/0x20
[<ffffffff81568859>] ? smp_apic_timer_interrupt+0x49/0x60
[<ffffffff81567193>] ? apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff8155dfee>] ? _spin_lock+0x1e/0x30
[<ffffffff8116d1a2>] ? page_referenced+0xa2/0x360
[<ffffffff8114f2e1>] ? shrink_active_list+0x1e1/0x370
[<ffffffff811500e5>] ? shrink_mem_cgroup_zone+0x3d5/0x550
[<ffffffff81193eed>] ? mem_cgroup_iter+0xfd/0x280
[<ffffffff811502da>] ? shrink_zone+0x7a/0x190
[<ffffffff8115051a>] ? do_try_to_free_pages+0x12a/0x640
[<ffffffff81138f2f>] ? zone_watermark_ok+0x1f/0x30
[<ffffffff81150c05>] ? try_to_free_pages+0x95/0x130
[<ffffffff81144a7d>] ? __alloc_pages_nodemask+0x4cd/0x960
[<ffffffff81101559>] ? delayacct_end+0x89/0xa0
[<ffffffff811801fa>] ? alloc_pages_current+0xaa/0x110
[<ffffffff81134ee7>] ? __page_cache_alloc+0x87/0x90
[<ffffffff811348ce>] ? find_get_page+0x1e/0xa0
[<ffffffff81135e9d>] ? filemap_fault+0x1ad/0x520
[<ffffffff8115d42a>] ? __do_fault+0x5a/0x540
[<ffffffff81160e2a>] ? handle_pte_fault+0x9a/0xc80
[<ffffffff810b1205>] ? __hrtimer_start_range_ns+0x1a5/0x470
[<ffffffff810b0861>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff810b154d>] ? hrtimer_try_to_cancel+0x3d/0xd0
[<ffffffff810b1602>] ? hrtimer_cancel+0x22/0x30
[<ffffffff81161d16>] ? handle_mm_fault+0x306/0x450
[<ffffffff81056031>] ? __do_page_fault+0x141/0x4d0
[<ffffffff81071b30>] ? default_wake_function+0x0/0x20
[<ffffffff810f23a3>] ? audit_filter_syscall+0x93/0xf0
[<ffffffff815622ce>] ? do_page_fault+0x3e/0xa0
[<ffffffff8155f265>] ? page_fault+0x25/0x30
At the time of the crash, the server as a whole is not short of memory: about 72% of total memory is free, although swap is 100% used:
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  330734455    1261.7 GB         ----
         FREE  241362350     920.7 GB   72% of TOTAL MEM
         USED   89372105     340.9 GB   27% of TOTAL MEM
       SHARED     292580       1.1 GB    0% of TOTAL MEM
      BUFFERS     189003     738.3 MB    0% of TOTAL MEM
       CACHED     179140     699.8 MB    0% of TOTAL MEM
         SLAB     142035     554.8 MB    0% of TOTAL MEM
   TOTAL HUGE          0            0         ----
    HUGE FREE          0            0    0% of TOTAL HUGE
   TOTAL SWAP    1310719         5 GB         ----
    SWAP USED    1310719         5 GB  100% of TOTAL SWAP
    SWAP FREE          0            0    0% of TOTAL SWAP
 COMMIT LIMIT  166677946     635.8 GB         ----
    COMMITTED  166274771     634.3 GB   99% of TOTAL LIMIT
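The figures in the kmem -i output can be cross-checked from the page counts alone. The following is a minimal sketch, assuming a 4 KiB page size (the x86_64 default, and the same factor used by the awk one-liner later in this article). The helper names pages_to_gib and percent_of are illustrative only and are not part of the crash utility. Recomputing the totals this way suggests that the values labelled "GB" are binary units (2^30 bytes).

```python
# Cross-check of the "kmem -i" figures, assuming 4 KiB pages.
PAGE_SIZE_BYTES = 4096  # assumption: x86_64 default page size


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB (2**30 bytes)."""
    return pages * PAGE_SIZE_BYTES / 2**30


def percent_of(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Page counts copied from the kmem -i output above.
total_mem = 330734455
free_mem = 241362350
swap_total = 1310719
swap_used = 1310719

print(f"TOTAL MEM: {pages_to_gib(total_mem):.1f} GiB")
print(f"FREE:      {pages_to_gib(free_mem):.1f} GiB "
      f"({int(percent_of(free_mem, total_mem))}% of total)")
print(f"SWAP USED: {int(percent_of(swap_used, swap_total))}% of total swap")
```

The free percentage is truncated rather than rounded here, which appears to match how the percentage column is displayed.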
Although the system as a whole has plenty of free memory, the "Normal" zone on node 0 is running short of free pages. Its free page count (5859) has fallen below the zone's low watermark (5921), as shown below:
crash> kmem -z | grep Normal -A3
NODE: 0 ZONE: 2 ADDR: ffff88000001f680 NAME: "Normal"
SIZE: 41156608 PRESENT: 40593920 MIN/LOW/HIGH: 4737/5921/7105
VM_STAT:
NR_FREE_PAGES: 5859 <<------------
--
NODE: 1 ZONE: 2 ADDR: ffff882840010dc0 NAME: "Normal"
SIZE: 83886080 PRESENT: 82739200 MIN/LOW/HIGH: 9655/12068/14482
VM_STAT:
NR_FREE_PAGES: 79918045
--
NODE: 2 ZONE: 2 ADDR: ffff887840010e00 NAME: "Normal"
SIZE: 83886080 PRESENT: 82739200 MIN/LOW/HIGH: 9655/12068/14482
VM_STAT:
NR_FREE_PAGES: 81425442
--
NODE: 3 ZONE: 2 ADDR: ffff88c840010e40 NAME: "Normal"
SIZE: 83886079 PRESENT: 82739199 MIN/LOW/HIGH: 9655/12068/14482
VM_STAT:
NR_FREE_PAGES: 81707104
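The comparison between NR_FREE_PAGES and the MIN/LOW/HIGH watermarks can be sketched as below. This is a deliberately simplified illustration: the kernel's own watermark check also takes the allocation order and the zone's lowmem reserve into account, which are ignored here. The function name watermark_state and the tuple layout are hypothetical.

```python
# Simplified watermark comparison for the "Normal" zone on each node.
# Assumption: only the raw free page count is compared with the watermarks;
# allocation order and lowmem reserve are not modelled.

def watermark_state(free: int, wmark_min: int, wmark_low: int, wmark_high: int) -> str:
    """Classify a zone's free page count against its watermarks."""
    if free < wmark_min:
        return "below min"
    if free < wmark_low:
        return "below low"
    if free < wmark_high:
        return "below high"
    return "above high"


# (node, free pages, min, low, high), copied from the kmem -z output above.
normal_zones = [
    (0, 5859, 4737, 5921, 7105),
    (1, 79918045, 9655, 12068, 14482),
    (2, 81425442, 9655, 12068, 14482),
    (3, 81707104, 9655, 12068, 14482),
]

for node, free, wmin, wlow, whigh in normal_zones:
    print(f"node {node}: {free} free pages -> {watermark_state(free, wmin, wlow, whigh)}")
```

With these values, only node 0 falls between the min and low watermarks. The other three nodes are far above their high watermarks.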
Notably, approximately 160 GB of memory is missing from the "Normal" zone on node 0. That zone has about 154.9 GiB present, whereas the same zone on each of the other nodes has about 315.6 GiB:
crash> kmem -z | grep Normal -A1 --no-group-separator
NODE: 0 ZONE: 2 ADDR: ffff88000001f680 NAME: "Normal"
SIZE: 41156608 PRESENT: 40593920 MIN/LOW/HIGH: 4737/5921/7105 <<------------
NODE: 1 ZONE: 2 ADDR: ffff882840010dc0 NAME: "Normal"
SIZE: 83886080 PRESENT: 82739200 MIN/LOW/HIGH: 9655/12068/14482
NODE: 2 ZONE: 2 ADDR: ffff887840010e00 NAME: "Normal"
SIZE: 83886080 PRESENT: 82739200 MIN/LOW/HIGH: 9655/12068/14482
NODE: 3 ZONE: 2 ADDR: ffff88c840010e40 NAME: "Normal"
SIZE: 83886079 PRESENT: 82739199 MIN/LOW/HIGH: 9655/12068/14482
crash> kmem -z | grep Normal -A1 --no-group-separator | grep -oE 'PRESENT: .{8}' | awk '{print $NF*4/1024.0/1024.0 " GiB"}'
154.854 GiB <<------------
315.625 GiB
315.625 GiB
315.625 GiB
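The size of the shortfall on node 0 can be derived from the PRESENT page counts. The following is a minimal sketch, assuming a 4 KiB page size and treating the largest "Normal" zone among the nodes as the expected size. That baseline is an assumption made for illustration, not something the crash utility reports.

```python
# Estimate how much memory is missing from the "Normal" zone on each node.
PAGE_SIZE_BYTES = 4096  # assumption: x86_64 default page size

# PRESENT page counts per node, copied from the kmem -z output above.
present_pages = {0: 40593920, 1: 82739200, 2: 82739200, 3: 82739199}


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB (2**30 bytes)."""
    return pages * PAGE_SIZE_BYTES / 2**30


def shortfall_gib(node: int) -> float:
    """GiB missing on a node relative to the largest Normal zone (assumed baseline)."""
    baseline = max(present_pages.values())
    return pages_to_gib(baseline - present_pages[node])


for node, pages in present_pages.items():
    print(f"node {node}: present {pages_to_gib(pages):.3f} GiB, "
          f"shortfall {shortfall_gib(node):.1f} GiB")
```

Under these assumptions the shortfall on node 0 comes to roughly 160.8 GiB, which is consistent with the approximately 160 GB figure quoted in this article.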
Environment
- Red Hat Enterprise Linux 6.10.z kernel-2.6.32-754.35.1.el6.x86_64
- HPE ProLiant DL580 G7