The kernel crashed when a soft lockup occurred. At that time, about 160GB of memory appeared to be missing from one of the memory zones

Solution Verified - Updated -

Issue

  • The kernel crashed when a soft lockup occurred. At that time, about 160GB of memory appeared to be missing from one of the memory zones. The panic message and call trace were as follows:
Kernel panic - not syncing: softlockup: hung tasks
Pid: 54702, comm: java Tainted: G           --L------------    2.6.32-754.35.1.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8155a2af>] ? panic+0xa7/0x18b
 [<ffffffff810f8b53>] ? watchdog_timer_fn+0x223/0x230
 [<ffffffff810ff44a>] ? __rcu_process_callbacks+0x25a/0x350
 [<ffffffff810f8930>] ? watchdog_timer_fn+0x0/0x230
 [<ffffffff810b0a90>] ? __run_hrtimer+0x90/0x1e0
 [<ffffffff810b0e3e>] ? hrtimer_interrupt+0xee/0x270
 [<ffffffff81039f33>] ? local_apic_timer_interrupt+0x43/0x70
 [<ffffffff8103f964>] ? native_apic_msr_eoi_write+0x14/0x20
 [<ffffffff81568859>] ? smp_apic_timer_interrupt+0x49/0x60
 [<ffffffff81567193>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8155dfee>] ? _spin_lock+0x1e/0x30
 [<ffffffff8116d1a2>] ? page_referenced+0xa2/0x360
 [<ffffffff8114f2e1>] ? shrink_active_list+0x1e1/0x370
 [<ffffffff811500e5>] ? shrink_mem_cgroup_zone+0x3d5/0x550
 [<ffffffff81193eed>] ? mem_cgroup_iter+0xfd/0x280
 [<ffffffff811502da>] ? shrink_zone+0x7a/0x190
 [<ffffffff8115051a>] ? do_try_to_free_pages+0x12a/0x640
 [<ffffffff81138f2f>] ? zone_watermark_ok+0x1f/0x30
 [<ffffffff81150c05>] ? try_to_free_pages+0x95/0x130
 [<ffffffff81144a7d>] ? __alloc_pages_nodemask+0x4cd/0x960
 [<ffffffff81101559>] ? delayacct_end+0x89/0xa0
 [<ffffffff811801fa>] ? alloc_pages_current+0xaa/0x110
 [<ffffffff81134ee7>] ? __page_cache_alloc+0x87/0x90
 [<ffffffff811348ce>] ? find_get_page+0x1e/0xa0
 [<ffffffff81135e9d>] ? filemap_fault+0x1ad/0x520
 [<ffffffff8115d42a>] ? __do_fault+0x5a/0x540
 [<ffffffff81160e2a>] ? handle_pte_fault+0x9a/0xc80
 [<ffffffff810b1205>] ? __hrtimer_start_range_ns+0x1a5/0x470
 [<ffffffff810b0861>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff810b154d>] ? hrtimer_try_to_cancel+0x3d/0xd0
 [<ffffffff810b1602>] ? hrtimer_cancel+0x22/0x30
 [<ffffffff81161d16>] ? handle_mm_fault+0x306/0x450
 [<ffffffff81056031>] ? __do_page_fault+0x141/0x4d0
 [<ffffffff81071b30>] ? default_wake_function+0x0/0x20
 [<ffffffff810f23a3>] ? audit_filter_syscall+0x93/0xf0
 [<ffffffff815622ce>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8155f265>] ? page_fault+0x25/0x30
At this time, the server was not running out of memory overall. It had plenty of free memory, even though swap was 100% used:

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  330734455    1261.7 GB         ----
         FREE  241362350     920.7 GB   72% of TOTAL MEM
         USED  89372105     340.9 GB   27% of TOTAL MEM
       SHARED   292580       1.1 GB    0% of TOTAL MEM
      BUFFERS   189003     738.3 MB    0% of TOTAL MEM
       CACHED   179140     699.8 MB    0% of TOTAL MEM
         SLAB   142035     554.8 MB    0% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  1310719         5 GB         ----
    SWAP USED  1310719         5 GB  100% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT  166677946     635.8 GB         ----
    COMMITTED  166274771     634.3 GB   99% of TOTAL LIMIT
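The GB and percentage columns in the `kmem -i` output are derived from the page counts. The Python sketch below reproduces that arithmetic, assuming 4 KiB pages and that "GB" in the crash output means 2^30 bytes. The `COMMIT LIMIT` line additionally assumes the default `vm.overcommit_ratio` of 50 and no hugepages; with those assumptions the formula reproduces the value shown. The helper names are illustrative only.

```python
# Page counts copied from the "crash> kmem -i" output above.
# Assumptions: 4 KiB pages, and "GB" in the crash output means 2**30 bytes.
PAGE_SIZE = 4096
PAGES_PER_GIB = 2**30 // PAGE_SIZE  # 262144 pages per GiB

TOTAL_MEM = 330734455
FREE = 241362350
USED = 89372105
TOTAL_SWAP = 1310719
SWAP_USED = 1310719
COMMITTED = 166274771

# Assumption: default vm.overcommit_ratio (50) and no hugepages (TOTAL HUGE is 0).
OVERCOMMIT_RATIO = 50


def gib(pages: int) -> float:
    """Convert a page count to GiB, rounded to one decimal place."""
    return round(pages / PAGES_PER_GIB, 1)


def pct(part: int, whole: int) -> int:
    """Whole-number percentage, rounded down."""
    return part * 100 // whole


# CommitLimit = RAM pages * overcommit_ratio / 100 + swap pages
commit_limit = TOTAL_MEM * OVERCOMMIT_RATIO // 100 + TOTAL_SWAP

if __name__ == "__main__":
    print(f"TOTAL MEM    {gib(TOTAL_MEM)} GiB")
    print(f"FREE         {gib(FREE)} GiB ({pct(FREE, TOTAL_MEM)}% of total)")
    print(f"USED         {gib(USED)} GiB ({pct(USED, TOTAL_MEM)}% of total)")
    print(f"SWAP USED    {pct(SWAP_USED, TOTAL_SWAP)}% of swap")
    print(f"COMMIT LIMIT {commit_limit} pages ({gib(commit_limit)} GiB)")
    print(f"COMMITTED    {pct(COMMITTED, commit_limit)}% of limit")
```

Running it prints 920.7 GiB free (72% of total) next to 100% swap usage, which is the same picture crash reports: ample free RAM system-wide while swap is exhausted.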

Although the system as a whole appears to have plenty of free memory, the "Normal" zone on node 0 was running short of free pages at this time. Its free page count (5859) had fallen below the zone's low watermark (5921), as shown below:

crash> kmem -z | grep Normal -A3
NODE: 0  ZONE: 2  ADDR: ffff88000001f680  NAME: "Normal"
  SIZE: 41156608  PRESENT: 40593920  MIN/LOW/HIGH: 4737/5921/7105
  VM_STAT:
                NR_FREE_PAGES: 5859 <<------------
--
NODE: 1  ZONE: 2  ADDR: ffff882840010dc0  NAME: "Normal"
  SIZE: 83886080  PRESENT: 82739200  MIN/LOW/HIGH: 9655/12068/14482
  VM_STAT:
                NR_FREE_PAGES: 79918045
--
NODE: 2  ZONE: 2  ADDR: ffff887840010e00  NAME: "Normal"
  SIZE: 83886080  PRESENT: 82739200  MIN/LOW/HIGH: 9655/12068/14482
  VM_STAT:
                NR_FREE_PAGES: 81425442
--
NODE: 3  ZONE: 2  ADDR: ffff88c840010e40  NAME: "Normal"
  SIZE: 83886079  PRESENT: 82739199  MIN/LOW/HIGH: 9655/12068/14482
  VM_STAT:
                NR_FREE_PAGES: 81707104
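The comparison above can be expressed as a small check. The sketch below classifies each node's Normal zone by comparing NR_FREE_PAGES with its MIN/LOW/HIGH watermarks, using the values from the `kmem -z` output. It is a simplification: the kernel's own test, `zone_watermark_ok()` (which appears in the call trace), also takes the allocation order and the zone's `lowmem_reserve` into account. The `ZoneStat` type and `watermark_state()` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class ZoneStat(NamedTuple):
    """Watermarks and free page count for one zone, in pages (hypothetical type)."""
    node: int
    min_wm: int
    low_wm: int
    high_wm: int
    free: int


# Values copied from the "crash> kmem -z" output for the Normal zones.
NORMAL_ZONES = [
    ZoneStat(node=0, min_wm=4737, low_wm=5921, high_wm=7105, free=5859),
    ZoneStat(node=1, min_wm=9655, low_wm=12068, high_wm=14482, free=79918045),
    ZoneStat(node=2, min_wm=9655, low_wm=12068, high_wm=14482, free=81425442),
    ZoneStat(node=3, min_wm=9655, low_wm=12068, high_wm=14482, free=81707104),
]


def watermark_state(z: ZoneStat) -> str:
    """Report the lowest watermark that the zone's free page count is below."""
    if z.free < z.min_wm:
        return "below MIN"
    if z.free < z.low_wm:
        return "below LOW"
    if z.free < z.high_wm:
        return "below HIGH"
    return "at or above HIGH"


if __name__ == "__main__":
    for z in NORMAL_ZONES:
        print(f"node {z.node}: free={z.free} -> {watermark_state(z)}")
```

Only node 0 is reported as below LOW (5859 < 5921), even though nodes 1, 2, and 3 each have roughly 80 million free pages in the same zone type.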

It is also unusual that the Normal zone on node 0 is approximately 160GB smaller than the Normal zone on each of the other nodes: it has 40593920 pages present, versus about 82739200 on nodes 1, 2, and 3:

crash> kmem -z | grep Normal -A1 --no-group-separator
NODE: 0  ZONE: 2  ADDR: ffff88000001f680  NAME: "Normal"
  SIZE: 41156608  PRESENT: 40593920  MIN/LOW/HIGH: 4737/5921/7105 <<------------
NODE: 1  ZONE: 2  ADDR: ffff882840010dc0  NAME: "Normal"
  SIZE: 83886080  PRESENT: 82739200  MIN/LOW/HIGH: 9655/12068/14482
NODE: 2  ZONE: 2  ADDR: ffff887840010e00  NAME: "Normal"
  SIZE: 83886080  PRESENT: 82739200  MIN/LOW/HIGH: 9655/12068/14482
NODE: 3  ZONE: 2  ADDR: ffff88c840010e40  NAME: "Normal"
  SIZE: 83886079  PRESENT: 82739199  MIN/LOW/HIGH: 9655/12068/14482

crash> kmem -z | grep Normal -A1 --no-group-separator | grep -oE 'PRESENT: .{8}' | awk '{print $NF*4/1024.0/1024.0 " GiB"}'
154.854 GiB <<------------
315.625 GiB
315.625 GiB
315.625 GiB
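The same conversion can be done outside crash. This Python sketch mirrors the `awk` step (pages × 4 KiB ÷ 1024 ÷ 1024, printed with awk's default six significant digits) and also computes how much smaller node 0's Normal zone is than node 1's. It assumes 4 KiB pages; the variable and function names are illustrative.

```python
# PRESENT page counts copied from the "crash> kmem -z" output above.
PRESENT_PAGES = {0: 40593920, 1: 82739200, 2: 82739200, 3: 82739199}

PAGE_KIB = 4  # assumption: 4 KiB pages (x86_64)


def pages_to_gib(pages: int) -> float:
    """pages * 4 KiB / 1024 / 1024, as in the awk one-liner."""
    return pages * PAGE_KIB / 1024.0 / 1024.0


sizes = {node: pages_to_gib(p) for node, p in PRESENT_PAGES.items()}
shortfall = sizes[1] - sizes[0]  # GiB by which node 0's Normal zone is smaller than node 1's

if __name__ == "__main__":
    for node, size in sizes.items():
        # %.6g matches awk's default numeric output format (OFMT)
        print(f"node {node}: {size:.6g} GiB")
    print(f"node 0 shortfall: {shortfall:.1f} GiB")
```

The output matches the crash pipeline (154.854 GiB for node 0, 315.625 GiB for the others), and the shortfall works out to about 160.8 GiB, which is the "160GB" referred to above.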

Environment

  • Red Hat Enterprise Linux 6.10.z kernel-2.6.32-754.35.1.el6.x86_64
  • HPE ProLiant DL580 G7
