Several "page allocation failure. order:1, mode:0x20" messages are seen on the console after upgrading to Red Hat Enterprise Linux 6.2


Environment

  • Red Hat Enterprise Linux 6 kernels after 2.6.32-220.el6

Issue

The following warning messages are seen on the console after upgrading to Red Hat Enterprise Linux 6.2 (Update 2):

Mar 18 14:39:17 hostname kernel: glusterfsd: page allocation failure. order:1, mode:0x20
Mar 18 14:39:17 hostname kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 21 11:27:06 hostname kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 21 18:25:45 hostname kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 21 21:59:18 hostname kernel: swapper: page allocation failure. order:1, mode:0x20

  • The full warning shows the swapper task (PID 0, the idle task) failing the allocation in interrupt context while processing received network traffic:

    kernel: swapper: page allocation failure. order:1, mode:0x20
    kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.2.1.el6.x86_64 #1
    kernel: Call Trace:
    kernel: <IRQ>  [<ffffffff8112c207>] ? __alloc_pages_nodemask+0x757/0x8d0
    kernel: [<ffffffff81166ab2>] ? kmem_getpages+0x62/0x170
    kernel: [<ffffffff811676ca>] ? fallback_alloc+0x1ba/0x270
    kernel: [<ffffffff8116711f>] ? cache_grow+0x2cf/0x320
    kernel: [<ffffffff81167449>] ? ____cache_alloc_node+0x99/0x160
    kernel: [<ffffffff811683cb>] ? kmem_cache_alloc+0x11b/0x190
    kernel: [<ffffffff81439d58>] ? sk_prot_alloc+0x48/0x1c0
    kernel: [<ffffffff8143ae32>] ? sk_clone+0x22/0x2e0
    kernel: [<ffffffff81489d66>] ? inet_csk_clone+0x16/0xd0
    kernel: [<ffffffff814a2c73>] ? tcp_create_openreq_child+0x23/0x450
    kernel: [<ffffffff814a046d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
    kernel: [<ffffffff814a2a16>] ? tcp_check_req+0x226/0x460
    kernel: [<ffffffff8149ff0b>] ? tcp_v4_do_rcv+0x35b/0x430
    kernel: [<ffffffff81082034>] ? mod_timer+0x144/0x220
    kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0
    kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0
    kernel: [<ffffffff8147f50d>] ? ip_local_deliver_finish+0xdd/0x2d0
    kernel: [<ffffffff8147f798>] ? ip_local_deliver+0x98/0xa0
    kernel: [<ffffffff8147ec5d>] ? ip_rcv_finish+0x12d/0x440
    kernel: [<ffffffff8147f1e5>] ? ip_rcv+0x275/0x350
    kernel: [<ffffffff814483bb>] ? __netif_receive_skb+0x4ab/0x750
    kernel: [<ffffffff8144a798>] ? netif_receive_skb+0x58/0x60
    kernel: [<ffffffffa008b975>] ? vmxnet3_rq_rx_complete+0x365/0x890 [vmxnet3]
    kernel: [<ffffffff8128d2b0>] ? swiotlb_map_page+0x0/0x100
    kernel: [<ffffffffa008c0f3>] ? vmxnet3_poll_rx_only+0x43/0xc0 [vmxnet3]
    kernel: [<ffffffff8144cf63>] ? net_rx_action+0x103/0x2f0
    kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0
    kernel: [<ffffffff810e1720>] ? handle_IRQ_event+0x60/0x170
    kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
    kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
    kernel: [<ffffffff81076d95>] ? irq_exit+0x85/0x90
    kernel: [<ffffffff81516f15>] ? do_IRQ+0x75/0xf0
    kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
    kernel: <EOI>  [<ffffffff8103b90b>] ? native_safe_halt+0xb/0x10
    kernel: [<ffffffff8101495d>] ? default_idle+0x4d/0xb0
    kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
    kernel: [<ffffffff81506d9c>] ? start_secondary+0x2ac/0x2ef
    
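To see how often the warning occurs and which tasks hit it, the messages can be tallied from the system log. The following is a minimal Python sketch, assuming the one-line message format shown above and the default RHEL 6 log file /var/log/messages; the function name tally_failures is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   "Mar 18 14:39:17 hostname kernel: swapper: page allocation failure. order:1, mode:0x20"
# Assumption: the message format shown in this article; other kernels may
# word the warning differently.
FAILURE_RE = re.compile(
    r"kernel: (?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def tally_failures(lines):
    """Count page allocation failures per (command, order, mode) tuple."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            key = (match["comm"], int(match["order"]), match["mode"].lower())
            counts[key] += 1
    return counts
```

For example, passing an open /var/log/messages file object to tally_failures and printing the result of most_common() lists the tasks that hit the warning most often. Lines from the call trace, such as "Pid: 0, comm: swapper", do not match the pattern and are ignored.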

Resolution

Fix

Update to kernel-2.6.32-358.el6 (the Red Hat Enterprise Linux 6.4 kernel) or higher, which contains the enhancement described in the Root Cause section below.

  • Note that this kernel (and newer ones) does not completely eliminate page allocation failures.
  • If the issue persists after the update, the workaround below also applies to kernel-2.6.32-358.el6 and newer.

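
Whether a running kernel already includes the change can be checked by comparing its release string against kernel-2.6.32-358.el6. The following is a minimal Python sketch, assuming the RHEL 6 release format X.Y.Z-N[.N...].elN[.arch] as printed by uname -r; the helper names are hypothetical, and this is not a replacement for RPM's own version comparison.

```python
import platform
import re

# Minimum kernel named in the Fix section.
FIXED_RELEASE = "2.6.32-358.el6"


def parse_rhel_kernel(release):
    """Split 'X.Y.Z-A.B.C.elN[.arch]' into two comparable integer tuples.

    Assumption: RHEL 6 naming scheme only; not a general RPM version compare.
    """
    match = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el\d", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    build = tuple(int(part) for part in match.group(2).split("."))
    return version, build


def has_kswapd_compaction_fix(release=None):
    """Return True if the release is kernel-2.6.32-358.el6 or newer.

    Defaults to the running kernel; raises ValueError on non-RHEL names.
    """
    release = release or platform.release()
    return parse_rhel_kernel(release) >= parse_rhel_kernel(FIXED_RELEASE)
```

Tuple comparison does the work here: (358, 2, 1) sorts after (358,), so errata kernels such as 2.6.32-358.2.1.el6 count as containing the fix, while 2.6.32-220.el6 does not.
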
Workaround

The following tunables can be used in an attempt to alleviate or prevent the reported condition:

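  • Increase vm.min_free_kbytes. This raises the free-memory watermarks the kernel maintains, so kswapd starts reclaiming earlier and more free memory is left for atomic allocations, which cannot wait for reclaim.
  • If vm.zone_reclaim_mode is set to 0, change it to 1 so that the kernel reclaims memory (for example, page cache) within a zone when that zone runs low, instead of falling back to other NUMA nodes.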

Both settings can be set in /etc/sysctl.conf, and loaded using sysctl -p /etc/sysctl.conf.
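
The two rules above can be turned into sysctl.conf lines mechanically. The following is a minimal Python sketch, assuming the tunables are exposed under /proc/sys/vm; the helper names are hypothetical, and the new min_free_kbytes value is left to the administrator because this article does not give a recommended figure.

```python
from pathlib import Path


def read_vm_tunable(name, proc_vm="/proc/sys/vm"):
    """Return the current integer value of a vm.* tunable, e.g. min_free_kbytes."""
    return int(Path(proc_vm, name).read_text().split()[0])


def sysctl_conf_lines(new_min_free_kbytes, current_min_free_kbytes,
                      current_zone_reclaim_mode):
    """Build /etc/sysctl.conf lines following the workaround.

    Assumption: new_min_free_kbytes is chosen by the administrator; this
    sketch only checks that it is an increase over the current value.
    """
    if new_min_free_kbytes <= current_min_free_kbytes:
        raise ValueError(
            f"{new_min_free_kbytes} does not increase min_free_kbytes "
            f"(currently {current_min_free_kbytes})"
        )
    lines = [f"vm.min_free_kbytes = {new_min_free_kbytes}"]
    # Only switch zone reclaim on if it is currently off (0).
    if current_zone_reclaim_mode == 0:
        lines.append("vm.zone_reclaim_mode = 1")
    return lines
```

The current values can be supplied with read_vm_tunable("min_free_kbytes") and read_vm_tunable("zone_reclaim_mode"). Append the returned lines to /etc/sysctl.conf and load them with sysctl -p /etc/sysctl.conf; writing to the files under /proc/sys/vm directly (as root) changes the values only until the next reboot.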

For more information on these tunables, install the kernel-doc package and refer to /usr/share/doc/kernel-doc-2.6.32/Documentation/sysctl/vm.txt.

Root Cause

Before RHEL 6.4, kswapd does not try to create contiguous blocks of free pages. GFP_ATOMIC allocation requests cannot sleep to reclaim or compact memory themselves, so they can fail repeatedly when nothing else in the system defragments memory. In RHEL 6.4 and newer, kswapd compacts (defragments) free memory when required.

Allocation failures can still happen, for example when a large burst of GFP_ATOMIC allocations arrives faster than kswapd can keep up with. However, these allocations should eventually succeed.
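
The order and mode fields in the warning encode the request: an order-N allocation asks for 2^N physically contiguous pages, and mode is the GFP flag mask. The following is a minimal Python sketch that decodes them, assuming 4 KiB pages (x86_64) and the low GFP bit values from the 2.6.32 include/linux/gfp.h; the function name decode_failure is hypothetical. For order:1, mode:0x20 it reports two contiguous pages (8 KiB) requested with only __GFP_HIGH set, which is GFP_ATOMIC in that kernel, so the caller cannot sleep to reclaim or compact memory itself.

```python
PAGE_SIZE = 4096  # assumption: x86_64 base page size

# Assumption: low GFP bits as defined in 2.6.32-era include/linux/gfp.h.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}
GFP_WAIT = 0x10


def decode_failure(order, mode):
    """Explain the 'order:N, mode:0xNN' fields of an allocation failure."""
    contiguous_pages = 1 << order
    return {
        "contiguous_pages": contiguous_pages,
        "bytes": contiguous_pages * PAGE_SIZE,
        "flags": [name for bit, name in GFP_FLAGS.items() if mode & bit],
        # Bits above 0x80 are not decoded by this sketch.
        "unknown_bits": mode & ~sum(GFP_FLAGS),
        # Without __GFP_WAIT the caller cannot sleep, so it cannot do direct
        # reclaim or compaction and depends on kswapd to free memory.
        "can_sleep": bool(mode & GFP_WAIT),
    }
```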

There are also other, more specific cases that can result in page allocation failures and cause additional issues. Refer to the following articles for more information:

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
