After upgrading to Red Hat Enterprise Linux 6.2, "page allocation failure. order:1, mode:0x20" messages appear on the console
Environment
- Red Hat Enterprise Linux 6 systems running kernel 2.6.32-220.el6 or later
Issue
After upgrading to Red Hat Enterprise Linux 6.2, messages like the following "page allocation failure. order:1, mode:0x20" entries appear on the console:
Mar 18 14:39:17 hostname kernel: glusterfsd: page allocation failure. order:1, mode:0x20
Mar 18 14:39:17 hostname kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 21 11:27:06 hostname kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 21 18:25:45 hostname kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 21 21:59:18 hostname kernel: swapper: page allocation failure. order:1, mode:0x20
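To see which processes are hitting the failure and how often, the log lines above can be summarized per process name. The following is a minimal sketch, not part of the original article: the function name `count_alloc_failures` is hypothetical, and it assumes the syslog layout shown above, where the process name follows the `kernel:` field.

```shell
#!/bin/sh
# Hypothetical helper: count "page allocation failure" messages per process.
# Reads syslog-style lines on stdin and prints "<count> <process>" per process.
count_alloc_failures() {
    awk '/page allocation failure/ {
             # The process name is the field right after "kernel:".
             for (i = 1; i < NF; i++)
                 if ($i == "kernel:") {
                     comm = $(i + 1)
                     sub(/:$/, "", comm)
                     count[comm]++
                     break
                 }
         }
         END { for (c in count) print count[c], c }' | sort -k2
}

# Usage: summarize the system log if it is readable (path is an assumption).
if [ -r /var/log/messages ]; then
    count_alloc_failures < /var/log/messages
fi
```

For the five sample lines above, this would report one failure for glusterfsd and four for swapper.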
- swapper is unable to allocate memory because of a page allocation failure:
kernel: swapper: page allocation failure. order:1, mode:0x20
kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.2.1.el6.x86_64 #1
kernel: Call Trace:
kernel: <IRQ> [<ffffffff8112c207>] ? __alloc_pages_nodemask+0x757/0x8d0
kernel: [<ffffffff81166ab2>] ? kmem_getpages+0x62/0x170
kernel: [<ffffffff811676ca>] ? fallback_alloc+0x1ba/0x270
kernel: [<ffffffff8116711f>] ? cache_grow+0x2cf/0x320
kernel: [<ffffffff81167449>] ? ____cache_alloc_node+0x99/0x160
kernel: [<ffffffff811683cb>] ? kmem_cache_alloc+0x11b/0x190
kernel: [<ffffffff81439d58>] ? sk_prot_alloc+0x48/0x1c0
kernel: [<ffffffff8143ae32>] ? sk_clone+0x22/0x2e0
kernel: [<ffffffff81489d66>] ? inet_csk_clone+0x16/0xd0
kernel: [<ffffffff814a2c73>] ? tcp_create_openreq_child+0x23/0x450
kernel: [<ffffffff814a046d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
kernel: [<ffffffff814a2a16>] ? tcp_check_req+0x226/0x460
kernel: [<ffffffff8149ff0b>] ? tcp_v4_do_rcv+0x35b/0x430
kernel: [<ffffffff81082034>] ? mod_timer+0x144/0x220
kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0
kernel: [<ffffffff814a171e>] ? tcp_v4_rcv+0x4fe/0x8d0
kernel: [<ffffffff8147f50d>] ? ip_local_deliver_finish+0xdd/0x2d0
kernel: [<ffffffff8147f798>] ? ip_local_deliver+0x98/0xa0
kernel: [<ffffffff8147ec5d>] ? ip_rcv_finish+0x12d/0x440
kernel: [<ffffffff8147f1e5>] ? ip_rcv+0x275/0x350
kernel: [<ffffffff814483bb>] ? __netif_receive_skb+0x4ab/0x750
kernel: [<ffffffff8144a798>] ? netif_receive_skb+0x58/0x60
kernel: [<ffffffffa008b975>] ? vmxnet3_rq_rx_complete+0x365/0x890 [vmxnet3]
kernel: [<ffffffff8128d2b0>] ? swiotlb_map_page+0x0/0x100
kernel: [<ffffffffa008c0f3>] ? vmxnet3_poll_rx_only+0x43/0xc0 [vmxnet3]
kernel: [<ffffffff8144cf63>] ? net_rx_action+0x103/0x2f0
kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0
kernel: [<ffffffff810e1720>] ? handle_IRQ_event+0x60/0x170
kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
kernel: [<ffffffff81076d95>] ? irq_exit+0x85/0x90
kernel: [<ffffffff81516f15>] ? do_IRQ+0x75/0xf0
kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
kernel: <EOI> [<ffffffff8103b90b>] ? native_safe_halt+0xb/0x10
kernel: [<ffffffff8101495d>] ? default_idle+0x4d/0xb0
kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
kernel: [<ffffffff81506d9c>] ? start_secondary+0x2ac/0x2ef
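The `order` field in the message is the base-2 logarithm of the number of physically contiguous pages requested, so `order:1` asks for two adjacent pages. The sketch below works through that arithmetic. The function name `alloc_size_bytes` is hypothetical, and the 4096-byte default page size is an assumption that holds for typical x86_64 systems (`getconf PAGESIZE` reports the actual value).

```shell
#!/bin/sh
# Hypothetical helper: size in bytes of a page allocation of a given order.
# $1: allocation order from the log message
# $2: page size in bytes (assumed 4096 if omitted)
alloc_size_bytes() {
    order=$1
    page_size=${2:-4096}
    # An order-N request needs 2^N contiguous pages.
    echo $(( (1 << $order) * $page_size ))
}

alloc_size_bytes 1    # prints 8192
```

So the failing request in the trace above needed only 8 KiB, but as a single contiguous block, which is why memory fragmentation matters here.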
Resolution
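Update the kernel to kernel-2.6.32-358.el6 or later. The Root Cause section below describes what these versions improve.
- Note that this update does not completely eliminate the possibility of page allocation failures.
- If the problem persists on kernel 2.6.32-358.el6 or later, apply the workaround below.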
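A quick way to tell whether a system already runs a kernel at or beyond this level is to compare the release number in `uname -r` with 358. This is a minimal sketch under stated assumptions: the function name `kernel_at_least_358` is hypothetical, and the check only understands RHEL 6 style version strings of the form `2.6.32-<release>...`; anything else is reported as unrecognized.

```shell
#!/bin/sh
# Hypothetical helper: is this a RHEL 6 kernel at 2.6.32-358.el6 or later?
# Returns 0 if yes, 1 if older, 2 if the version string is not recognized.
kernel_at_least_358() {
    version=$1
    case "$version" in
        2.6.32-*) ;;
        *) return 2 ;;
    esac
    release=${version#2.6.32-}    # e.g. 358.2.1.el6.x86_64
    build=${release%%.*}          # e.g. 358
    case "$build" in
        ''|*[!0-9]*) return 2 ;;  # release does not start with a number
    esac
    [ "$build" -ge 358 ]
}

if kernel_at_least_358 "$(uname -r)"; then
    echo "kernel is 2.6.32-358.el6 or later"
else
    echo "kernel is older than 2.6.32-358.el6 or not a RHEL 6 kernel"
fi
```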
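Workaround
To mitigate the problem or prevent it from occurring, the following tunables can be used:
- Increase vm.min_free_kbytes, for example to a value higher than the size of a single allocation request.
- If vm.zone_reclaim_mode is currently 0, set it to 1 so that the system can reclaim memory from the page cache.
Both settings can be made in /etc/sysctl.conf. After editing the file, run sysctl -p /etc/sysctl.conf to load the changes.
For more information on these tunables, install the kernel-doc package and refer to /usr/share/doc/kernel-doc-2.6.32/Documentation/sysctl/vm.txt.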
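The two settings above could be written out as an /etc/sysctl.conf fragment along the following lines. This is only a sketch: the function name `sysctl_fragment` is hypothetical, and the value 131072 is an illustrative placeholder, not a recommendation from this article. An appropriate vm.min_free_kbytes depends on the system's memory size and workload.

```shell
#!/bin/sh
# Hypothetical helper: print sysctl.conf lines for the two tunables.
# $1: desired vm.min_free_kbytes (placeholder value, choose per system)
# $2: current vm.zone_reclaim_mode, e.g. from: sysctl -n vm.zone_reclaim_mode
sysctl_fragment() {
    min_free_kbytes=$1
    zone_reclaim_mode=$2
    echo "vm.min_free_kbytes = $min_free_kbytes"
    # Only turn zone reclaim on if it is currently off (0).
    if [ "$zone_reclaim_mode" -eq 0 ]; then
        echo "vm.zone_reclaim_mode = 1"
    fi
}

# Review the output, add it to /etc/sysctl.conf by hand, then load it with:
#   sysctl -p /etc/sysctl.conf
sysctl_fragment 131072 0
```

The current values can be inspected beforehand with `sysctl vm.min_free_kbytes vm.zone_reclaim_mode`.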
Root Cause
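In releases prior to RHEL 6.4, kswapd does not free up contiguous pages. When nothing else in the system defragments memory, this can cause GFP_ATOMIC allocation requests to fail repeatedly. In RHEL 6.4 and later, kswapd compacts (defragments) free memory when required.
Even so, allocations can still fail. For example, when GFP_ATOMIC allocation requests arrive in a large burst, kswapd may have trouble keeping up with them. Eventually, however, the allocations should succeed.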
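Because the failures described above come down to a shortage of contiguous free pages, it can help to look at how fragmented free memory currently is. The kernel exposes this in /proc/buddyinfo, where each row lists, per node and zone, the number of free blocks of order 0, 1, 2, and so on. The following sketch (the function name `free_high_order_blocks` is hypothetical) adds up the free blocks of order 1 and above for each zone.

```shell
#!/bin/sh
# Hypothetical helper: per zone, count free blocks of order >= 1.
# Reads /proc/buddyinfo-format lines on stdin, for example:
#   Node 0, zone   Normal   100 3 2 0 0 0 0 0 0 0 1
free_high_order_blocks() {
    awk '$1 == "Node" && $3 == "zone" {
             node = $2
             sub(/,$/, "", node)
             total = 0
             # Field 5 is the order-0 count; orders 1 and up start at field 6.
             for (i = 6; i <= NF; i++) total += $i
             print "node " node " zone " $4 ": " total " free blocks of order >= 1"
         }'
}

if [ -r /proc/buddyinfo ]; then
    free_high_order_blocks < /proc/buddyinfo
fi
```

A zone that has plenty of order-0 pages but very few blocks of order 1 or higher is fragmented in the sense relevant to an `order:1` request.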
Page allocation failures can also occur, or cause other problems, in other situations. See the following articles for more information:
- Failed GFP_ATOMIC allocations by the network stack result in dropped packets, which will be received on a subsequent retransmit.
- Network problems with page allocation failures using the mlx4_en driver
- Stale TCP connections with tg3 on Red Hat Enterprise Linux 6
- "page allocation failure" messages occurring on machine for mount.nfs process
- cmahostd, sosreport and cat beeing seen with page allocation failure error messages in RHEL 6.4
