A node is fenced after the logs show oom-killer killing a process in a RHEL 5, 6, or 7 High Availability cluster

Solution In Progress - Updated 2024-08-05T05:33:15+00:00

Issue

A cluster node was fenced, and I can see in the vmcore-dmesg.txt file in /var/crash that the oom-killer was invoked right before the node was fenced. However, it didn't kill any cluster-related processes, so why did the node stop responding and get fenced?

<4>sh invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
<6>sh cpuset=/ mems_allowed=0-1
<4>Pid: 21255, comm: sh Not tainted 2.6.32-358.6.2.el6.x86_64 #1
<4>Call Trace:
<4> [<ffffffff810cb5f1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
<4> [<ffffffff8111cdf0>] ? dump_header+0x90/0x1b0
<4> [<ffffffff8111cf5e>] ? check_panic_on_oom+0x4e/0x80
<4> [<ffffffff8111d64b>] ? out_of_memory+0x1bb/0x3c0
<4> [<ffffffff8112b8d0>] ? drain_local_pages+0x0/0x20
<4> [<ffffffff8112c35c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
<4> [<ffffffff8116095a>] ? alloc_pages_current+0xaa/0x110
<4> [<ffffffff81129d3e>] ? __get_free_pages+0xe/0x50
<4> [<ffffffff8106bef4>] ? copy_process+0xe4/0x1450
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8106d2f4>] ? do_fork+0x94/0x460
<4> [<ffffffff81009598>] ? sys_clone+0x28/0x30
<4> [<ffffffff8100b393>] ? stub_clone+0x13/0x20
<4> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

A node is fenced after a "panic_on_oom" message is seen

<0>Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
<0>
<4>Pid: 21255, comm: sh Not tainted 2.6.32-358.6.2.el6.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8150d478>] ? panic+0xa7/0x16f
<4> [<ffffffff8111cef1>] ? dump_header+0x191/0x1b0
<4> [<ffffffff8111cf8c>] ? check_panic_on_oom+0x7c/0x80
<4> [<ffffffff8111d64b>] ? out_of_memory+0x1bb/0x3c0
<4> [<ffffffff8112b8d0>] ? drain_local_pages+0x0/0x20
<4> [<ffffffff8112c35c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
<4> [<ffffffff8116095a>] ? alloc_pages_current+0xaa/0x110
<4> [<ffffffff81129d3e>] ? __get_free_pages+0xe/0x50
<4> [<ffffffff8106bef4>] ? copy_process+0xe4/0x1450
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8106d2f4>] ? do_fork+0x94/0x460
<4> [<ffffffff81009598>] ? sys_clone+0x28/0x30
<4> [<ffffffff8100b393>] ? stub_clone+0x13/0x20
<4> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

A node was powered off by the cluster via fencing, and the sar data shows memory consumption climbing in the lead-up to the event until the oom-killer was eventually invoked
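
The two log signatures shown in this section can be searched for mechanically. The following is a minimal sketch, not part of the original article: the function name summarize_oom_events and the patterns are illustrative assumptions based only on the message formats quoted above, and other kernel versions may word these messages differently.

```python
import re

# Assumption: lines look like the vmcore-dmesg.txt excerpts above,
# i.e. an optional "<N>" priority marker followed by the message.
PRIORITY_PREFIX = re.compile(r"^<\d+>")
OOM_INVOKED = re.compile(r"^(?P<comm>.+?) invoked oom-killer")
OOM_PANIC = "Kernel panic - not syncing: Out of memory"


def summarize_oom_events(dmesg_text):
    """Return (commands that invoked the oom-killer, whether an OOM panic was logged)."""
    invokers = []
    panicked = False
    for raw_line in dmesg_text.splitlines():
        # Drop the leading priority marker such as "<4>" before matching.
        line = PRIORITY_PREFIX.sub("", raw_line.strip())
        match = OOM_INVOKED.match(line)
        if match:
            invokers.append(match.group("comm"))
        elif OOM_PANIC in line:
            panicked = True
    return invokers, panicked


# Sample lines copied from the excerpts in this article.
sample = (
    "<4>sh invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0\n"
    "<0>Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled\n"
)
invokers, panicked = summarize_oom_events(sample)
print("oom-killer invoked by:", invokers)
print("OOM panic logged:", panicked)
```

If an oom-killer invocation is followed by the panic line, the kernel panicked instead of killing a single process, which is consistent with the call traces above passing through check_panic_on_oom and panic.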

Environment

Red Hat Enterprise Linux (RHEL) 5, 6, or 7 with the High Availability Add-On
sysctl parameter vm.panic_on_oom is set to 1
- See Diagnostic Steps below for steps to check this
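
One way to check the current setting on a running node is to read the procfs file behind the vm.panic_on_oom sysctl. This is a minimal sketch under that assumption; the helper name panic_on_oom_enabled is hypothetical and is not taken from the article.

```python
from pathlib import Path

# Standard procfs location of the vm.panic_on_oom sysctl on Linux.
PANIC_ON_OOM_PATH = Path("/proc/sys/vm/panic_on_oom")


def panic_on_oom_enabled(raw_value):
    """Return True when the sysctl value is non-zero (panic on OOM is enabled)."""
    return int(raw_value.strip()) != 0


# Only read the file when it exists, so the sketch also runs on non-Linux hosts.
if PANIC_ON_OOM_PATH.exists():
    raw = PANIC_ON_OOM_PATH.read_text()
    print("vm.panic_on_oom =", raw.strip())
    print("panic on OOM enabled:", panic_on_oom_enabled(raw))
```

A value of 1, as described in this Environment section, means the kernel is configured to panic on an out-of-memory condition rather than only killing a process.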
