Elasticsearch pods in CrashLoopBackOff after patching host OS

Environment

OpenShift Container Platform
- 3.11

Issue

After an OS patch, logging-es pods went into a crash loop:

logging-es-data-master-asd1asd2asd3   1/2       CrashLoopBackOff   1159       4d
logging-es-data-master-asd1asd2asd3   1/2       CrashLoopBackOff   866        3d
logging-es-data-master-asd1asd2asd3   1/2       CrashLoopBackOff   885        3d
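To pick the affected pods out of a longer pod listing, a small filter like the one below may help. It is only a sketch: the function name `crashlooping_pods` is hypothetical, and the `openshift-logging` namespace and `component=es` label in the usage comment are assumptions about a typical 3.11 logging deployment that may not match yours.

```shell
#!/bin/sh
# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Expects `oc get pods` style rows on stdin: NAME READY STATUS RESTARTS AGE
crashlooping_pods() {
    awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against a live cluster (namespace and label are assumptions):
#   oc get pods -n openshift-logging -l component=es --no-headers | crashlooping_pods
```

The filter reads from standard input, so it can also be run against saved output from an earlier `oc get pods` call.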

The crash loop appears to have started when the nodes were rebooted.
The OpenShift components were excluded in yum.conf, so they were not overwritten by the upgrade/patch:

exclude= atomic-openshift-tests  atomic-openshift-hyperkube  atomic-openshift-recycle  atomic-openshift-pod  atomic-openshift-node  atomic-openshift-master  atomic-openshift-clients-redistributable  atomic-openshift-clients  atomic-openshift  docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*

Attempting to run the logging dump tool may freeze or fail to complete.
The following Elasticsearch error is present:

max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
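To confirm that a node is actually affected, the live kernel value can be compared with the minimum named in the error. The following is a minimal sketch; the helper name `map_count_too_low` is hypothetical, and it assumes the script runs directly on the node that hosts the Elasticsearch pod.

```shell
#!/bin/sh
# Minimum that Elasticsearch asks for in the error message.
REQUIRED_MAP_COUNT=262144

# Succeeds (exit status 0) when the given value is below the minimum.
map_count_too_low() {
    [ "$1" -lt "$REQUIRED_MAP_COUNT" ]
}

# Reading the value does not require root privileges.
if [ -r /proc/sys/vm/max_map_count ]; then
    current=$(cat /proc/sys/vm/max_map_count)
    if map_count_too_low "$current"; then
        echo "vm.max_map_count is $current, below $REQUIRED_MAP_COUNT"
    else
        echo "vm.max_map_count is $current, which is sufficient"
    fi
else
    echo "cannot read /proc/sys/vm/max_map_count on this host"
fi
```

Running `sysctl vm.max_map_count` on the node reports the same value.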

Resolution

First, set vm.max_map_count to 262144 on the affected node:

# sysctl -w vm.max_map_count=262144

Alternatively, reload the sysctl configuration so that the parameter returns to the default value set for Elasticsearch (262144).
For this to work, the 99-elasticsearch.conf file must exist in sysctl.d:

[root@node.example.com sysctl.d]# ll | grep elasticsearch
-rw-r--r--. 1 root root 24 Jul 11 10:32 99-elasticsearch.conf
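Checking that the file exists does not show what it sets. The sketch below prints the value configured in the file and notes how the settings could then be reloaded. The helper name `configured_map_count` is hypothetical, and the assumption that the file holds a single `vm.max_map_count=<number>` line is not confirmed by this article.

```shell
#!/bin/sh
# Print the value assigned to vm.max_map_count in a sysctl config file.
# Returns non-zero if the file does not exist; prints nothing if the
# file exists but has no matching line.
configured_map_count() {
    [ -f "$1" ] || return 1
    sed -n 's/^[[:space:]]*vm\.max_map_count[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}

conf=/etc/sysctl.d/99-elasticsearch.conf
if value=$(configured_map_count "$conf"); then
    echo "$conf sets vm.max_map_count to: ${value:-(no value found)}"
    # As root, re-apply all sysctl configuration files with:
    #   sysctl --system
else
    echo "$conf is missing"
fi
```

`sysctl --system` reloads every configuration file, not only this one; `sysctl -p /etc/sysctl.d/99-elasticsearch.conf` limits the reload to the single file.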

If the file does not exist, removing and reinstalling the EFK stack should recreate it, provided that vm.max_map_count has already been set to the correct value.

With openshift_logging_install_logging=true set in the inventory, the installation creates the /etc/sysctl.d/99-elasticsearch.conf file, reloads sysctl, and sets the value to 262144.

Root Cause

The 99-elasticsearch.conf file is removed when the logging deployment is changed, for example by a patch or reinstall, while vm.max_map_count is lower than the default value. Without the file, the setting does not persist across a reboot, so the node comes back up with the value of 65530 reported in the error. The problem has been reported in a bug report and addressed upstream; although it is marked as fixed, it may still affect existing deployments.

