Elasticsearch pods in CrashLoopBackOff after patching host OS

Solution Verified

Environment

  • OpenShift Container Platform
    • 3.11

Issue

  • After an OS patch, logging-es pods went into a crash loop:
logging-es-data-master-asd1asd2asd3   1/2       CrashLoopBackOff   1159       4d
logging-es-data-master-asd1asd2asd3   1/2       CrashLoopBackOff   866        3d
logging-es-data-master-asd1asd2asd3   1/2       CrashLoopBackOff   885        3d
  • The start of the crash loop seems to line up with the nodes' reboot.
  • We properly excluded OpenShift components in the yum.conf, so we know these were not overwritten in the upgrade/patch:
exclude= atomic-openshift-tests  atomic-openshift-hyperkube  atomic-openshift-recycle  atomic-openshift-pod  atomic-openshift-node  atomic-openshift-master  atomic-openshift-clients-redistributable  atomic-openshift-clients  atomic-openshift  docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*
  • Attempting to run the logging dump tool may freeze or fail to complete.
  • The following Elasticsearch error is present:
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
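
One way to confirm that a pod is failing on this check is to search its logs for the setting name. The following is a minimal sketch, not a supported tool: the helper name find_map_count_error is hypothetical, and the container name (elasticsearch) and project name (openshift-logging) in the usage comment are assumptions that may differ in your cluster.

```shell
# Filter a log stream (stdin) for lines that mention vm.max_map_count,
# such as the Elasticsearch error shown above.
find_map_count_error() {
    grep -F 'vm.max_map_count'
}

# Example use (pod, container, and project names are assumptions):
#   oc logs logging-es-data-master-asd1asd2asd3 -c elasticsearch -n openshift-logging | find_map_count_error
```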

Resolution

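  • Set vm.max_map_count to the required value on each affected node. A value set this way takes effect immediately but does not persist across reboots:
# sysctl -w vm.max_map_count=262144
  • Alternatively, if the setting is still defined under /etc/sysctl.d, reload the sysctl configuration (for example, with sysctl --system) to restore the parameter to its expected value of 262144.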
  • Ensure that the 99-elasticsearch.conf file exists in /etc/sysctl.d:
[root@node.example.com sysctl.d]# ll | grep elasticsearch
-rw-r--r--. 1 root root 24 Jul 11 10:32 99-elasticsearch.conf
  • If the file doesn't exist, removing and reinstalling the EFK stack should recreate it, assuming vm.max_map_count is already set correctly. Reinstall with the following inventory variable set:
openshift_logging_install_logging=true
  • Expected result: the /etc/sysctl.d/99-elasticsearch.conf file is created, sysctl is reloaded, and vm.max_map_count is set to 262144.
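
The check and the persistent setting described above can be sketched as two small shell helpers. This is an illustration under stated assumptions, not the installer's own logic: the function names check_map_count and write_conf are hypothetical, and the file content (a single vm.max_map_count=262144 line) is an assumption, although it is consistent with the 24-byte file size shown in the listing above.

```shell
# Minimum value that Elasticsearch asks for in the error message.
REQUIRED=262144

# Print "ok" if the given value meets the minimum, otherwise "too low".
check_map_count() {
    if [ "$1" -ge "$REQUIRED" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# Write the persistent setting into a sysctl.d directory.
# The directory defaults to /etc/sysctl.d; the file content is an assumption.
write_conf() {
    dir="${1:-/etc/sysctl.d}"
    printf 'vm.max_map_count=%s\n' "$REQUIRED" > "$dir/99-elasticsearch.conf"
}

# Example use on a node, as root:
#   check_map_count "$(sysctl -n vm.max_map_count)"
#   write_conf && sysctl --system
```

Keeping the directory as a parameter makes it possible to try write_conf against a scratch directory before touching /etc/sysctl.d.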

Root Cause

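The 99-elasticsearch.conf file can be removed when changes are made to the logging deployment, such as a patch or reinstall, if vm.max_map_count is lower than the default value at that time. Without the file, the setting is not reapplied after a reboot, so the node falls back to the kernel default of 65530 and Elasticsearch fails to start. The problem has been reported in a bug and addressed upstream. Although the bug is marked as fixed, existing deployments may still be affected.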

