OutOfMemory exception in ElasticSearch v1.5.2 in OpenShift Enterprise 3.2


Issue

OpenShift Enterprise (OSE) version 3.2 ships with ElasticSearch (ES) version 1.5.2 as part of the ElasticSearch, Fluentd, and Kibana (EFK) logging stack.

The issue starts when one of the ES data nodes goes down even though its pod is still reported as running.
Analysis of the log shows that the ES service in that pod halted because of an OutOfMemory exception.

Below is the output of "oc get pods":

# oc get pods --selector="component=es"
NAME                          READY     STATUS    RESTARTS   AGE
logging-es-70rd2lmh-2-nt73u   1/1       Running   0          5h
logging-es-ff8l7m7y-1-3xy73   1/1       Running   0          5h
logging-es-spa31woe-1-bbn8q   1/1       Running   0          5h
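
The ES API queries below are run from a shell inside one of the ES pods. The article uses ${curl_get} and $ES_URL without defining them; a minimal sketch of what they might look like, assuming the admin client certificates are mounted at /etc/elasticsearch/secret (the mount path and file names are assumptions and may differ per release):

# Hypothetical definitions for the variables used in the queries below;
# adjust the certificate paths to match the secret mount in your ES pod.
ES_URL="https://localhost:9200"
curl_get="curl -s --cacert /etc/elasticsearch/secret/admin-ca \
  --cert /etc/elasticsearch/secret/admin-cert \
  --key /etc/elasticsearch/secret/admin-key"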

And this is the output of the ES /_cat/nodes API:

sh-4.2$ ${curl_get} $ES_URL/_cat/nodes?v
host                        ip       heap.percent ram.percent load node.role master name           
logging-es-70rd2lmh-2-nt73u 10.1.X.X           42                  d         m      Cordelia Frost 
logging-es-spa31woe-1-bbn8q 10.1.X.X           38                  d         *      Mr. Wu

The /_cat/nodes API above shows only 2 nodes!
It should show 3 nodes, as reported by "oc get pods".
Digging into the logging-es-ff8l7m7y-1-3xy73 log turned up an OutOfMemory error that crippled the ES service on that node, which is why /_cat/nodes reports only 2 nodes currently joined to the cluster.
Here is the OutOfMemory error found in the log:

[2017-06-19 02:14:41,761][ERROR][cluster.action.shard     ] [Havok] unexpected failure during [shard-started ([.searchguard.logging-es-ff8l7m7y-1-o44b7][4], node[XXXX], [P], s[INITIALIZING]), reason [after recovery from gateway]]
java.lang.OutOfMemoryError: Java heap space
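
Errors like this can also be located without opening a shell in the pod; a sketch using standard oc commands (the memory-related variable names in the second command are assumptions, not confirmed for this release):

# Search the pod's stdout log for heap errors.
oc logs logging-es-ff8l7m7y-1-3xy73 | grep -i OutOfMemoryError

# Check which memory/heap-related environment variables the pod was
# started with (names such as INSTANCE_RAM are assumptions; inspect the
# deployment config to see what your release actually uses).
oc exec logging-es-ff8l7m7y-1-3xy73 -- env | grep -iE 'ram|heap|java_opts'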

That crash caused the ES cluster health to become red.
Here is the output of the /_cluster/health API:

sh-4.2$ ${curl_get} $ES_URL/_cluster/health?pretty=true
{
   "cluster_name" : "logging-es",
   "status" : "red",
   "timed_out" : false,
   "number_of_nodes" : 2,
   "number_of_data_nodes" : 2,
   "active_primary_shards" : 5652,
   "active_shards" : 6865,
   "relocating_shards" : 0,
   "initializing_shards" : 4,
   "unassigned_shards" : 4561,
   "number_of_pending_tasks" : 149
}
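
The cluster stays red as long as at least one primary shard is unassigned. Recovery progress can be followed by polling the same API; a small sketch using the variables defined above:

# Poll cluster health every 30s and print the fields that indicate
# recovery progress (status, unassigned shards, pending tasks).
while true; do
  ${curl_get} "$ES_URL/_cluster/health?pretty=true" | \
    grep -E '"status"|unassigned_shards|number_of_pending_tasks'
  sleep 30
done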

Environment

  • OpenShift Enterprise 3.2
  • ElasticSearch 1.5.2
  • A 3-node ES cluster consisting of 1 ES master and 2 ES data nodes
