ES unassigned shards fail with ALLOCATION_FAILED due to throttling in OCP 3.11
Issue
- The EFK cluster health went RED because of disk space issues, which left some shards UNASSIGNED. After the disk was enlarged, the number of unassigned shards dropped from 36 to 9, but the remaining shards fail to allocate with the following error:
{
  "node_id" : "xx-xxx-xxx",
  "node_name" : "logging-es-data-master-qf1iduip",
  "transport_address" : "10.130.0.0:9300",
  "node_decision" : "no",
  "store" : {
    "matching_sync_id" : true
  },
  "deciders" : [
    {
      "decider" : "same_shard",
      "decision" : "NO",
      "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[project.myproject.c0e7bf0f-5071-11e9-be7c-005056bf2324.2021.02.21][0], node[xx-xxx-xx], [P], s[STARTED], a[id=ekdeq_AWS9apsto_hLiMNw]]"
    },
    {
      "decider" : "throttling",
      "decision" : "THROTTLE",
      "explanation" : "reached the limit of outgoing shard recoveries [2] on the node [xx-xxx-xxx] which holds the primary, cluster setting [cluster.routing.allocation.node_concurrent_outgoing_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
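Output of this shape is what the Elasticsearch cluster allocation explain API (`_cluster/allocation/explain`) returns for each node. When several nodes and deciders are involved, it can help to group the deciders by their decision. The following is a minimal Python sketch of that idea, not part of the original report: the helper names `summarize_deciders` and `blocking_deciders` are hypothetical, and the sample data is an abbreviated copy of the entry above with shortened explanation strings.

```python
import json

# Abbreviated copy of the node entry shown above. The "explanation" strings
# are shortened for readability and are not the full messages.
SAMPLE_NODE_JSON = """
{
  "node_name": "logging-es-data-master-qf1iduip",
  "node_decision": "no",
  "deciders": [
    {"decider": "same_shard", "decision": "NO",
     "explanation": "a copy of the shard already exists on this node"},
    {"decider": "throttling", "decision": "THROTTLE",
     "explanation": "reached the limit of outgoing shard recoveries [2]"}
  ]
}
"""


def summarize_deciders(node_entry):
    """Group decider names by their decision for one node entry."""
    summary = {}
    for decider in node_entry.get("deciders", []):
        summary.setdefault(decider["decision"], []).append(decider["decider"])
    return summary


def blocking_deciders(node_entry):
    """Return the names of deciders whose decision is a hard NO."""
    return summarize_deciders(node_entry).get("NO", [])


if __name__ == "__main__":
    node = json.loads(SAMPLE_NODE_JSON)
    print(node["node_name"], summarize_deciders(node))
```

For the entry above, this separates the two findings: `same_shard` rejects this particular node because it already holds the primary copy, while `throttling` reports that the node holding the primary has reached its limit of 2 concurrent outgoing recoveries.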
Environment
Red Hat OpenShift Container Platform (RHOCP) 3.11