Primary DC backup node crashes with "Too Many Open Files" when consumers are down
Issue
- Each Data Center (DC) has a pair of live (master) and backup (slave) nodes using a replicated store. When we run a test with all consumers down, the backup node crashes with the error:
Too many open files
- The time it takes for the backup node to go down depends on the Throughput Per Second (TPS). At a higher TPS, the backup node goes down within minutes. At a lower TPS, it takes longer, but the node still crashes eventually.
- The same behavior is subsequently observed in the secondary DC.
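To confirm that the backup node is steadily accumulating file descriptors before it crashes, the descriptor count of the broker's Java process can be sampled over time. The following is a minimal sketch, not a Red Hat tool: it assumes a Linux host, permission to read the process's entries under /proc, and a steady (linear) growth rate. The function names (count_open_fds, soft_fd_limit, seconds_until_exhaustion, watch) are hypothetical and chosen only for this illustration.

```python
import os
import time
from typing import Optional


def count_open_fds(pid: int) -> int:
    """Count the file descriptors currently open in a process (Linux /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid: int) -> Optional[int]:
    """Return the soft 'Max open files' limit of a process, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


def seconds_until_exhaustion(first: int, last: int,
                             elapsed: float, limit: int) -> Optional[float]:
    """Linearly extrapolate the time until the descriptor limit is reached.

    Returns None when the count is not growing over the sampled window.
    """
    if elapsed <= 0 or last <= first:
        return None
    rate = (last - first) / elapsed  # descriptors opened per second
    return max(limit - last, 0) / rate


def watch(pid: int, interval: float = 10.0, samples: int = 6) -> None:
    """Print the descriptor count at each interval, plus a rough time estimate."""
    limit = soft_fd_limit(pid)
    start_time = time.monotonic()
    first = count_open_fds(pid)
    for _ in range(samples):
        time.sleep(interval)
        current = count_open_fds(pid)
        elapsed = time.monotonic() - start_time
        eta = None
        if limit is not None:
            eta = seconds_until_exhaustion(first, current, elapsed, limit)
        print(f"open fds={current} soft limit={limit} est. seconds left={eta}")
```

Calling watch() with the PID of the backup broker process while the test runs should show whether the count grows faster at a higher TPS, which would be consistent with the crash timing described above. The estimate is only a rough extrapolation; it says nothing about which files or sockets are being held open.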
Environment
- Red Hat AMQ 7.12.0, 7.12.1
- two DCs in a shared-nothing setup
- replication between active and passive brokers
- asynchronous two-way mirroring
- all consumers down