JGroups RouterStubManager race condition causes one or more gossip routers to never be reconnected
Issue
After a planned network activity, the RH-SSO pods were unable to reconnect. Only restarting all RH-SSO pods resolved the issue.
The RH-SSO nodes seem to lose their connection to one of the gossip routers and never try to reconnect to it.
When a network issue disconnects 2 or more gossip routers, a race condition can prevent the affected entries from being removed from the RouterStub list (the instance variable 'stubs'). This problem appears to exist up to JGroups 4.2.22 (the latest 4.x version).
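The following is a minimal sketch of how a race of this kind can leave a stale entry in a shared list. It is not the JGroups source. It assumes, purely for illustration, that the list is updated in a copy-then-publish style without locking. The class name, the router names, and both helper methods are hypothetical. The interleaving is replayed step by step in a single thread so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; this is NOT the JGroups RouterStubManager code.
class StubListRaceSketch {

    // Stand-in for a shared 'stubs' list that is replaced as a whole on each update (assumption).
    static volatile List<String> stubs = new ArrayList<>();

    // Replays the problematic interleaving: two disconnect handlers each copy
    // the list before either of them publishes its modified copy.
    static List<String> lostUpdateInterleaving() {
        stubs = new ArrayList<>(Arrays.asList("router-A", "router-B"));

        List<String> copy1 = new ArrayList<>(stubs); // handler 1 snapshot: [router-A, router-B]
        List<String> copy2 = new ArrayList<>(stubs); // handler 2 snapshot: [router-A, router-B]

        copy1.remove("router-A");                    // handler 1 removes its stub
        copy2.remove("router-B");                    // handler 2 removes its stub

        stubs = copy1;                               // handler 1 publishes its copy
        stubs = copy2;                               // handler 2 overwrites it; router-A is back

        return stubs;
    }

    // One possible remedy (an assumption, not the actual fix): make the
    // copy-modify-publish step atomic so no removal can be overwritten.
    static List<String> atomicRemoval() {
        AtomicReference<List<String>> ref =
            new AtomicReference<>(new ArrayList<>(Arrays.asList("router-A", "router-B")));

        for (String gone : Arrays.asList("router-A", "router-B")) {
            ref.updateAndGet(current -> {
                List<String> next = new ArrayList<>(current);
                next.remove(gone);
                return next;
            });
        }
        return ref.get();
    }

    public static void main(String[] args) {
        System.out.println("unsynchronized: " + lostUpdateInterleaving());
        System.out.println("atomic update:  " + atomicRemoval());
    }
}
```

In the unsynchronized replay, the second write discards the first removal, so one disconnected router remains in the list even though both were removed. Whether this matches the exact code path in RouterStubManager is an assumption. The sketch only shows why the problem needs 2 or more simultaneous disconnects to appear.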
Environment
- Red Hat Single Sign-On
  - 7.6.2
- JBoss Enterprise Application Platform
  - 7.2.12