Distribution mode cache for the web subsystem breaks session stickiness
Environment
- JBoss Enterprise Application Platform (EAP) 6.4.10 and earlier
- JBoss Enterprise Application Platform (EAP) 7.x
Issue
- Distribution mode cache for the web subsystem breaks session stickiness. In dist mode, if a worker is removed from the balancer (e.g. by disabling it through balancer-manager in mod_proxy or jkstatus in mod_jk, or by suspending the instance) but remains part of the cluster, the other nodes continue to direct any requests that were sticky to that node back to it, without updating the jvmRoute. Sessions therefore 'spray' across the cluster instead of sticking to a new node, which can cause session locking issues or session state conflicts.
Resolution
EAP 6
A fix was released in 6.4.11. With -Djvmroute.dist.local.node=true set on 6.4.11 and later, dist mode sticks a request to the current local node instead of continually trying to stick it to a potentially unreachable dist session owner, which is what causes session spraying.
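As a minimal sketch, the property can also be declared in the server configuration file rather than on the command line. This assumes that a value declared under <system-properties> is picked up the same way as the -D JVM argument named above, which is not confirmed here; the file name standalone-ha.xml is likewise an assumption about your profile.

```xml
<!-- standalone-ha.xml (fragment; file name is an assumption).
     The <system-properties> element sits directly after the closing </extensions> element. -->
<system-properties>
    <!-- Assumed equivalent to passing -Djvmroute.dist.local.node=true to the JVM -->
    <property name="jvmroute.dist.local.node" value="true"/>
</system-properties>
```

If in doubt, pass the property as a JVM argument exactly as written above, since that is the form this solution describes.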
Otherwise, an immediate workaround is to use the replicated cache mode for the web subsystem instead of dist mode.
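The following fragment sketches what that workaround might look like in the EAP 6 infinispan subsystem. The cache names ("repl", "dist") and attribute values mirror a typical EAP 6 HA profile and are assumptions; compare them with your own configuration before changing anything. The only change that matters here is that default-cache points at the replicated cache.

```xml
<!-- infinispan subsystem, "web" cache container (fragment; values are assumptions
     based on a typical EAP 6 HA profile) -->
<cache-container name="web" aliases="standard-session-cache" default-cache="repl"
                 module="org.jboss.as.clustering.web.infinispan">
    <transport lock-timeout="60000"/>
    <!-- Replicated cache: selected by default-cache="repl" above -->
    <replicated-cache name="repl" mode="ASYNC" batching="true">
        <file-store/>
    </replicated-cache>
    <!-- Distributed cache: still defined, but no longer the default -->
    <distributed-cache name="dist" mode="ASYNC" batching="true" l1-lifespan="0">
        <file-store/>
    </distributed-cache>
    <!-- any other caches in your existing container stay unchanged -->
</cache-container>
```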
EAP 7
In EAP 7.3.0 and later, a feature implemented through WFLY-6944/JBEAP-6078/JBEAP-14797 lets you configure <local-affinity/> instead of <primary-owner-affinity/> in the <infinispan-session-management> setting of the distributable-web subsystem. It appends the current local node, rather than the primary cache owner of the session, as the jvmRoute suffix of the JSESSIONID cookie.
<subsystem xmlns="urn:jboss:domain:distributable-web:2.0" default-session-management="default" default-single-sign-on-management="default">
    <infinispan-session-management name="default" cache-container="web" granularity="SESSION">
        <local-affinity/> <!-- replaces <primary-owner-affinity/> -->
    </infinispan-session-management>
    <infinispan-single-sign-on-management name="default" cache-container="web" cache="sso"/>
    <infinispan-routing cache-container="web" cache="routing"/>
</subsystem>
Note that in EAP 7.x the same behavior occurs with both "dist" and "repl" modes, because "repl" is implemented as a special case of "dist" (i.e. where owners == cache view size). So, unlike in EAP 6, changing from "distributed-cache" to "replicated-cache" cannot be used as a workaround.
mod_cluster
If you keep a <primary-owner-affinity/> configuration and use mod_cluster, you can add DeterministicFailover On to your httpd mod_cluster conf. The balancer then consistently fails a given session over to the same balancer member. This avoids random session spraying after a cluster member is suspended or disabled, since the session should always fail over to the same new cluster member.
Root Cause
- This is the result of the design of distribution mode. When a node gets a request and does not own the session, then instead of making a network call to get the session information, it sends the request back to the originating node, thereby saving a network call.
- For EAP 6: BZ-1233400