Satellite master and capsule communication

Hello,

Does anyone have detailed technical information on how the Satellite master and Capsules communicate? How many Capsules can one master handle when the Capsules are in different geographic locations? And how does content sync to a Capsule work: when I promote a content view, how much latency should I expect, and is there a timeout for syncing content to Capsules?

Thanks,
DJ

Responses

Hello,

The Architecture Guide has a section Capsule Server Overview which includes network diagrams.

The Installation Guide has an appendix Capsule Server Scalability Considerations.

Thanks, but that information is very limited. Basically I want to find out how goferd on the Capsule and qdrouterd on the master communicate: is there a timeout value or a heartbeat that continually polls for connections?

Apart from that, does setting the download policy for repositories to on-demand have any effect on downloading contents to capsules?

I do find the Capsule scalability document useful. Thanks for that.

Hello

Re. "timeout value or heartbeat": I can try to find out, but it would help if you explained why you want to know. I think you should also raise a support case if there is a specific issue you need to resolve. You might find this Kbase solution interesting: "goferd stops connecting to Satellite/Capsule after few hours".

Re. "have any effect on downloading contents to capsules"

If a repository is set to "on demand", it should only download an RPM when required, regardless of the repository location. The repositories in Satellite Server are handled by the integrated Capsule. The download policies are described in Using Download Policies, but that section does not make these points clear. I will raise a docs bug and get one of the developers to confirm that what I have said is correct.

"but it would help if you explained why you want to know"

We are just finalizing our architecture, and our Capsule will be behind a firewall, so we want to understand the communication mechanism between the Capsule and the master. I have checked the architecture diagram and found the ports, but it does not give any timeout settings or any latency information to consider.

Regarding "repositories": I was talking about external Capsules. If I set the download policy to "on demand", does that policy have any impact when my content gets synced to external Capsules? From the documentation it looks to me as though it only helps during the initial setup, the first time we sync.

Speaking purely about communication related to katello-agent / goferd here. Recall the setup is: goferd on any client needs to be connected to qpidd on Satellite via a network of qdrouterd routers:

  • goferd connects to qdrouterd's port 5647 on the Satellite/Capsule (wherever the client is registered) - this connection has a heartbeat of 10 seconds.
  • qdrouterd on Capsule connects to qdrouterd on Satellite on port 5646 - this connection has heartbeat 1 or 2 seconds I think (can check if interested). Note that if this connection gets disrupted, reconnection is attempted frequently - but even when the reconnect succeeds, it can take up to approx. 20 seconds to "heal" from the disruption and to route links from goferd clients again.
  • qdrouterd on Satellite connects to qpidd over the Satellite's loopback interface. I think there is no heartbeat on this connection, but none is needed because the lo interface is used.
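
Since the Capsule sits behind a firewall, it may be worth confirming that the two ports listed above are actually reachable before digging into heartbeats. The following is only a minimal sketch: the function name is made up for illustration and the hostnames in the usage note are placeholders, not real systems. It checks plain TCP reachability only and says nothing about AMQP or SSL.

```python
import socket

# Ports taken from the list above.
GOFERD_TO_QDROUTERD_PORT = 5647  # goferd on a client -> qdrouterd on Satellite/Capsule
INTER_ROUTER_PORT = 5646         # qdrouterd on Capsule -> qdrouterd on Satellite


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it only proves the firewall lets the TCP handshake
    through, not that qdrouterd accepts the AMQP/SSL session.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run from the Capsule, a call such as port_is_reachable("satellite.example.com", INTER_ROUTER_PORT) would test the inter-router path; run from a client, port_is_reachable("capsule.example.com", GOFERD_TO_QDROUTERD_PORT) would test the goferd path.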

Thanks, that helps a lot.

"this connection has heartbeat 1 or 2 seconds I think (can check if interested)."
Can you help check the above? Is it adjustable? Does it have any impact if the Capsule is behind a firewall or something?

If I check /etc/qpid-dispatch/qdrouterd.conf:

connector {
    name: broker
    addr: anglxd00001.nomura.com
    port: 5671
    sasl-mechanisms: ANONYMOUS
    role: on-demand
    ssl-profile: client
    idle-timeout-seconds: 0
}

As you can see above, it has 'idle-timeout-seconds'. What does that mean?

The inter-router heartbeats have a default interval of 1 second. A ROUTER_HELLO AMQP frame is sent every second, and if no response is received within 3 seconds, the connection/session is assumed dead.

Both are configurable via qdrouterd.conf:

router {
    mode: interior
    router-id: here.should.be.fqdn.of.the.host
    hello-interval: 1
    hello-max-age: 3
}

You need to restart the qdrouterd service to apply the change. Note that a Satellite/Capsule upgrade (in fact, any execution of satellite-installer) will purge this config, due to problems like https://bugzilla.redhat.com/show_bug.cgi?id=1305782.

Re connector to broker: that is the qdrouterd->qpidd connection / connector. Since the version of qpidd that Satellite uses does not implement this type of heartbeat, heartbeats have to be disabled from the qdrouterd side - that is achieved via the idle-timeout-seconds option (the 0 in your connector). Anyway, since this connection goes via loopback, heartbeats are not that important.
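
If you change these values on more than one router, a quick sanity check can catch a router that sends hellos less often than a peer is willing to wait for them. The sketch below is only an illustration: the function name, the dictionary layout and the router names are assumptions, and the values have to be copied by hand from each host's qdrouterd.conf. It treats an interval equal to the peer's max age as a problem too, which is a conservative choice on my part.

```python
from typing import Dict, List, Tuple

# Router name -> (hello-interval, hello-max-age), both in seconds.
# The layout is an assumption made for this sketch.
HelloSettings = Dict[str, Tuple[float, float]]


def find_hello_mismatches(routers: HelloSettings) -> List[str]:
    """List every (sender, receiver) pair where the sender's hello-interval
    is not shorter than the receiver's hello-max-age."""
    problems = []
    for sender, (interval, _) in routers.items():
        for receiver, (_, max_age) in routers.items():
            if sender != receiver and interval >= max_age:
                problems.append(
                    f"{sender} hello-interval {interval}s is not below "
                    f"{receiver} hello-max-age {max_age}s"
                )
    return problems
```

For example, find_hello_mismatches({"satellite": (1, 3), "capsule-jp": (5, 15)}) reports one problem: the Capsule would send a hello only every 5 seconds while the Satellite gives up after 3.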

Thanks for the information. Just two more questions, please.

By "inter-router" you mean qdrouterd on the Capsule to qdrouterd on the master, right?

1) I assume I can increase the timeout in the "router" configuration you mention by adjusting hello-interval, and that this applies to "capsule <-> master" communication?

2) If my Capsule is behind a firewall, let's say in Japan, and my Satellite master is in the US, will there be any WAN RTT impact, or are there any specific latency requirements?

Yes, by inter-router I mean qdrouterd(Satellite) <-> qdrouterd(Capsule).

1) It is safe to increase the values, with the obvious impact of detecting a lost connection later (but preventing false alarms and redundant connection drops when a peer replies a bit late). Note that you should set the values uniformly across the whole qdrouterd network (i.e. on the Satellite and all Capsules), to prevent misconfigurations like "my hello-interval is longer than peer's hello-max-age".

2) No specific requirements. You just might need to tune the hello interval / max age if the RTT can reach the limit under normal or heavy load. You can test and observe the latency by adding this to the config:

log {
    module: ROUTER_HELLO
    enable: trace+
    timestamp: true
    output: /path/to/qdrouterd.log
}

and restarting qdrouterd (ensure the qdrouterd user can write to the file, and be aware that a router start purges the old file content instead of appending to it). Then you will see when each peer sent or received the HELLO keepalive.
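
For a rough first impression before enabling the trace log, one could also time a few TCP connects from the Capsule to the Satellite's inter-router port. This is a different and much cruder technique than reading the HELLO trace: a TCP connect takes about one network round trip and does not include any qdrouterd processing time. The function names below are hypothetical, the hostname in the usage note is a placeholder, and the default of 3 seconds is the default hello-max-age mentioned earlier in this thread.

```python
import socket
import time
from typing import List


def tcp_connect_times(host: str, port: int = 5646, samples: int = 5,
                      timeout: float = 5.0) -> List[float]:
    """Return the seconds taken by each successful TCP connect to host:port.

    Failed attempts are skipped, so the list may be shorter than `samples`.
    """
    times = []
    for _ in range(samples):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                times.append(time.monotonic() - start)
        except OSError:
            pass  # refused, timed out, or host not resolvable
    return times


def worst_case_exceeds(times: List[float], hello_max_age: float = 3.0) -> bool:
    """True if the slowest measured connect took at least hello_max_age seconds."""
    return bool(times) and max(times) >= hello_max_age
```

Calling tcp_connect_times("satellite.example.com") from the Capsule at a busy time of day and passing the result to worst_case_exceeds gives a hint whether the hello values need raising; the trace log remains the authoritative source.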

I raised this bug: Bug 1446098 - The "Using Download Policies" section lacks detail. Feel free to add yourself to the c.c. list if you want to follow the progress.

Thank you.

Thanks for raising the bug.
