MetalLB controller restart might cause a rolling update of service IPs
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat MetalLB Operator
Issue
- Why are the IPs of services of type LoadBalancer replaced with new addresses when the MetalLB controller is restarted?
- How can the issue be worked around so that the services' IPs do not change?
Resolution
- This is a known bug: the MetalLB controller does not preserve its internal state across a restart. The bug is tracked in Jira issue OCPBUGS-16267.
- A possible workaround is to pin every service to a specific IP. This can be done by specifying the IP in the field spec.loadBalancerIP or by using the custom metallb.universe.tf/loadBalancerIPs annotation. For dual-stack services, only the annotation can be used.
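The annotation-based workaround can be sketched as follows. This is a minimal illustration, not an official procedure: the helper name pinned_service, the service name, namespace, selector, port, and the 192.0.2.10 address are all hypothetical placeholders. Only the annotation key comes from the text above. The script prints a Service manifest as JSON, which can then be reviewed and applied with the usual tooling.

```python
import json


def pinned_service(name, namespace, ips, port=80):
    """Build a LoadBalancer Service manifest pinned to fixed IPs.

    The IPs are requested through the metallb.universe.tf/loadBalancerIPs
    annotation. All other values (selector, port) are illustrative
    assumptions and must be adapted to the real workload.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Several IPs are joined with commas (dual-stack case).
                "metallb.universe.tf/loadBalancerIPs": ",".join(ips),
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},  # hypothetical label
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


# Hypothetical example values; 192.0.2.0/24 is a documentation range.
manifest = pinned_service("service-1", "namespace-1", ["192.0.2.10"])
print(json.dumps(manifest, indent=2))
```

The requested address has to belong to a pool that MetalLB is configured to serve. A dual-stack service also needs its IP family settings in the spec, which this sketch leaves out.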
Root Cause
- A service that has no IP when the MetalLB controller restarts might take an IP that was already allocated to another service.
- The service that previously held that IP is then given a new one, which might in turn belong to yet another service. This causes a rolling update of service IPs that continues until a service is allocated an IP that was not allocated to any other service.
- The issue is caused by a bug in the way MetalLB works:
- MetalLB keeps an internal data structure built from the current configuration.
- When the controller restarts, it handles the configuration as the first step and only then starts processing the services. At this first step, the controller is not aware of which IPs are already in use by services and considers all the pools free.
- If the controller receives a service with no allocated IP, it allocates the first IP in the pool, even if that IP was allocated to a different service.
- When the controller processes the service that held the IP before the restart, it finds the IP already allocated to another service, clears it, and assigns another IP from the pool.
- Jira issue OCPBUGS-16267 is intended to fix this bug by changing the MetalLB controller so that it first processes the services that already have an IP assigned.
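The ordering problem described above can be reproduced with a small toy model. This is a simplified sketch under stated assumptions, not MetalLB's actual code: the function reconcile, the service names, and the 192.0.2.x pool are hypothetical, and the real controller handles many more cases (sharing keys, multiple pools, and so on). It only shows why the processing order matters.

```python
def reconcile(services, pool, order):
    """Toy model of the controller's first pass after a restart.

    services: dict of service name -> IP recorded on the Service (or None)
    pool:     list of pool IPs in allocation order
    order:    order in which the restarted controller sees the services
    Returns (final assignments, names of services whose IP changed).
    """
    owner = {}    # ip -> service; empty after restart, so every IP looks free
    result = {}
    changed = []
    for name in order:
        current = services[name]
        if current is not None and current not in owner:
            # The recorded IP still looks free to the controller: keep it.
            owner[current] = name
            result[name] = current
            continue
        # No IP yet, or the recorded IP was already handed to another service:
        # take the first IP the controller believes is free.
        new_ip = next(ip for ip in pool if ip not in owner)
        owner[new_ip] = name
        result[name] = new_ip
        if current is not None:
            changed.append(name)
    return result, changed


pool = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
services = {
    "svc-a": "192.0.2.1",
    "svc-b": "192.0.2.2",
    "svc-c": "192.0.2.3",
    "svc-new": None,  # service without an IP at restart time
}

# Problematic order: the service without an IP is seen first.
buggy_result, buggy_changed = reconcile(
    services, pool, ["svc-new", "svc-a", "svc-b", "svc-c"]
)

# Order in the spirit of the planned fix: services holding an IP come first.
ip_first = sorted(services, key=lambda name: services[name] is None)
fixed_result, fixed_changed = reconcile(services, pool, ip_first)

print("changed (IP-less service first):", buggy_changed)
print("changed (services with IP first):", fixed_changed)
```

In the first run every existing service is shifted to the next address, which matches the rolling update described above. In the second run the existing services keep their addresses and only the new service receives a free one.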
Diagnostic Steps
- Controller pod logs will show the rolling update:
{"caller":"service_controller.go:60", "controller":"ServiceReconciler", "level":"info", "start reconcile":"namespace-1/service-1",ts="<timestamp>"}
{"caller":"service.go:138","event": "ipAllocated","ip":["192.XXX.XXX.1"],"level":"info","message":"IP address assigned by controller", "ts": "<timestamp>"}
{"caller": "main.go:98", "event":"serviceUpdated","level": "info","message": "update service object", "ts": "<timestamp>"}
{"caller":"service_controller.go:103", "controller":"ServiceReconciler", "level":"info", "end reconcile":"namespace-1/service-1",ts="<timestamp>"}
...
{"caller":"service.go:97", "error": "can't change sharing key for \"namespace-1/service-2\", address already in use by namespace-1/service-1", "event": "clearAssignment", "level": "info", "msg": "current IP not allowed by config, clearing", "ts": "<timestamp>"}
{"caller":"service.go:138","event": "ipAllocated","ip":["192.XXX.XXX.2"],"level":"info","message":"IP address assigned by controller", "ts": "<timestamp>"}
{"caller": "main.go:98", "event":"serviceUpdated","level": "info","message": "update service object", "ts": "<timestamp>"}
...
{"caller":"service.go:97", "error": "can't change sharing key for \"namespace-2/service-2\", address already in use by namespace-1/service-2", "event": "clearAssignment", "level": "info", "msg": "current IP not allowed by config, clearing", "ts": "<timestamp>"}
{"caller":"service.go:138","event": "ipAllocated","ip":["192.XXX.XXX.3"],"level":"info","message":"IP address assigned by controller", "ts": "<timestamp>"}
{"caller": "main.go:98", "event":"serviceUpdated","level": "info","message": "update service object", "ts": "<timestamp>"}
...
{"caller":"service.go:97", "error": "can't change sharing key for \"namespace-3/service-3\", address already in use by namespace-2/service-2", "event": "clearAssignment", "level": "info", "msg": "current IP not allowed by config, clearing", "ts": "<timestamp>"}
{"caller":"service.go:138","event": "ipAllocated","ip":["192.XXX.XXX.4"],"level":"info","message":"IP address assigned by controller", "ts": "<timestamp>"}
{"caller": "main.go:98", "event":"serviceUpdated","level": "info","message": "update service object", "ts": "<timestamp>"}
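To spot this pattern in a longer log, the relevant events can be filtered with a short script. The sketch below assumes that each controller log line is a standalone JSON object with the "event", "error", and "ip" fields seen in the excerpt above. The function name scan_controller_log and the sample lines are hypothetical, and lines that do not parse as JSON are skipped.

```python
import json


def scan_controller_log(lines):
    """Collect clearAssignment errors and allocated IPs from log lines.

    Assumes one JSON object per line, with the field names seen in the
    log excerpt above. Lines that are not valid JSON objects are ignored.
    """
    cleared = []      # error text of each clearAssignment event
    allocated = []    # every IP reported by ipAllocated events
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("event") == "clearAssignment":
            cleared.append(entry.get("error", ""))
        elif entry.get("event") == "ipAllocated":
            allocated.extend(entry.get("ip", []))
    return cleared, allocated


# Hypothetical sample lines modeled on the excerpt above.
SAMPLE = [
    '{"caller":"service.go:97","error":"address already in use",'
    '"event":"clearAssignment","level":"info","ts":"<timestamp>"}',
    '{"caller":"service.go:138","event":"ipAllocated","ip":["192.0.2.2"],'
    '"level":"info","ts":"<timestamp>"}',
    "this line is not JSON and is skipped",
]

cleared, allocated = scan_controller_log(SAMPLE)
print("cleared assignments:", len(cleared))
print("allocated IPs:", allocated)
```

Several clearAssignment events, each followed by an ipAllocated event shortly after a controller restart, are consistent with the rolling update described in this article.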
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.