mdsd pods throwing error could not write forward header
Environment
- Azure Red Hat OpenShift (ARO)
- 4.x
Issue
- The mdsd pods in the openshift-azure-logging namespace have multiple restarts and are throwing error messages.
$ oc get pods -n openshift-azure-logging
NAME READY STATUS RESTARTS AGE
mdsd-xxxx 2/2 Running 3 298d
mdsd-xxxx 2/2 Running 4 272d
mdsd-xxxx 1/2 Running 19762 298d
- Error message:
[2022/08/31 12:23:29] [error] [output:forward:forward.0] could not write forward header
[2022/08/31 12:23:29] [error] [output:forward:forward.0] could not write forward header
[2022/08/31 12:23:29] [error] [output:forward:forward.0] could not write forward header
[2022/08/31 12:23:29] [error] [output:forward:forward.0] could not write forward header
[2022/08/31 12:23:29] [ warn] [engine] chunk '1-xxxxx.xxxxx.flb' cannot be retried: task_id=6, input=tail.1 > output=forward.0
[2022/08/31 12:23:29] [ warn] [engine] failed to flush chunk '1-xxxxx.xxxxx.flb', retry in 9 seconds: task_id=1, input=systemd.0 > output=forward.0 (out_id=0)
[2022/08/31 12:23:29] [ warn] [engine] failed to flush chunk '1-xxxxx.xxxxx.flb', retry in 9 seconds: task_id=4, input=tail.1 > output=forward.0 (out_id=0)
[2022/08/31 12:23:29] [ warn] [engine] failed to flush chunk '1-xxxxx.xxxxx.flb', retry in 9 seconds: task_id=5, input=tail.2 > output=forward.0 (out_id=0)
- The mdsd pod logs also indicate problems, most likely caused by old certificates or authorization issues between MDSD and Geneva.
{"Message":"Unauthorized","Code":"Forbidden","StackTrace":"","Details":null}
2022-08-29T02:49:18.7588860Z: MdsRestInterface::QueryGcsAccountInfo() failed
2022-08-29T02:49:18.7589320Z: LoadGcsKey() returned false; next reload(minutes): 1
2022-08-29T02:50:10.2042960Z: Blob write failed due to storage exception: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.. Http status code: 403. Extended info: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
- PUCM updates on the cluster were failing because of an NSG or RP permissions modification:
network.SubnetsClient#CreateOrUpdate: Failure sending request: StatusCode=403 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client 'xxxxxx-xxxx-xxxx-xxxx-xxxxxx' with object id 'xxxxxx-xxxx-xxxx-xxxx-xxxxxx' has permission to perform action 'Microsoft.Network/virtualNetworks/subnets/write' on scope '/subscriptions/xxxxxx-xxxx-xxxx-xxxx-xxxxxx/resourceGroups/xxxxxxxx/providers/Microsoft.Network/virtualNetworks/xxxx/subnets/xxxx'; however, it does not have permission to perform action 'Microsoft.Network/networkSecurityGroups/join/action' on the linked scope(s) '/subscriptionsxxxxxx-xxxx-xxxx-xxxx-xxxxxx/resourceGroups/xxxxxxx/providers/Microsoft.Network/networkSecurityGroups/xxxxxxxxx' or the linked scope(s) are invalid."
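The LinkedAuthorizationFailed message above names both the missing action and the scope it is missing on. As a rough sketch, a small helper can pull those two values out of the message. The function name parse_linked_auth_failure is hypothetical, and the pattern only assumes the wording shown in the error above.

```shell
#!/bin/sh
# Hypothetical helper: read an error message on stdin and print
# "<missing action> <linked scope>" on one line.
# The pattern relies only on the wording quoted in this article.
parse_linked_auth_failure() {
  sed -n \
    -e "s/.*does not have permission to perform action '\([^']*\)' on the linked scope(s) '\([^']*\)'.*/\1 \2/p"
}

# Example use (error.txt is an assumed file holding the message):
#   parse_linked_auth_failure < error.txt
```

Lines that do not contain the expected wording produce no output, so an empty result means the message has a different shape.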
Resolution
- Give the respective client join permissions on the required Network Security Group (NSG).
- Since end users retain full administration rights over cluster resources and groups, it is impossible to anticipate all possible configurations that could be applied, some of which may prevent normal cluster maintenance tasks.
- The support agreement states that end users should avoid placing policies within their subscription or management group that hinder SREs from performing regular maintenance on the Azure Red Hat OpenShift cluster.
- In situations like these, it is recommended that all parties involved collaborate to address any misconfigurations that hinder maintenance tasks, ensuring uninterrupted normal cluster operations.
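One possible way to grant the permission is a role assignment scoped to the NSG. The sketch below only builds the NSG scope and prints the az command for review; it does not run it. The function names nsg_scope and print_grant_command are hypothetical, the scope format is copied from the error in the Issue section, and the use of the Network Contributor role follows the Root Cause section.

```shell
#!/bin/sh
# Build the resource ID of an NSG.
# $1 = subscription ID, $2 = resource group, $3 = NSG name
nsg_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/networkSecurityGroups/%s\n' \
    "$1" "$2" "$3"
}

# Print (do not execute) a role assignment command so it can be reviewed.
# $1 = object ID of the client named in the error, $2 = NSG scope
print_grant_command() {
  printf 'az role assignment create --assignee %s --role "Network Contributor" --scope %s\n' \
    "$1" "$2"
}

# Example use (all values are placeholders):
#   print_grant_command "<client-object-id>" "$(nsg_scope "<sub-id>" "<rg>" "<nsg-name>")"
```

Printing rather than executing keeps the change under the control of whoever owns the subscription, which matches the collaborative approach described above.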
Root Cause
- The NSG was created by the end user, and the service principal (SP) associated with the cluster did not have the necessary Network Contributor permissions over the NSG.
- During PUCM, the SRE ensures that service endpoints are enabled for storage account access and enables them if necessary.
- This action implicitly triggers the Subnet: CreateOrUpdate operation, which in turn invokes the Microsoft.Network/networkSecurityGroups/join/action operation.
- Even though this action is idempotent when the NSG is already attached to the subnet, the Network Contributor permissions are still required to execute it.
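To illustrate why a role with only subnet write access is not enough, the following sketch checks whether a list of role actions covers a required action. It is an approximation under stated assumptions: the function name role_covers_action is hypothetical, the '*' wildcard is emulated with shell glob matching, and the case-insensitive comparison that Azure applies is ignored.

```shell
#!/bin/sh
# Return 0 if any action pattern read from stdin (one per line)
# covers the action given in $1, otherwise return 1.
# Assumption: '*' in a role action matches any run of characters.
role_covers_action() {
  wanted="$1"
  while IFS= read -r pattern; do
    [ -n "$pattern" ] || continue
    # $pattern is left unquoted on purpose so '*' acts as a wildcard.
    case "$wanted" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# Example use (assumes the az CLI is logged in; the query should list
# the role's actions one per line):
#   az role definition list --name "Network Contributor" \
#     --query "[].permissions[].actions[]" -o tsv |
#     role_covers_action "Microsoft.Network/networkSecurityGroups/join/action"
```

A role that lists Microsoft.Network/* passes this check, while a role that lists only Microsoft.Network/virtualNetworks/subnets/write does not, which mirrors the situation in the error message.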
Diagnostic Steps
- Check the pods in the openshift-azure-logging namespace.
$ oc get pods -n openshift-azure-logging
- Check the events and pod logs of the mdsd pods.
$ oc get events -n openshift-azure-logging
$ oc logs mdsd-xxxxx -n openshift-azure-logging
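The checks above can be narrowed with two small filters, sketched here under a few assumptions: the helper names unhealthy_pods and count_signatures are hypothetical, the restart threshold of 10 is arbitrary, and the matched strings are the log signatures quoted in the Issue section.

```shell
#!/bin/sh
# Print NAME, READY and RESTARTS for pods that are not fully ready or
# that restarted more than a threshold. Reads the `oc get pods` table
# on stdin; $1 = restart threshold (default 10, an arbitrary choice).
unhealthy_pods() {
  awk -v max="${1:-10}" '
    NR == 1 { next }                # skip the header row
    {
      split($2, ready, "/")         # READY column looks like "1/2"
      if (ready[1] != ready[2] || $4 + 0 > max) print $1, $2, $4
    }'
}

# Count the log signatures quoted in this article. Reads pod logs on stdin.
count_signatures() {
  awk '
    /could not write forward header/ { fwd++ }
    /Http status code: 403/ || /"Code":"Forbidden"/ { auth++ }
    END { printf "forward_header_errors=%d auth_failures=%d\n", fwd, auth }'
}

# Example use:
#   oc get pods -n openshift-azure-logging | unhealthy_pods
#   oc logs mdsd-xxxxx -n openshift-azure-logging --all-containers | count_signatures
```

A non-zero auth_failures count alongside forward header errors points toward the authorization problem described in this article rather than a purely local logging fault.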