AlertmanagerConfig with missing options causes Alertmanager to crash

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.10+

Issue

  • An AlertmanagerConfig object created in a user-defined project can cause Alertmanager to crash when it is restarted.
  • An incomplete AlertmanagerConfig configuration can be created without being rejected by validation.
  • The Alertmanager pods are in the CrashLoopBackOff state:

    $ oc get pods -n openshift-user-workload-monitoring | grep alert
    NAME                                   READY   STATUS             RESTARTS        AGE
    alertmanager-user-workload-0           5/6     CrashLoopBackOff   1 (3s ago)      23s
    
  • The Alertmanager pods show an error like the following:

    ts=2023-09-05T12:07:33.449Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="no global SMTP smarthost set"
    

Resolution

This issue has been reported to Red Hat engineering. It is being tracked in OCPBUGS-18656.
For more information, please open a new support case with Red Hat Support.

Workaround [1]: Add smtp_from and smtp_smarthost to the global section of the Alertmanager configuration, as follows:

  • Save the currently active Alertmanager configuration to the file alertmanager.yaml:

    $ oc -n openshift-user-workload-monitoring get secret alertmanager-user-workload --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml
    
  • Edit alertmanager.yaml and add the following options to the global section:

    "global":
       smtp_from: noreply_uwm@example.com
       smtp_smarthost: smtp.example.com:25
    
  • Apply the new configuration in the file:

    $ oc -n openshift-user-workload-monitoring create secret generic alertmanager-user-workload --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n openshift-user-workload-monitoring replace secret --filename=-
    

Workaround [2]: Add from and smarthost to the emailConfigs section of the AlertmanagerConfig object, as follows:

$ oc edit AlertmanagerConfig <alertmanagerconfig-name>

spec:
  receivers:
  - name: 'email_receiver'
    emailConfigs:
    - to: 'your-email@example.com'
      from: 'alertmanager@example.com'
      smarthost: 'smtp.example.com:587'
      authUsername: 'your-email@example.com'
      authPassword:
        name: 'smtp-password-secret'
        key: 'password'

Note: The smarthost and the from address must be defined either in the global section of the Alertmanager configuration (as smtp_smarthost and smtp_from) or in the AlertmanagerConfig object (as smarthost and from).
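
After applying either workaround, it may help to confirm that the error no longer appears in the Alertmanager logs. The following is a minimal sketch, not part of the original solution: has_smtp_error is a hypothetical helper name, and it only searches log text on standard input for the error string quoted in this article.

```shell
#!/bin/sh
# Hypothetical helper (not an oc subcommand): reads Alertmanager log lines
# on stdin and reports whether the SMTP error from this article is present.
has_smtp_error() {
  if grep -q 'no global SMTP smarthost set'; then
    echo "error still present"
  else
    echo "error not found"
  fi
}

# Example usage against the pod named in this article:
# oc logs alertmanager-user-workload-0 -c alertmanager -n openshift-user-workload-monitoring | has_smtp_error
```

An "error not found" result only means the string is absent from the logs that were read. The pod status should still be checked with the oc get pods command shown in the Issue section.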

Root Cause

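The error appears when an AlertmanagerConfig object is created without the from and smarthost options and the global section of the Alertmanager configuration does not define smtp_from and smtp_smarthost either.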

Diagnostic Steps

  • The following error appears in the Alertmanager pods:

    $ oc logs alertmanager-user-workload-0 -c alertmanager -n openshift-user-workload-monitoring
    
    ts=2023-09-12T16:42:52.626Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="no global SMTP smarthost set"
    ts=2023-09-12T16:42:52.626Z caller=cluster.go:690 level=info component=cluster msg="gossip not settled but continuing anyway" polls=0 elapsed=20.800007ms
    
  • The global section of the Alertmanager configuration does not contain smtp_from and smtp_smarthost:

    $ oc -n openshift-user-workload-monitoring get secret alertmanager-user-workload --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml
    
    global:
     resolve_timeout: 5m
     http_config:
       follow_redirects: true
     smtp_hello: localhost
     smtp_require_tls: true
    route:
     receiver: Default
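
The check on the decoded alertmanager.yaml can be sketched as a small shell function. This is an illustrative sketch under stated assumptions: missing_smtp_options is a hypothetical helper name, and it uses a plain text match on the key names rather than parsing YAML, so it does not confirm that a key sits under the global section.

```shell
#!/bin/sh
# Hypothetical helper: prints each global SMTP option that does not appear
# as a key in the given decoded alertmanager.yaml file.
# Assumption: a text match on the key name is enough for a quick check;
# this does not parse YAML or verify the key is nested under "global".
missing_smtp_options() {
  file=$1
  for key in smtp_from smtp_smarthost; do
    # Match the key at the start of a line, allowing indentation and
    # optional double quotes around the key name.
    if ! grep -Eq "^[[:space:]]*\"?${key}\"?:" "$file"; then
      echo "$key"
    fi
  done
}

# Example usage on the file written by the command above:
# missing_smtp_options alertmanager.yaml
```

If the function prints nothing, both keys were found somewhere in the file. Any key it prints is a candidate for Workaround [1].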
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
