Red Hat OpenShift Service on AWS (ROSA): Use of Node Tuning Operator


Environment

  • Red Hat OpenShift Service on AWS (ROSA)

Background

Red Hat manages all nodes in a ROSA cluster: control plane nodes, infra nodes, and worker nodes. A number of resources are managed or protected by the Service Reliability Engineering Platform team. Customers should not attempt to modify these resources because doing so can lead to cluster instability. The list of managed resources is outlined in the Managed Resources Documentation.

There are situations where a customer may need to tune worker nodes to improve performance for a specific application. In these cases the Node Tuning Operator can be used to manage node-level tuning by orchestrating the TuneD daemon. Many high-performance applications require some level of kernel tuning. The Node Tuning Operator provides a unified management interface for node-level sysctls and the flexibility to add custom tuning to meet specific user needs.

The Node Tuning Operator (NTO) can be used with ROSA; however, a number of considerations should be noted and addressed:

  1. The NTO must only be used to tune compute/worker nodes

    • Tuning parameters applied using the NTO must not be applied to control plane nodes or infrastructure nodes. See the heading “Labeling Nodes and Using NTO Selectively” in this document for how this can be achieved.
    • Tuning parameters must be scoped to compute/worker nodes only, by uniquely labeling your machine pool(s).
    • Any Node Tuning changes that impact infrastructure or control plane nodes will result in the cluster being placed into limited-support mode.
  2. Red Hat SRE is not responsible for the tuning and performance management of nodes that are customized using the Node Tuning Operator. If changes made by the NTO cause the worker nodes to become unschedulable or unresponsive, the customer may be asked to revert the changes made using the NTO.

  3. The Node Tuning Operator is a fully supported component of OpenShift and customers can work with Red Hat support for any issues configuring the NTO on the ROSA platform.
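The first consideration can be checked from the command line before any Tuned resource is created. The following is a minimal sketch, not an official procedure: the function name `find_protected_nodes` is hypothetical, and it assumes the default `oc get nodes --no-headers` column layout (NAME STATUS ROLES AGE VERSION) and the example label `nto=enabled` used in the appendix.

```shell
# Hypothetical helper: reads `oc get nodes --no-headers` output on stdin
# (assumed columns: NAME STATUS ROLES AGE VERSION) and prints every node
# whose ROLES column contains a control plane or infra role.
# Exit status: 0 if no such node was found, 1 otherwise.
find_protected_nodes() {
  awk '{
    n = split($3, roles, ",")
    for (i = 1; i <= n; i++) {
      if (roles[i] == "master" || roles[i] == "control-plane" || roles[i] == "infra") {
        print $1
        bad = 1
        break
      }
    }
  }
  END { exit bad + 0 }'
}

# Example usage (requires a logged-in oc session; label is the appendix example):
#   oc get nodes -l nto=enabled --no-headers | find_protected_nodes \
#     && echo "label is present on compute nodes only"
```

If the function prints any node names, the label selected by the Tuned resource also matches a control plane or infrastructure node and should be corrected before the profile is applied.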


Appendix

Labeling Nodes and Using NTO Selectively

The NTO must take effect only on your compute/worker machine pools. The following example shows how to achieve this.
The sequence of steps is as follows:


  1. Create a non-default machine pool using the ROSA CLI or OpenShift Cluster Manager (OCM) with a label of your choice.
  2. Prepare the Tuned CR with a profile. Set the NTO to take effect only on nodes carrying the label of that compute machine pool.
  3. Validate that the NTO profile is applied.

  1. This example creates a machine pool called ‘mp-1’ with the label ‘nto=enabled’

    $ rosa create machinepool -c mycluster --name=mp-1 --replicas=2 --instance-type=m5.xlarge --labels nto=enabled
    I: Machine pool 'mp-1' created successfully on cluster 'mycluster'
    I: To view all machine pools, run 'rosa list machinepools -c mycluster'
    
  2. This example shows an NTO profile named ‘openshift-example’ being recommended for nodes that carry the ‘nto=enabled’ label (matched as a node type label)

    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: openshift-example
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Custom OpenShift profile
          include=openshift-node
          # [sysctl]   # Section with minimum changes
          # fs.inotify.max_user_instances=16384
        name: openshift-example
      recommend:
      - match:
        - label: nto
          value: enabled
          type: node
        priority: 21
        profile: openshift-example
    
  3. Validate the NTO profile is applied

    $ oc -n openshift-cluster-node-tuning-operator get profile
    NAME                           TUNED                     APPLIED   DEGRADED   AGE
    ip-10-0-137-19.ec2.internal    openshift-node            True      False      151m
    ip-10-0-142-21.ec2.internal    openshift-control-plane   True      False      157m
    ip-10-0-161-117.ec2.internal   openshift-node            True      False      151m
    ip-10-0-166-196.ec2.internal   openshift-control-plane   True      False      140m
    ip-10-0-177-51.ec2.internal    openshift-node            True      False      150m
    ip-10-0-190-249.ec2.internal   openshift-node            True      False      151m
    ip-10-0-204-107.ec2.internal   openshift-control-plane   True      False      157m
    ip-10-0-204-213.ec2.internal   openshift-control-plane   True      False      140m
    ip-10-0-217-121.ec2.internal   openshift-example         True      False      10m
    ip-10-0-230-61.ec2.internal    openshift-control-plane   True      False      157m
    
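The validation in step 3 can also be scripted. The sketch below is illustrative only: the function name `check_profile` is hypothetical, and it assumes the column layout shown in the listing above (NAME TUNED APPLIED DEGRADED AGE) with headers suppressed.

```shell
# Hypothetical helper: reads `oc get profile --no-headers` output on stdin
# (assumed columns: NAME TUNED APPLIED DEGRADED AGE) and checks every node
# that reports the profile named in $1.
# Prints a line for each node where the profile is not applied or is degraded,
# and a message if no node reports the profile at all.
# Exit status: 0 if at least one node reports the profile and all are healthy.
check_profile() {
  awk -v want="$1" '
    $2 == want {
      found = 1
      if ($3 != "True" || $4 != "False") {
        print $1 ": applied=" $3 " degraded=" $4
        bad = 1
      }
    }
    END {
      if (!found) {
        print "no node reports profile " want
        bad = 1
      }
      exit bad + 0
    }'
}

# Example usage (requires a logged-in oc session):
#   oc -n openshift-cluster-node-tuning-operator get profile --no-headers \
#     | check_profile openshift-example && echo "profile applied"
```

Against the listing above, `check_profile openshift-example` would succeed silently because the one node reporting that profile shows APPLIED as True and DEGRADED as False.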
