Red Hat OpenShift Service on AWS (ROSA): Use of Node Tuning Operator
Environment
- Red Hat OpenShift Service on AWS (ROSA)
Background
Red Hat manages all nodes in a ROSA cluster: control plane nodes, infra nodes, and worker nodes. A number of resources are managed or protected by the Service Reliability Engineering Platform team. Customers should not attempt to modify these resources, because doing so can lead to cluster instability. The list of managed resources is given in the Managed Resources Documentation.
There are situations where a customer may need to tune worker nodes to improve performance for a specific application. Many high-performance applications require some level of kernel tuning. In these cases the Node Tuning Operator can be used to manage node-level tuning by orchestrating the TuneD daemon. The Node Tuning Operator provides a unified management interface for node-level sysctls, along with the flexibility to add custom tuning based on user needs.
The Node Tuning Operator (NTO) can be used with ROSA. However, a number of considerations should be noted and addressed:
- The NTO must only be used to tune compute/worker nodes.
  - Tuning parameters applied using the NTO must not be applied to Control Plane nodes or Infrastructure nodes. See the heading “Labeling Nodes and Using NTO Selectively” in this document for how this can be achieved.
  - Tuning parameters must be applied to compute/worker nodes only, by uniquely labeling your machine pool(s).
  - Any Node Tuning changes that impact Infrastructure or Control Plane nodes will result in the cluster being placed into limited-support mode.
- Red Hat SRE is not responsible for the tuning and performance management of nodes that are customized using the Node Tuning Operator. If changes made by the NTO cause the worker nodes to be unschedulable or unresponsive, then the customer may be asked to revert the changes made using the NTO.
- The Node Tuning Operator is a fully supported component of OpenShift, and customers can work with Red Hat support for any issues configuring the NTO on the ROSA platform.
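The first consideration above can be checked before any Tuned CR is created. The following is a minimal sketch, not part of the ROSA or OpenShift tooling: `non_worker_nodes` is a hypothetical helper that reads the tabular output of `oc get nodes` and prints every node whose ROLES column includes `control-plane`, `master`, or `infra`. It assumes the default column layout (NAME, STATUS, ROLES, ...) and that infra nodes carry an `infra` role, which may differ on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get nodes` tabular output on stdin and print
# the name of every node that has a control-plane, master, or infra role.
# Assumes the third column is ROLES, a comma-separated list such as
# "infra,worker" or "control-plane,master".
non_worker_nodes() {
  awk 'NR > 1 {
    n = split($3, roles, ",")
    for (i = 1; i <= n; i++) {
      if (roles[i] == "control-plane" || roles[i] == "master" || roles[i] == "infra") {
        print $1
        break
      }
    }
  }'
}

# Example use against a live cluster (requires a logged-in oc session).
# The label nto=enabled is the example label from the appendix of this document:
#   oc get nodes -l nto=enabled | non_worker_nodes
```

If the helper prints any node names for the label you intend to match, that label is not unique to your compute machine pool and should not be used in a Tuned CR.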
References
- Service Definition for Red Hat OpenShift Service on AWS
- Shared Responsibility Matrix for Red Hat OpenShift Service on AWS
Appendix
Labeling Nodes and Using NTO Selectively
The NTO must take effect only on your compute/worker machine pools. The following example shows how to achieve this.
The sequence of steps is as follows:
- Create a non-default machine pool using the ROSA CLI or OpenShift Cluster Manager (OCM), with a label of your choice
- Prepare the Tuned custom resource (CR) with a profile, and set its recommend section so that the NTO takes effect only on nodes carrying that machine pool label
- Validate that the NTO profile is applied
- This example prepares a machine pool called ‘mp-1’ with the label ‘nto=enabled’

$ rosa create machinepool -c mycluster --name=mp-1 --replicas=2 --instance-type=m5.xlarge --labels nto=enabled
I: Machine pool 'mp-1' created successfully on cluster 'mycluster'
I: To view all machine pools, run 'rosa list machinepools -c mycluster'
- This example shows an NTO "example" profile being applied to the ‘nto=enabled’ label (marked as a node type label)

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-example
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift profile
      include=openshift-node
      # [sysctl]
      # Section with minimum changes
      # fs.inotify.max_user_instances=16384
    name: openshift-example
  recommend:
  - match:
    - label: nto
      value: enabled
      type: node
    priority: 21
    profile: openshift-example
- Validate the NTO profile is applied

$ oc -n openshift-cluster-node-tuning-operator get profile
NAME                           TUNED                     APPLIED   DEGRADED   AGE
ip-10-0-137-19.ec2.internal    openshift-node            True      False      151m
ip-10-0-142-21.ec2.internal    openshift-control-plane   True      False      157m
ip-10-0-161-117.ec2.internal   openshift-node            True      False      151m
ip-10-0-166-196.ec2.internal   openshift-control-plane   True      False      140m
ip-10-0-177-51.ec2.internal    openshift-node            True      False      150m
ip-10-0-190-249.ec2.internal   openshift-node            True      False      151m
ip-10-0-204-107.ec2.internal   openshift-control-plane   True      False      157m
ip-10-0-204-213.ec2.internal   openshift-control-plane   True      False      140m
ip-10-0-217-121.ec2.internal   openshift-example         True      False      10m
ip-10-0-230-61.ec2.internal    openshift-control-plane   True      False      157m
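
On a cluster with many nodes it can be easier to filter the validation output than to read it by eye. The following is a minimal sketch under stated assumptions: `nodes_with_profile` is a hypothetical helper (not an `oc` or `rosa` feature) that reads the `get profile` table on stdin and prints the nodes whose TUNED column equals a given profile name. It assumes that NAME and TUNED are the first two columns, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get profile` tabular output on stdin and print
# the names of nodes whose TUNED column equals the profile name passed as $1.
# The header row (first line) is skipped.
nodes_with_profile() {
  awk -v profile="$1" 'NR > 1 && $2 == profile { print $1 }'
}

# Example use against a live cluster (requires a logged-in oc session):
#   oc -n openshift-cluster-node-tuning-operator get profile \
#     | nodes_with_profile openshift-example
#
# Compare the result with the nodes that carry the machine pool label:
#   oc get nodes -l nto=enabled
```

The two lists should contain the same nodes. A node that runs the custom profile without carrying the label, or a control plane or infra node appearing in either list, suggests the match rule in the Tuned CR is broader than intended.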