Chapter 8. Configuring Knative Serving autoscaling
You are viewing documentation for a release of Red Hat OpenShift Serverless that is no longer supported. Red Hat OpenShift Serverless is currently supported on OpenShift Container Platform 4.3 and newer.
OpenShift Serverless provides capabilities for automatic Pod scaling, including scaling inactive Pods to zero, by enabling the Knative Serving autoscaling system in an OpenShift Container Platform cluster.
To enable autoscaling for Knative Serving, you must configure concurrency and scale bounds in the revision template.
Any limits or targets set in the revision template are measured against a single instance of your application. For example, setting the target annotation to 50 will configure the autoscaler to scale the application so that each instance of it will handle 50 requests at a time.
8.1. Configuring concurrent requests for Knative Serving autoscaling
You can specify the number of concurrent requests that should be handled by each instance of an application (revision container) by adding the target annotation or the containerConcurrency field in the revision template.
Here is an example of target being used in a revision template:
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
      - image: myimage
Here is an example of containerConcurrency being used in a revision template:
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
    spec:
      containerConcurrency: 100
      containers:
      - image: myimage
Setting both target and containerConcurrency makes the autoscaler aim for the target number of concurrent requests, while enforcing the containerConcurrency value as a hard limit on requests.
For example, if the target value is 50 and the containerConcurrency value is 100, the targeted number of requests will be 50, but the hard limit will be 100.
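As a minimal sketch, the combination described above could be written as follows. The names myapp and myimage are the placeholders from the earlier examples, and the annotation value is quoted on the assumption that, as with any Kubernetes annotation, it must be a string.

```yaml
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        # Soft target: the autoscaler aims for 50 concurrent requests per instance.
        autoscaling.knative.dev/target: "50"
    spec:
      # Hard limit: no instance receives more than 100 requests at once.
      containerConcurrency: 100
      containers:
      - image: myimage
```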
If the containerConcurrency value is less than the target value, the target value will be tuned down, since there is no need to target more requests than the number that can actually be handled.
containerConcurrency should only be used if there is a clear need to limit how many requests reach the application at a given time, that is, if the application needs an enforced concurrency constraint.
8.1.1. Configuring concurrent requests using the target annotation
The default target for the number of concurrent requests is 100, but you can override this value by adding or modifying the autoscaling.knative.dev/target annotation value in the revision template.
Here is an example of how this annotation is used in the revision template to set the target to
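A minimal sketch of such a revision template, assuming a target of 50 (the value used in the earlier examples) and the placeholder names myapp and myimage:

```yaml
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        # Overrides the default target of 100 concurrent requests per instance.
        # The value 50 is an assumption chosen for illustration.
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
      - image: myimage
```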
8.1.2. Configuring concurrent requests using the containerConcurrency field
containerConcurrency sets a hard limit on the number of concurrent requests handled.
containerConcurrency: 0 | 1 | 2-N
- 0 allows unlimited concurrent requests.
- 1 guarantees that only one request is handled at a time by a given instance of the revision container.
- 2 or more will limit request concurrency to that value.
If there is no target annotation, autoscaling is configured as if target is equal to the value of containerConcurrency.
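A sketch of the single-request case, assuming the same placeholder names (myapp, myimage) used in the earlier examples. No target annotation is set here, so only the hard limit is configured.

```yaml
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: myapp
spec:
  template:
    spec:
      # 0 = unlimited, 1 = one request at a time, 2 or more = that many at most.
      containerConcurrency: 1
      containers:
      - image: myimage
```

A value of 1 can suit applications that are not safe to run concurrently, at the cost of needing more instances under load.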
8.2. Configuring scale bounds for Knative Serving autoscaling
The minScale and maxScale annotations can be used to configure the minimum and maximum number of Pods that can serve applications. These annotations can be used to prevent cold starts or to help control computing costs.
If the minScale annotation is not set, Pods will scale to zero (or to 1 if enable-scale-to-zero is set to false in the autoscaler ConfigMap).
If the maxScale annotation is not set, there will be no upper limit for the number of Pods created.
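For reference, the scale-to-zero setting mentioned above could look like the following excerpt. This is a sketch only: the ConfigMap name config-autoscaler and the namespace knative-serving are the upstream Knative defaults and are assumptions here, and how this setting is managed may differ in your OpenShift Serverless release.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed upstream Knative defaults; verify the name and namespace in your cluster.
  name: config-autoscaler
  namespace: knative-serving
data:
  # With scale-to-zero disabled, a revision without minScale keeps at least 1 Pod.
  enable-scale-to-zero: "false"
```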
minScale and maxScale can be configured as follows in the revision template:
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/maxScale: "10"
Using these annotations in the revision template will propagate this configuration to PodAutoscaler objects.
These annotations apply for the full lifetime of a revision. Even when a revision is not referenced by any route, the minimum Pod count specified by minScale will still be provided. Keep in mind that non-routable revisions may be garbage collected, which enables Knative to reclaim the resources.