Chapter 5. Using Jobs and DaemonSets

5.1. Running background tasks on nodes automatically with daemon sets

As an administrator, you can create and use daemon sets to run replicas of a pod on specific or all nodes in an OpenShift Dedicated cluster.

A daemon set ensures that all (or some) nodes run a copy of a pod. As nodes are added to the cluster, pods are added to the cluster. As nodes are removed from the cluster, those pods are removed through garbage collection. Deleting a daemon set will clean up the pods it created.

You can use daemon sets to create shared storage, run a logging pod on every node in your cluster, or deploy a monitoring agent on every node.

For security reasons, the cluster administrators and the project administrators can create daemon sets.

For more information on daemon sets, see the Kubernetes documentation.

Important

Daemon set scheduling is incompatible with project’s default node selector. If you fail to disable it, the daemon set gets restricted by merging with the default node selector. This results in frequent pod recreates on the nodes that got unselected by the merged node selector, which in turn puts unwanted load on the cluster.

5.1.1. Scheduled by default scheduler

A daemon set ensures that all eligible nodes run a copy of a pod. Normally, the node that a pod runs on is selected by the Kubernetes scheduler. However, daemon set pods are created and scheduled by the daemon set controller. That introduces the following issues:

  • Inconsistent pod behavior: Normal pods waiting to be scheduled are created and in Pending state, but daemon set pods are not created in Pending state. This is confusing to the user.
  • Pod preemption is handled by default scheduler. When preemption is enabled, the daemon set controller will make scheduling decisions without considering pod priority and preemption.

The ScheduleDaemonSetPods feature, enabled by default in OpenShift Dedicated, lets you schedule daemon sets using the default scheduler instead of the daemon set controller, by adding the NodeAffinity term to the daemon set pods, instead of the spec.nodeName term. The default scheduler is then used to bind the pod to the target host. If node affinity of the daemon set pod already exists, it is replaced. The daemon set controller only performs these operations when creating or modifying daemon set pods, and no changes are made to the spec.template of the daemon set.

kind: Pod
apiVersion: v1
metadata:
  name: hello-node-6fbccf8d9-9tmzr
#...
spec:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - target-host-name
#...

In addition, a node.kubernetes.io/unschedulable:NoSchedule toleration is added automatically to daemon set pods. The default scheduler ignores unschedulable Nodes when scheduling daemon set pods.

5.1.2. Creating daemonsets

When creating daemon sets, the nodeSelector field is used to indicate the nodes on which the daemon set should deploy replicas.

Prerequisites

  • Before you start using daemon sets, disable the default project-wide node selector in your namespace, by setting the namespace annotation openshift.io/node-selector to an empty string:

    $ oc patch namespace myproject -p \
        '{"metadata": {"annotations": {"openshift.io/node-selector": ""}}}'
    Tip

    You can alternatively apply the following YAML to disable the default project-wide node selector for a namespace:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: <namespace>
      annotations:
        openshift.io/node-selector: ''
    #...

Procedure

To create a daemon set:

  1. Define the daemon set yaml file:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: hello-daemonset
    spec:
      selector:
          matchLabels:
            name: hello-daemonset 1
      template:
        metadata:
          labels:
            name: hello-daemonset 2
        spec:
          nodeSelector: 3
            role: worker
          containers:
          - image: openshift/hello-openshift
            imagePullPolicy: Always
            name: registry
            ports:
            - containerPort: 80
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
          serviceAccount: default
          terminationGracePeriodSeconds: 10
    #...
    1
    The label selector that determines which pods belong to the daemon set.
    2
    The pod template’s label selector. Must match the label selector above.
    3
    The node selector that determines on which nodes pod replicas should be deployed. A matching label must be present on the node.
  2. Create the daemon set object:

    $ oc create -f daemonset.yaml
  3. To verify that the pods were created, and that each node has a pod replica:

    1. Find the daemonset pods:

      $ oc get pods

      Example output

      hello-daemonset-cx6md   1/1       Running   0          2m
      hello-daemonset-e3md9   1/1       Running   0          2m

    2. View the pods to verify the pod has been placed onto the node:

      $ oc describe pod/hello-daemonset-cx6md|grep Node

      Example output

      Node:        openshift-node01.hostname.com/10.14.20.134

      $ oc describe pod/hello-daemonset-e3md9|grep Node

      Example output

      Node:        openshift-node02.hostname.com/10.14.20.137

Important
  • If you update a daemon set pod template, the existing pod replicas are not affected.
  • If you delete a daemon set and then create a new daemon set with a different template but the same label selector, it recognizes any existing pod replicas as having matching labels and thus does not update them or create new replicas despite a mismatch in the pod template.
  • If you change node labels, the daemon set adds pods to nodes that match the new labels and deletes pods from nodes that do not match the new labels.

To update a daemon set, force new pod replicas to be created by deleting the old replicas or nodes.

5.2. Running tasks in pods using jobs

A job executes a task in your OpenShift Dedicated cluster.

A job tracks the overall progress of a task and updates its status with information about active, succeeded, and failed pods. Deleting a job will clean up any pod replicas it created. Jobs are part of the Kubernetes API, which can be managed with oc commands like other object types.

Sample Job specification

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  parallelism: 1    1
  completions: 1    2
  activeDeadlineSeconds: 1800 3
  backoffLimit: 6   4
  template:         5
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure    6
#...

1
The pod replicas a job should run in parallel.
2
Successful pod completions are needed to mark a job completed.
3
The maximum duration the job can run.
4
The number of retries for a job.
5
The template for the pod the controller creates.
6
The restart policy of the pod.

Additional resources

  • Jobs in the Kubernetes documentation

5.2.1. Understanding jobs and cron jobs

A job tracks the overall progress of a task and updates its status with information about active, succeeded, and failed pods. Deleting a job cleans up any pods it created. Jobs are part of the Kubernetes API, which can be managed with oc commands like other object types.

There are two possible resource types that allow creating run-once objects in OpenShift Dedicated:

Job

A regular job is a run-once object that creates a task and ensures the job finishes.

There are three main types of task suitable to run as a job:

  • Non-parallel jobs:

    • A job that starts only one pod, unless the pod fails.
    • The job is complete as soon as its pod terminates successfully.
  • Parallel jobs with a fixed completion count:

    • a job that starts multiple pods.
    • The job represents the overall task and is complete when there is one successful pod for each value in the range 1 to the completions value.
  • Parallel jobs with a work queue:

    • A job with multiple parallel worker processes in a given pod.
    • OpenShift Dedicated coordinates pods to determine what each should work on or use an external queue service.
    • Each pod is independently capable of determining whether or not all peer pods are complete and that the entire job is done.
    • When any pod from the job terminates with success, no new pods are created.
    • When at least one pod has terminated with success and all pods are terminated, the job is successfully completed.
    • When any pod has exited with success, no other pod should be doing any work for this task or writing any output. Pods should all be in the process of exiting.

      For more information about how to make use of the different types of job, see Job Patterns in the Kubernetes documentation.

Cron job

A job can be scheduled to run multiple times, using a cron job.

A cron job builds on a regular job by allowing you to specify how the job should be run. Cron jobs are part of the Kubernetes API, which can be managed with oc commands like other object types.

Cron jobs are useful for creating periodic and recurring tasks, like running backups or sending emails. Cron jobs can also schedule individual tasks for a specific time, such as if you want to schedule a job for a low activity period. A cron job creates a Job object based on the timezone configured on the control plane node that runs the cronjob controller.

Warning

A cron job creates a Job object approximately once per execution time of its schedule, but there are circumstances in which it fails to create a job or two jobs might be created. Therefore, jobs must be idempotent and you must configure history limits.

5.2.1.1. Understanding how to create jobs

Both resource types require a job configuration that consists of the following key parts:

  • A pod template, which describes the pod that OpenShift Dedicated creates.
  • The parallelism parameter, which specifies how many pods running in parallel at any point in time should execute a job.

    • For non-parallel jobs, leave unset. When unset, defaults to 1.
  • The completions parameter, specifying how many successful pod completions are needed to finish a job.

    • For non-parallel jobs, leave unset. When unset, defaults to 1.
    • For parallel jobs with a fixed completion count, specify a value.
    • For parallel jobs with a work queue, leave unset. When unset defaults to the parallelism value.

5.2.1.2. Understanding how to set a maximum duration for jobs

When defining a job, you can define its maximum duration by setting the activeDeadlineSeconds field. It is specified in seconds and is not set by default. When not set, there is no maximum duration enforced.

The maximum duration is counted from the time when a first pod gets scheduled in the system, and defines how long a job can be active. It tracks overall time of an execution. After reaching the specified timeout, the job is terminated by OpenShift Dedicated.

5.2.1.3. Understanding how to set a job back off policy for pod failure

A job can be considered failed, after a set amount of retries due to a logical error in configuration or other similar reasons. Failed pods associated with the job are recreated by the controller with an exponential back off delay (10s, 20s, 40s …) capped at six minutes. The limit is reset if no new failed pods appear between controller checks.

Use the spec.backoffLimit parameter to set the number of retries for a job.

5.2.1.4. Understanding how to configure a cron job to remove artifacts

Cron jobs can leave behind artifact resources such as jobs or pods. As a user it is important to configure history limits so that old jobs and their pods are properly cleaned. There are two fields within cron job’s spec responsible for that:

  • .spec.successfulJobsHistoryLimit. The number of successful finished jobs to retain (defaults to 3).
  • .spec.failedJobsHistoryLimit. The number of failed finished jobs to retain (defaults to 1).

5.2.1.5. Known limitations

The job specification restart policy only applies to the pods, and not the job controller. However, the job controller is hard-coded to keep retrying jobs to completion.

As such, restartPolicy: Never or --restart=Never results in the same behavior as restartPolicy: OnFailure or --restart=OnFailure. That is, when a job fails it is restarted automatically until it succeeds (or is manually discarded). The policy only sets which subsystem performs the restart.

With the Never policy, the job controller performs the restart. With each attempt, the job controller increments the number of failures in the job status and create new pods. This means that with each failed attempt, the number of pods increases.

With the OnFailure policy, kubelet performs the restart. Each attempt does not increment the number of failures in the job status. In addition, kubelet will retry failed jobs starting pods on the same nodes.

5.2.2. Creating jobs

You create a job in OpenShift Dedicated by creating a job object.

Procedure

To create a job:

  1. Create a YAML file similar to the following:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pi
    spec:
      parallelism: 1    1
      completions: 1    2
      activeDeadlineSeconds: 1800 3
      backoffLimit: 6   4
      template:         5
        metadata:
          name: pi
        spec:
          containers:
          - name: pi
            image: perl
            command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: OnFailure    6
    #...
    1
    Optional: Specify how many pod replicas a job should run in parallel; defaults to 1.
    • For non-parallel jobs, leave unset. When unset, defaults to 1.
    2
    Optional: Specify how many successful pod completions are needed to mark a job completed.
    • For non-parallel jobs, leave unset. When unset, defaults to 1.
    • For parallel jobs with a fixed completion count, specify the number of completions.
    • For parallel jobs with a work queue, leave unset. When unset defaults to the parallelism value.
    3
    Optional: Specify the maximum duration the job can run.
    4
    Optional: Specify the number of retries for a job. This field defaults to six.
    5
    Specify the template for the pod the controller creates.
    6
    Specify the restart policy of the pod:
    • Never. Do not restart the job.
    • OnFailure. Restart the job only if it fails.
    • Always. Always restart the job.

      For details on how OpenShift Dedicated uses restart policy with failed containers, see the Example States in the Kubernetes documentation.

  2. Create the job:

    $ oc create -f <file-name>.yaml
Note

You can also create and launch a job from a single command using oc create job. The following command creates and launches a job similar to the one specified in the previous example:

$ oc create job pi --image=perl -- perl -Mbignum=bpi -wle 'print bpi(2000)'

5.2.3. Creating cron jobs

You create a cron job in OpenShift Dedicated by creating a job object.

Procedure

To create a cron job:

  1. Create a YAML file similar to the following:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: pi
    spec:
      schedule: "*/1 * * * *"          1
      concurrencyPolicy: "Replace"     2
      startingDeadlineSeconds: 200     3
      suspend: true                    4
      successfulJobsHistoryLimit: 3    5
      failedJobsHistoryLimit: 1        6
      jobTemplate:                     7
        spec:
          template:
            metadata:
              labels:                  8
                parent: "cronjobpi"
            spec:
              containers:
              - name: pi
                image: perl
                command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
              restartPolicy: OnFailure 9
    1
    Schedule for the job specified in cron format. In this example, the job will run every minute.
    2
    An optional concurrency policy, specifying how to treat concurrent jobs within a cron job. Only one of the following concurrent policies may be specified. If not specified, this defaults to allowing concurrent executions.
    • Allow allows cron jobs to run concurrently.
    • Forbid forbids concurrent runs, skipping the next run if the previous has not finished yet.
    • Replace cancels the currently running job and replaces it with a new one.
    3
    An optional deadline (in seconds) for starting the job if it misses its scheduled time for any reason. Missed jobs executions will be counted as failed ones. If not specified, there is no deadline.
    4
    An optional flag allowing the suspension of a cron job. If set to true, all subsequent executions will be suspended.
    5
    The number of successful finished jobs to retain (defaults to 3).
    6
    The number of failed finished jobs to retain (defaults to 1).
    7
    Job template. This is similar to the job example.
    8
    Sets a label for jobs spawned by this cron job.
    9
    The restart policy of the pod. This does not apply to the job controller.
    Note

    The .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit fields are optional. These fields specify how many completed and failed jobs should be kept. By default, they are set to 3 and 1 respectively. Setting a limit to 0 corresponds to keeping none of the corresponding kind of jobs after they finish.

  2. Create the cron job:

    $ oc create -f <file-name>.yaml
Note

You can also create and launch a cron job from a single command using oc create cronjob. The following command creates and launches a cron job similar to the one specified in the previous example:

$ oc create cronjob pi --image=perl --schedule='*/1 * * * *' -- perl -Mbignum=bpi -wle 'print bpi(2000)'

With oc create cronjob, the --schedule option accepts schedules in cron format.