3.2. cpu

The cpu subsystem schedules CPU access to cgroups. Access to CPU resources can be scheduled using two schedulers:
  • Completely Fair Scheduler (CFS) — a proportional share scheduler which divides the CPU time (CPU bandwidth) proportionately between groups of tasks (cgroups) depending on the priority/weight of the task or shares assigned to cgroups. For more information about resource limiting using CFS, refer to Section 3.2.1, “CFS Tunable Parameters”.
  • Real-Time scheduler (RT) — a task scheduler that provides a way to specify the amount of CPU time that real-time tasks can use. For more information about resource limiting of real-time tasks, refer to Section 3.2.2, “RT Tunable Parameters”.

3.2.1. CFS Tunable Parameters

In CFS, a cgroup can get more than its share of CPU if there are enough idle CPU cycles available in the system, due to the work conserving nature of the scheduler. This is usually the case for cgroups that consume CPU time based on relative shares. Ceiling enforcement can be used for cases when a hard limit on the amount of CPU that a cgroup can utilize is required (that is, tasks cannot use more than a set amount of CPU time).
The following options can be used to configure ceiling enforcement or relative sharing of CPU:

Ceiling Enforcement Tunable Parameters

cpu.cfs_period_us
specifies a period of time in microseconds (µs, represented here as "us") for how regularly a cgroup's access to CPU resources should be reallocated. If tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 1000000. The upper limit of the cpu.cfs_quota_us parameter is 1 second and the lower limit is 1000 microseconds.
cpu.cfs_quota_us
specifies the total amount of time in microseconds (µs, represented here as "us") for which all tasks in a cgroup can run during one period (as defined by cpu.cfs_period_us). As soon as tasks in a cgroup use up all the time specified by the quota, they are throttled for the remainder of the time specified by the period and not allowed to run until the next period. If tasks in a cgroup should be able to access a single CPU for 0.2 seconds out of every 1 second, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 1000000. Note that the quota and period parameters operate on a CPU basis. To allow a process to fully utilize two CPUs, for example, set cpu.cfs_quota_us to 200000 and cpu.cfs_period_us to 100000.
Setting the value in cpu.cfs_quota_us to -1 indicates that the cgroup does not adhere to any CPU time restrictions. This is also the default value for every cgroup (except the root cgroup).
cpu.stat
reports CPU time statistics using the following values:
  • nr_periods — number of period intervals (as specified in cpu.cfs_period_us) that have elapsed.
  • nr_throttled — number of times tasks in a cgroup have been throttled (that is, not allowed to run because they have exhausted all of the available time as specified by their quota).
  • throttled_time — the total time duration (in nanoseconds) for which tasks in a cgroup have been throttled.

Relative Shares Tunable Parameters

cpu.shares
contains an integer value that specifies a relative share of CPU time available to the tasks in a cgroup. For example, tasks in two cgroups that have cpu.shares set to 100 will receive equal CPU time, but tasks in a cgroup that has cpu.shares set to 200 receive twice the CPU time of tasks in a cgroup where cpu.shares is set to 100. The value specified in the cpu.shares file must be 2 or higher.
Note that shares of CPU time are distributed per all CPU cores on multi-core systems. Even if a cgroup is limited to less than 100% of CPU on a multi-core system, it may use 100% of each individual CPU core. Consider the following example: if cgroup A is configured to use 25% and cgroup B 75% of the CPU, starting four CPU-intensive processes (one in A and three in B) on a system with four cores results in the following division of CPU shares:

Table 3.1. CPU share division

PIDcgroupCPUCPU share
100A0100% of CPU0
101B1100% of CPU1
102B2100% of CPU2
103B3100% of CPU3
Using relative shares to specify CPU access has two implications on resource management that should be considered:
  • Because the CFS does not demand equal usage of CPU, it is hard to predict how much CPU time a cgroup will be allowed to utilize. When tasks in one cgroup are idle and are not using any CPU time, the leftover time is collected in a global pool of unused CPU cycles. Other cgroups are allowed to borrow CPU cycles from this pool.
  • The actual amount of CPU time that is available to a cgroup can vary depending on the number of cgroups that exist on the system. If a cgroup has a relative share of 1000 and two other cgroups have a relative share of 500, the first cgroup receives 50% of all CPU time in cases when processes in all cgroups attempt to use 100% of the CPU. However, if another cgroup is added with a relative share of 1000, the first cgroup is only allowed 33% of the CPU (the rest of the cgroups receive 16.5%, 16.5%, and 33% of CPU).

3.2.2. RT Tunable Parameters

The RT scheduler works in a similar way to the ceiling enforcement control of the CFS (for more information, refer to Section 3.2.1, “CFS Tunable Parameters”) but limits CPU access to real-time tasks only. The amount of time for which real-time task can access the CPU is decided by allocating a run time and a period for each cgroup. All tasks in a cgroup are then allowed to access the CPU for the defined period of time for one run time (for example, tasks in a cgroup can be allowed to run 0.1 seconds in every 1 second). Please note that the following parameters take into account all available logical CPUs, therefore the access times specified with these parameters are in fact multiplied by the number of CPUs.
cpu.rt_period_us
applicable to real-time scheduling tasks only, this parameter specifies a period of time in microseconds (µs, represented here as "us") for how regularly a cgroup's access to CPU resources is reallocated.
cpu.rt_runtime_us
applicable to real-time scheduling tasks only, this parameter specifies a period of time in microseconds (µs, represented here as "us") for the longest continuous period in which the tasks in a cgroup have access to CPU resources. Establishing this limit prevents tasks in one cgroup from monopolizing CPU time. As mentioned above, the access times are multiplied by the number of logical CPUs. For example, setting cpu.rt_runtime_us to 200000 and cpu.rt_period_us to 1000000 translates to the task being able to access a single CPU for 0.4 seconds out of every 1 second on systems with two CPUs (0.2 x 2), or 0.8 seconds on systems with four CPUs (0.2 x 4).

3.2.3. Example Usage

Example 3.2. Limiting CPU access

The following examples assume you have an existing hierarchy of cgroups configured and the cpu subsystem mounted on your system:
  • To allow one cgroup to use 25% of a single CPU and a different cgroup to use 75% of that same CPU, use the following commands:
    ~]# echo 250 > /cgroup/cpu/blue/cpu.shares
    ~]# echo 750 > /cgroup/cpu/red/cpu.shares
  • To limit a cgroup to fully utilize a single CPU, use the following commands:
    ~]# echo 10000 > /cgroup/cpu/red/cpu.cfs_quota_us
    ~]# echo 10000 > /cgroup/cpu/red/cpu.cfs_period_us
  • To limit a cgroup to utilize 10% of a single CPU, use the following commands:
    ~]# echo 10000 > /cgroup/cpu/red/cpu.cfs_quota_us
    ~]# echo 100000 > /cgroup/cpu/red/cpu.cfs_period_us
  • On a multi-core system, to allow a cgroup to fully utilize two CPU cores, use the following commands:
    ~]# echo 200000 > /cgroup/cpu/red/cpu.cfs_quota_us
    ~]# echo 100000 > /cgroup/cpu/red/cpu.cfs_period_us