Processes requiring Real-Time Scheduling fail with "sched_setscheduler: Operation not permitted" error or similar
Environment
- Red Hat Enterprise Linux (RHEL) 7
- Red Hat Enterprise Linux 8
- systemd
Issue
- Services trying to acquire real-time scheduling fail to start. Running strace on the service executable shows an EPERM (Operation not permitted) error when calling the sched_setscheduler syscall with the SCHED_RR parameter:
# strace <program> 2>&1 >/dev/null | grep sched_setscheduler
sched_setscheduler(0, SCHED_RR, { 99 }) = -1 EPERM (Operation not permitted)
- Services acquiring real-time scheduling start normally at boot, but fail to be restarted, and usually show the error message shown above
- Oracle RAC or other applications that use real-time process scheduling fail, but run without issue after cgclear has been run
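The strace check described in this section can be wrapped in a small helper. This is only a minimal sketch: the function name rt_denied is hypothetical, and the pattern assumes strace prints the failing call in the same general form as the sample output in this section.

```shell
#!/bin/sh
# rt_denied (hypothetical helper): read strace output on stdin and succeed
# if at least one sched_setscheduler call was rejected with EPERM.
rt_denied() {
    grep -Eq 'sched_setscheduler\(.*\) += -1 EPERM'
}

# Usage sketch (replace <program> with the service executable):
#   strace -f -e trace=sched_setscheduler <program> 2>&1 >/dev/null | rt_denied \
#       && echo "real-time scheduling was denied"
```

The -f option makes strace follow child processes, which helps when the service forks before requesting real-time scheduling.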
Resolution
- Follow the instructions in the Diagnostic Steps section and proceed further if this is a match
- Read the dedicated article "How to configure a RHEL 7 system to be able to run programs requiring Real-Time Scheduling" and choose one of the proposed solutions, depending on your requirements
- If insights-client is actively running on the system, refer to the solution "Insights-client may prevent process to acquire real-time scheduling"
Root Cause
- If a service acquiring real-time scheduling starts normally at boot but fails to be restarted later, then
  - either CPU Accounting is partially enabled at boot (i.e. enabled in /etc/systemd/system.conf but not in the initramfs)
  - or CPU Accounting is enabled at run-time by a service starting after the service acquiring real-time scheduling
- If a service acquiring real-time scheduling doesn't start anymore at boot after a reboot of the system, then
  - either CPU Accounting was enabled at boot explicitly prior to rebooting
  - or a service which was not started prior to rebooting is now enabling CPU Accounting implicitly or explicitly
- For insights-client, the unit file for the application included CPUQuota=30%, which triggered the behavior described in the above article. This has since been changed.
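To see whether a given process has been moved into a non-root cpu cgroup, which is what happens once CPU Accounting is in effect, something like the following sketch may help. The helper names cpu_cgroup_path and rt_budget are hypothetical, and the default mount point /sys/fs/cgroup/cpu is an assumption taken from the paths used in the Diagnostic Steps. To my understanding, a cpu.rt_runtime_us value of 0 in a non-root cgroup is what makes sched_setscheduler return EPERM on kernels built with real-time group scheduling.

```shell
#!/bin/sh
# cpu_cgroup_path (hypothetical helper): given a /proc/<pid>/cgroup file,
# print the path of the cgroup in the hierarchy carrying the "cpu" controller.
# Each line of that file has the form  hierarchy-ID:controller-list:path
cpu_cgroup_path() {
    awk -F: '
        {
            n = split($2, ctrl, ",")
            for (i = 1; i <= n; i++)
                if (ctrl[i] == "cpu") {
                    # Re-join fields 3..NF in case the path contains ":"
                    path = $3
                    for (j = 4; j <= NF; j++) path = path ":" $j
                    print path
                    exit
                }
        }' "$1"
}

# rt_budget (hypothetical helper): print cpu.rt_runtime_us for that cgroup.
# $1 = cgroup file, $2 = mount point of the cpu hierarchy
#      (assumed default: /sys/fs/cgroup/cpu)
rt_budget() {
    cat "${2:-/sys/fs/cgroup/cpu}$(cpu_cgroup_path "$1")/cpu.rt_runtime_us"
}

# Usage sketch:
#   cpu_cgroup_path /proc/self/cgroup    # "/" means the root cpu cgroup
#   rt_budget /proc/<pid>/cgroup
```

A result of "/" from cpu_cgroup_path suggests the process is still in the root cpu cgroup and is unlikely to be affected by this issue.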
Diagnostic Steps
- Verify that systemd makes use of the cpu and cpuacct CGroup controllers
# ls -d /sys/fs/cgroup/{cpu,cpuacct}/*.slice
ls: cannot access /sys/fs/cgroup/cpu/*.slice: No such file or directory
ls: cannot access /sys/fs/cgroup/cpuacct/*.slice: No such file or directory
In the example above, systemd is not currently using the cpu and cpuacct CGroup controllers.
In such a case, the issue being hit is not covered by this solution.
# ls -d /sys/fs/cgroup/{cpu,cpuacct}/*.slice
/sys/fs/cgroup/cpuacct/system.slice
/sys/fs/cgroup/cpu/system.slice
/sys/fs/cgroup/cpuacct/user.slice
/sys/fs/cgroup/cpu/user.slice
In the example above, systemd is currently using the cpu and cpuacct CGroup controllers.
This happens when CPU Accounting has been enabled at boot or at run-time.
Please proceed further.
- Verify if systemd is configured to enable CPU Accounting at boot
# grep ^DefaultCPUAccounting= /etc/systemd/system.conf
DefaultCPUAccounting=yes
In the example above, CPU Accounting is enabled at boot. If this is not the case, proceed directly to the next diagnostic step.
As a sanity check, verify that the booted initramfs also configures CPU Accounting at boot, otherwise this may lead to unexpected behaviour:
# lsinitrd /boot/initramfs-$(uname -r).img /etc/systemd/system.conf | grep ^DefaultCPUAccounting=
In the example above, the command prints nothing: the /etc/systemd/system.conf file on the system and the copy in the initramfs are not synchronized, which may lead to unexpected behaviour.
In such a case, please rebuild the initramfs:
# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
Since CPU Accounting is enabled at boot, proceed directly to the Resolution section.
- Find out why systemd enabled CPU Accounting at run-time
Even though CPU Accounting is disabled at boot, systemd may enable it automatically when a unit being started makes use of a CPU* property, such as CPUAccounting=yes or CPUQuota=<value> (refer to the systemd.resource-control manpage for a full list).
Note that Delegate=yes also enables CPU Accounting.
To find out which unit made systemd enable CPU Accounting, the following command can be executed:
# egrep -ri "^(Startup)?CPU.*=(.*%|1|yes|true|on)" /usr/lib/systemd/system /etc/systemd/system
/etc/systemd/system/mariadb.service.d/quota.conf:CPUQuota=20%
In the example above, a custom drop-in to the mariadb.service unit is implicitly enabling CPU Accounting because it makes use of CPUQuota.
Since CPU Accounting is enabled at run-time due to starting a service which implicitly or explicitly enables CPU Accounting, proceed to the Resolution section.
- If you reach this step, then CPU Accounting is enabled for some unknown reason not handled by this solution. Please contact your Red Hat Support representative.
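The diagnostic steps above can be strung together in a short script. The following is a sketch rather than a supported tool: all function names are hypothetical, the default paths are the ones used in the steps above, the boolean spellings accepted for DefaultCPUAccounting are an assumption, and the search for Delegate= is an addition based on the note that Delegate=yes also enables CPU Accounting. The initramfs comparison is left out.

```shell
#!/bin/sh
# All function names are hypothetical. Paths are passed as arguments and
# default to the locations used in the Diagnostic Steps.

# Step 1: succeed if a *.slice directory exists under the cpu or cpuacct
# hierarchy. $1 = cgroup mount root (default /sys/fs/cgroup)
slices_in_use() {
    root=${1:-/sys/fs/cgroup}
    for d in "$root"/cpu/*.slice "$root"/cpuacct/*.slice; do
        [ -d "$d" ] && return 0
    done
    return 1
}

# Step 2: succeed if DefaultCPUAccounting is switched on in system.conf.
# Accepting 1/yes/true/on is an assumption about systemd boolean spellings.
accounting_at_boot() {
    grep -Eqi '^DefaultCPUAccounting=(1|yes|true|on)[[:space:]]*$' \
        "${1:-/etc/systemd/system.conf}"
}

# Step 3: list unit files and drop-ins that set a CPU* property, using the
# pattern from the step above extended with Delegate= (an addition).
# "$@" = directories to search
units_enabling_accounting() {
    [ $# -gt 0 ] || set -- /usr/lib/systemd/system /etc/systemd/system
    grep -Eri '^((Startup)?CPU.*=(.*%|1|yes|true|on)|Delegate=(1|yes|true|on))' "$@"
}

# Walk through the steps in the same order as the article.
diagnose() {
    if ! slices_in_use; then
        echo "cpu/cpuacct controllers not used by systemd: not this issue"
    elif accounting_at_boot; then
        echo "CPU Accounting enabled at boot: proceed to Resolution"
    elif units_enabling_accounting; then
        echo "CPU Accounting enabled at run-time by the unit(s) listed above: proceed to Resolution"
    else
        echo "Unknown cause: contact your Red Hat Support representative"
    fi
}

# Usage sketch (as root):
#   diagnose
```

Taking the paths as arguments keeps each check usable on its own, for example against a copy of system.conf extracted from the initramfs.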
2 Comments
The solution article is somewhat misleading and is missing a significant amount of exposition, as the issue is not as clear-cut as indicated. There is a longer and better explanation of the issue here:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/pdf/migration_planning_guide/Red_Hat_Enterprise_Linux-7-Migration_Planning_Guide-en-US.pdf
It says:
"SysV services, even those with root privileges, cannot acquire real-time scheduling when the CPUAccounting option is enabled. With CPUAccounting enabled for any service, systemd makes use of the CGroup CPU bandwidth controller globally, and subsequent sched_setscheduler() system calls terminate unexpectedly due to real-time scheduling priority. To avoid this error to recur, the CGroup cpu.rt_runtime_us option can be set for the real-time using service."
That means that if CPUAccounting is enabled this can happen but if it is not then it will not happen. CPUAccounting can be enabled explicitly (by default it is not) or implicitly by using one of the options here that implies "CPUAccounting=true":
https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html
So if you were to define a service that had CPUQuota=30% in its definition, that would imply that CPUAccounting was true and cause the issue to be seen. If CPUAccounting is not true anywhere (and is off by default) and nothing uses any configuration that implies it is true, then the issue wouldn't be seen (from what I can work out).
I've seen a RHEL 7.5 system with a service that defined CPUQuota=30% on a two-node cluster. On one node, with that service disabled, a process could get real-time scheduling, and on the other node, where the service was running, it could not.
If, for example, that has changed at 7.6 (CPUAccounting is now always true), you need to be more explicit about when the behaviour changed and why it changed, and not say that it applies to all of RHEL 7.
I can see that it's not 100% accurate at least up to 7.5, but the article presents it as just the way that all of RHEL 7 works. Accurate explanations of the cause of issues help with understanding (in my case, disabling the service that defined CPUQuota=30% should allow the real-time process to start, which allows the system to function as expected while a fix is developed for the real-time process). In the longer term, the real-time process on the system I was looking at does need to change to use a real-time slice, but saying that it simply does not work on RHEL 7 would appear to be untrue without any exposition of the underlying reasons for how it can happen.
Please improve the quality of the solution article.
Thanks for pointing that out, I'll update the document.