Processes requiring Real-Time Scheduling fail with "sched_setscheduler: Operation not permitted" error or similar


Environment

  • Red Hat Enterprise Linux (RHEL) 7
  • Red Hat Enterprise Linux 8
    • systemd

Issue

  • Services trying to acquire real-time scheduling fail to start. Running strace on the service executable shows an EPERM (Operation not permitted) error when the sched_setscheduler syscall is called with the SCHED_RR policy:

    # strace <program> 2>&1 >/dev/null | grep sched_setscheduler
    sched_setscheduler(0, SCHED_RR, { 99 }) = -1 EPERM (Operation not permitted)
    
  • Services acquiring real-time scheduling start normally at boot but fail when restarted, usually with the error message shown above

  • Oracle RAC or other applications that use real-time process scheduling fail, but run without issue after cgclear is executed
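
A quick way to check whether SCHED_RR can be acquired at all, without tracing the real service, is to run a trivial command under chrt. The following is a minimal sketch: the function name rt_sched_check is hypothetical, and the result reflects the cgroup of the calling shell (run as root), which is not necessarily the cgroup the failing service is placed in.

```shell
# Hypothetical helper: try to run "true" under SCHED_RR with priority 1.
# chrt is part of util-linux; it exits non-zero when the policy cannot be
# set, for example on EPERM.
rt_sched_check() {
    if ! command -v chrt >/dev/null 2>&1; then
        echo "chrt not found"
        return 2
    fi
    if chrt --rr 1 true 2>/dev/null; then
        echo "SCHED_RR allowed"
    else
        echo "SCHED_RR denied"
        return 1
    fi
}
```

Calling rt_sched_check prints one of the three messages above. A "denied" result from a root shell is consistent with the symptom described in this section, but it is not proof of the same cause.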

Resolution

Root Cause

  • If a service acquiring real-time scheduling starts normally at boot but fails to be restarted later, then

    • either CPU Accounting is partially enabled at boot (i.e. enabled in /etc/systemd/system.conf but not in the initramfs)
    • or CPU Accounting is enabled at run-time by a service starting after the service acquiring real-time scheduling
  • If a service acquiring real-time scheduling doesn't start anymore at boot after a reboot of the system, then

    • either CPU Accounting was enabled at boot explicitly prior to rebooting
    • or a service which was not started prior to rebooting is now enabling CPU Accounting implicitly or explicitly
  • For insights-client, the unit file included CPUQuota=30%, which triggered the behavior described above. This has since been changed.
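
To see where a given process currently sits, its cpu cgroup membership can be read from /proc/<pid>/cgroup. The sketch below assumes the cgroup v1 layout used on RHEL 7 and RHEL 8, where each line has the form "id:controllers:path". The function name cpu_cgroup_of and its optional second argument are illustrative only.

```shell
# Hypothetical helper: print the cgroup path of the "cpu" controller for a PID.
# Usage: cpu_cgroup_of <pid> [cgroup-file]
# The optional second argument overrides /proc/<pid>/cgroup.
cpu_cgroup_of() {
    pid=$1
    file=${2:-/proc/$pid/cgroup}
    # Match "cpu" as a whole entry in the comma-separated controller list
    # (so cpuset and cpuacct alone are not matched), then strip the first
    # two colon-separated fields and print the remaining path.
    awk -F: '$2 ~ /(^|,)cpu(,|$)/ { sub(/^[^:]*:[^:]*:/, ""); print }' "$file"
}
```

For example, cpu_cgroup_of "$(pidof mysqld)" might print a path such as /system.slice/mariadb.service when the process has been placed in a cpu cgroup, or / when it is still in the root cpu cgroup. Comparing the value before and after a restart may help tell the cases above apart.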

Diagnostic Steps

  • Verify that systemd makes use of the cpu and cpuacct CGroup controllers

    # ls -d /sys/fs/cgroup/{cpu,cpuacct}/*.slice
    ls: cannot access /sys/fs/cgroup/cpu/*.slice: No such file or directory
    ls: cannot access /sys/fs/cgroup/cpuacct/*.slice: No such file or directory
    

    In the example above, systemd is not currently using the cpu and cpuacct CGroup controllers.
    In that case, the issue is not covered by this solution.

    # ls -d /sys/fs/cgroup/{cpu,cpuacct}/*.slice
    /sys/fs/cgroup/cpuacct/system.slice    /sys/fs/cgroup/cpu/system.slice
    /sys/fs/cgroup/cpuacct/user.slice      /sys/fs/cgroup/cpu/user.slice
    

    In the example above, systemd is currently using the cpu and cpuacct CGroup controllers.
    This happens when CPU Accounting has been enabled at boot or at run-time.
    Please proceed further.

  • Verify if systemd is configured to enable CPU Accounting at boot

    # grep ^DefaultCPUAccounting= /etc/systemd/system.conf
    DefaultCPUAccounting=yes
    

    In the example above, CPU Accounting is enabled at boot. If this is not the case, proceed directly to the next diagnostic step.

    As a sanity check, verify that the booted initramfs also enables CPU Accounting at boot; otherwise this may lead to unexpected behaviour:

    # lsinitrd /boot/initramfs-$(uname -r).img /etc/systemd/system.conf | grep ^DefaultCPUAccounting=
    

    In the example above, the command prints nothing: DefaultCPUAccounting is set in /etc/systemd/system.conf on the system but not in the copy inside the initramfs. The two files are not synchronized, which may lead to unexpected behaviour.
    In that case, rebuild the initramfs:

    # dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
    

    Since CPU Accounting is enabled at boot, proceed directly to Resolution section.

  • Find out why systemd enabled CPU Accounting at run-time

    Even though CPU Accounting is disabled at boot, systemd may enable it automatically when a unit being started makes use of a CPU* property, such as CPUAccounting=yes or CPUQuota=<value> (refer to systemd.resource-control manpage for a full list).
    Note that Delegate=yes also enables CPU Accounting.

    To find out which unit made systemd enable CPU Accounting, the following command can be executed:

    # egrep -ri "^(Startup)?CPU.*=(.*%|1|yes|true|on)" /usr/lib/systemd/system /etc/systemd/system
    /etc/systemd/system/mariadb.service.d/quota.conf
    CPUQuota=20%
    

    In the example above, a custom drop-in to mariadb.service unit is implicitly enabling CPU Accounting because it makes use of CPUQuota.

    Since CPU Accounting is enabled at run-time due to starting a service which implicitly or explicitly enables CPU Accounting, proceed to Resolution section.

  • If you reach this step, then CPU Accounting is enabled due to some unknown reason not handled by this solution. Please contact your Red Hat Support representative.
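
The diagnostic steps above can be sketched as a single helper script. This is an illustration under assumptions, not a supported tool: the function names and the CGROUP_ROOT, SYSTEM_CONF and UNIT_DIRS variables are hypothetical, the unit-file pattern is the one used in the step above (extended with Delegate=), and the initramfs sanity check is not covered.

```shell
# Paths can be overridden from the environment; defaults match the article.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}
SYSTEM_CONF=${SYSTEM_CONF:-/etc/systemd/system.conf}
# Space-separated list of directories; paths must not contain spaces.
UNIT_DIRS=${UNIT_DIRS:-/usr/lib/systemd/system /etc/systemd/system}

# Step 1: do *.slice directories exist under the cpu or cpuacct hierarchy?
cpu_controllers_in_use() {
    for d in "$CGROUP_ROOT"/cpu/*.slice "$CGROUP_ROOT"/cpuacct/*.slice; do
        [ -d "$d" ] && return 0
    done
    return 1
}

# Step 2: is DefaultCPUAccounting switched on in system.conf?
accounting_enabled_at_boot() {
    grep -Eqi '^DefaultCPUAccounting=(1|yes|true|on)[[:space:]]*$' \
        "$SYSTEM_CONF" 2>/dev/null
}

# Step 3: list unit files and drop-ins that set a CPU* property or Delegate=.
units_enabling_accounting() {
    # $UNIT_DIRS is intentionally unquoted so it splits into directories.
    grep -Eri -e '^(Startup)?CPU.*=(.*%|1|yes|true|on)' \
              -e '^Delegate=(1|yes|true|on)' $UNIT_DIRS 2>/dev/null
}

# Walk through the steps in order and print the first conclusion reached.
diagnose() {
    if ! cpu_controllers_in_use; then
        echo "cpu/cpuacct controllers not in use: not covered by this solution"
        return 0
    fi
    if accounting_enabled_at_boot; then
        echo "CPU Accounting enabled at boot via $SYSTEM_CONF"
        return 0
    fi
    units=$(units_enabling_accounting || true)
    if [ -n "$units" ]; then
        echo "CPU Accounting enabled at run-time by:"
        echo "$units"
    else
        echo "CPU Accounting enabled for an unknown reason"
    fi
}
```

After sourcing the script as root, calling diagnose prints one of the four conclusions. Because the pattern is broad, the list from step 3 can include properties that merely start with CPU, so each reported line should still be reviewed by hand.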
