System falls into emergency mode because initrd-switch-root.service entered a failed state

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux (RHEL) 7.8 (after upgrade from a previous version)
  • systemd

Issue

  • The following service fails at boot time, causing the system to fall into emergency mode:

    # systemctl status initrd-switch-root.service
    ● initrd-switch-root.service - Switch Root
       Loaded: loaded (/usr/lib/systemd/system/initrd-switch-root.service; static; vendor preset: disabled)
       Active: failed (Result: signal) since Fri 2020-04-17 14:36:17 CEST; 5min ago
      Process: 502 ExecStart=/usr/bin/systemctl --no-block --force switch-root /sysroot (code=killed, signal=TERM)
     Main PID: 502 (code=killed, signal=TERM)
    
  • After an upgrade from RHEL 7.x to 7.8, the machine halts at the emergency prompt.

Resolution

Updating to systemd-219-67.el7_7.9 (shipped with Advisory RHBA-2020:2540) or newer before updating from RHEL 7.7 to RHEL 7.8 prevents this issue.
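
Before upgrading, you can check whether the installed systemd already meets that minimum. The following is a minimal sketch, assuming bash and GNU `sort -V`; the helper name `version_at_least` is hypothetical, and `sort -V` only approximates rpm's own version comparison, so double-check borderline results manually with `rpm -q systemd`.

```shell
#!/bin/bash
# Minimum systemd version-release named in the Resolution section.
MIN_SYSTEMD="219-67.el7_7.9"

# Hypothetical helper: succeeds if $1 sorts at or after $2 under GNU `sort -V`.
# This approximates, but is not identical to, rpm's version comparison.
version_at_least() {
    local have="$1" need="$2"
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Only query rpm when it is available and systemd is installed (the RHEL host).
if command -v rpm >/dev/null 2>&1 && rpm -q systemd >/dev/null 2>&1; then
    installed="$(rpm -q --queryformat '%{VERSION}-%{RELEASE}\n' systemd)"
    if version_at_least "$installed" "$MIN_SYSTEMD"; then
        echo "systemd $installed: meets the minimum ($MIN_SYSTEMD)"
    else
        echo "systemd $installed: update systemd first (need $MIN_SYSTEMD or newer)"
    fi
fi
```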

If you have already run into the issue, rebuild the initramfs images:

  1. Enter the emergency shell, or press Ctrl-D to continue booting the machine.

  2. Recreate the initramfs images using the following command:

    # dracut --force --regenerate-all
    

    Note: it is recommended to rebuild all initramfs images; otherwise, you may hit the issue when booting an older kernel.
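
`--regenerate-all` rebuilds the initramfs for every kernel version found on the system. If you prefer to review which kernels will be rebuilt first, the loop below prints (or, with `RUN=1`, runs) the equivalent per-kernel `dracut --force <image> <kernel version>` commands. This is a sketch, assuming the standard RHEL 7 layout (`/lib/modules/<kver>`, `/boot/vmlinuz-<kver>`, `/boot/initramfs-<kver>.img`); the function name `list_kernels` and the `RUN` variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the version of each kernel that has both a
# module directory and an installed vmlinuz image.
# $1 = modules directory (default /lib/modules), $2 = boot directory (default /boot).
list_kernels() {
    local moddir="${1:-/lib/modules}" bootdir="${2:-/boot}" dir kver
    for dir in "$moddir"/*/; do
        [ -d "$dir" ] || continue
        kver="$(basename "$dir")"
        # Skip leftover module directories whose kernel is no longer installed.
        [ -e "$bootdir/vmlinuz-$kver" ] || continue
        printf '%s\n' "$kver"
    done
}

# Dry run by default; set RUN=1 (as root) to actually rebuild each image.
for kver in $(list_kernels); do
    if [ "${RUN:-0}" = "1" ]; then
        dracut --force "/boot/initramfs-$kver.img" "$kver"
    else
        echo "dracut --force /boot/initramfs-$kver.img $kver"
    fi
done
```

Printing by default is a deliberate choice: it lets you compare the list of kernels against `rpm -q kernel` before anything in /boot is overwritten.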

Root Cause

The issue is caused by a timing problem when switching root while the initramfs contains an old systemd binary:
1. The old systemd binary in the initramfs starts switching root by starting the initrd-switch-root.service unit, which internally executes the systemctl program.
2. The new systemd binary on the root file system sends a SIGTERM to the initrd-switch-root.service unit while the old systemctl program is still executing.
3. Because the old systemctl program does not have the fix for Private BZ 1754053, initrd-switch-root.service enters a failed state, emergency.target is started, and systemd ends up in emergency mode.

The defect has been present for a long time, but was never triggered before updating to systemd-219-71.el7 or later.

Diagnostic Steps

  1. Check the size of the systemd binary installed on the system:

    # ls -l /usr/lib/systemd/systemd
    -rwxr-xr-x. 1 root root 1628536 Mar 17 10:50 /usr/lib/systemd/systemd
    
  2. Check the size of the systemd binary embedded in the initramfs that enters emergency mode:

    # lsinitrd /boot/initramfs-$(uname -r).img | grep "usr/lib/systemd/systemd$"
    lrwxrwxrwx   1 root     root           23 Apr 17 14:18 init -> usr/lib/systemd/systemd
    -rwxr-xr-x   1 root     root      1620416 Apr 17 14:18 usr/lib/systemd/systemd
    

In the example above, the sizes differ, indicating that the initramfs contains a different systemd build than the root file system. This may cause the issue described in this solution if the systemd in the initramfs is older than systemd-219-70.el7.
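
To run the same check against every initramfs image at once, and to compare contents rather than only sizes, you can extract the binary with `lsinitrd -f` and compare it byte for byte with `cmp`. A minimal sketch, assuming bash and dracut's `lsinitrd` on the host; the helper name `same_file` is hypothetical. A mismatch only shows that the image was built from a different systemd package; it does not by itself prove that the systemd inside is older than systemd-219-70.el7.

```shell
#!/bin/bash
# Hypothetical helper: succeed if two files have identical contents.
same_file() {
    cmp -s "$1" "$2"
}

ROOT_SYSTEMD=/usr/lib/systemd/systemd

# Only run the check where dracut's lsinitrd and the systemd binary exist.
if command -v lsinitrd >/dev/null 2>&1 && [ -x "$ROOT_SYSTEMD" ]; then
    tmp="$(mktemp)"
    for img in /boot/initramfs-*.img; do
        [ -e "$img" ] || continue
        # lsinitrd -f prints the contents of one file stored in the image;
        # the path matches the lsinitrd listing shown above.
        if ! lsinitrd -f usr/lib/systemd/systemd "$img" > "$tmp" 2>/dev/null; then
            echo "SKIPPED $img (could not read image)"
        elif same_file "$tmp" "$ROOT_SYSTEMD"; then
            echo "OK      $img"
        else
            echo "DIFFERS $img"
        fi
    done
    rm -f "$tmp"
fi
```

The glob also matches rescue and kdump images, which are built separately from the per-kernel images, so a DIFFERS result for those may be expected.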

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments

Regarding "The defect was present since a long time, but never triggered before updating to systemd-219-71.el7 or later.": the previous systemd version does not seem to have this problem. Could it have been introduced by the following commit? https://github.com/systemd/systemd/commit/1f0958f640b87175cd547c1e69084cfe54a22e9d

Yes, it is very likely that this commit introduced the issue. It was introduced in the following systemd release:

  • systemd-219-69

Regenerating the initramfs may introduce unknown risks. The following patch may fix the issue without regenerating the initramfs: https://github.com/systemd-rhel/rhel-7/pull/117