Why do virtual machines try to boot from unselected devices after a network boot failure?


The SeaBIOS BIOS code, used by QEMU as the initial boot code for every virtual machine, can be configured with a priority order in which the various boot targets (hard disk 1, CD, PXE, etc.) are checked for valid bootstrap code to load an operating system; libvirt supports setting these priorities.

The SeaBIOS boot selector works as follows: it starts with the highest priority boot target, looking for valid bootstrap code. If SeaBIOS is unable to get valid bootstrap code from this target, it moves on to the next target in the list (for example, if booting from CD fails, it attempts to boot from hard disk if that is next in the list), and so on all the way down the list. If all targets fail, the BIOS pauses for 60 seconds, then restarts the virtual machine, which repeats the process starting from the highest priority target.

However, if any target does have "valid" bootstrap code, that code is executed and the BIOS considers its job done, so no restart or retry is attempted, even if that boot sector is actually unable to boot an operating system. For a disk, "valid" means only that the returned sector ends in the boot signature (the bytes 55 aa), not that an operating system is present. For example, most hard disks that have a partition table but do not yet have an operating system installed will still have a valid boot sector, and when that boot sector is executed, it will display a message similar to the following:

BOOT DISK FAILURE, PRESS ANY KEY

The system waits indefinitely on this message until a key is pressed. Then, the bootstrap code that was loaded from the hard disk will again attempt to boot from the same hard disk ad infinitum until the guest is manually restarted.
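
The selection behaviour described above can be approximated with a small shell sketch. This is not SeaBIOS code: it only assumes that a disk counts as bootable when its first sector ends in the bytes 55 aa (the same two bytes the workaround image below writes at offset 0x1fe). The function names and the scratch image files are hypothetical and exist only for illustration.

```shell
# Return success if the first 512-byte sector of the file ends in 55 aa.
has_boot_signature() {
    [ "$(od -An -tx1 -j510 -N2 "$1" 2>/dev/null | tr -d ' \n')" = "55aa" ]
}

# Walk candidate images in priority order and print the first one that
# looks bootable. A non-zero return means every target failed, which is
# the case where the BIOS would pause and then restart the guest.
first_bootable() {
    for target in "$@"; do
        if has_boot_signature "$target"; then
            echo "$target"
            return 0
        fi
    done
    return 1
}

# Two hypothetical one-sector scratch images: one blank, one with a signature.
work=$(mktemp -d)
dd if=/dev/zero of="$work/blank.img" bs=512 count=1 2>/dev/null
dd if=/dev/zero of="$work/mbr.img" bs=512 count=1 2>/dev/null
printf '\125\252' | dd of="$work/mbr.img" bs=1 seek=510 conv=notrunc 2>/dev/null

picked=$(first_bootable "$work/blank.img" "$work/mbr.img")
echo "would execute boot sector of: $picked"
```

Note that the sketch picks mbr.img even though it contains no operating system, which mirrors the problem: a signature is enough for the boot sector to be executed.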

Unfortunately, SeaBIOS will attempt to retrieve a boot sector from every potential boot target, not just those that have been assigned a priority, so there is no way to completely disable any specific boot target (that is, remove it from the list); it can only be moved lower in the priority list.

This can be problematic if, for example, PXE (network) boot is set as the highest priority on a system that also contains a non-bootable hard disk. If the PXE boot code is unable to contact the PXE boot server before it times out (the timeout is quite short, and unfortunately not configurable), the SeaBIOS boot selector moves on to the hard disk, even if libvirt gave it no boot priority at virtual machine startup, and executes its boot sector. The virtual machine is then stuck at a "press any key" prompt and cannot boot without direct user intervention.

Workaround

Currently, the only known way to work around this problem is to create a small disk image with a valid boot sector that simply reboots the virtual machine, and mark that disk with a boot priority higher than the priority of the existing hard disk (but lower than the priority of the desired boot target). The following shell command will create the appropriate disk image:

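   # Byte layout written by xxd -r (offset: bytes):
   #   0x00000: b0 fe e6 64  boot code: mov al,0xfe; out 0x64,al (resets the CPU via the keyboard controller)
   #   0x001fe: 55 aa        boot signature that marks the sector as bootable
   #   0x7fffc: 00 00 00 00  pads the image to 512 KiB
   echo -e "00: b0 fe e6 64\n1fc: 00 00 55 aa\n" \
           "7fffc: 00 00 00 00" \
       | xxd -r -g 1 -c 4 \
       >/var/lib/libvirt/images/reboot.img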
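
As a sanity check of that byte layout, the following sketch builds an equivalent image with printf and dd instead of xxd and reads the key bytes back with od. It is an illustration under stated assumptions rather than a replacement for the command above: the scratch directory and file name are hypothetical, and the image is written there rather than under /var/lib/libvirt/images.

```shell
# Hypothetical scratch location; nothing under /var/lib/libvirt is touched.
work=$(mktemp -d)
img="$work/reboot-test.img"

# 1024 sectors of 512 bytes = 0x80000 bytes, the size implied by the
# last xxd line (offset 0x7fffc plus four bytes).
dd if=/dev/zero of="$img" bs=512 count=1024 2>/dev/null

# Offset 0x000: b0 fe e6 64, written as octal escapes for portability.
printf '\260\376\346\144' | dd of="$img" bs=1 seek=0 conv=notrunc 2>/dev/null

# Offset 0x1fe (510): 55 aa, the boot signature.
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null

# Read back the size, the first four bytes, and the signature.
size=$(wc -c < "$img" | tr -d ' ')
head4=$(od -An -tx1 -N4 "$img" | tr -d ' \n')
sig=$(od -An -tx1 -j510 -N2 "$img" | tr -d ' \n')
echo "size=$size first4=$head4 signature=$sig"
# prints: size=524288 first4=b0fee664 signature=55aa
```

The same od commands can be pointed at the image produced by the xxd pipeline to confirm that it came out as intended.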

Note that the xxd command is part of the vim-common package. The following is the XML required to add that device to a guest:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/reboot.img'/>
      <target dev='hdb' bus='ide'/>
      <address type='drive' controller='0' bus='0' 
               target='0' unit='1'/>
      <boot order='2'/>
    </disk>

After you have saved the above XML to the reboot.xml file, you can add this disk to domain "X" with the following command:

    virsh attach-device X reboot.xml --config

You can change the controller, bus, target, unit and target device to any available values, as long as <boot order='x'/> places this device after PXE boot and before any unwanted device in the priority list.

Note the <boot order='2'/> element: it assumes that you will add <boot order='1'/> to the guest's <interface> definition to ensure that PXE boot is attempted before booting from this disk. Also note that specifying <boot order='x'/> for individual devices is incompatible with the older method of specifying multiple <boot dev='hd|network|etc'/> elements inside the <os> element of the guest configuration; you will need to remove any such lines from your guest's configuration.
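
To spot a configuration that mixes the two boot-ordering styles, a saved copy of the domain XML can be scanned with grep. The sketch below is a rough text-based check, not an XML parser: the function name check_boot_style and the trimmed sample XML are hypothetical, and on a real host the input would come from something like virsh dumpxml --inactive X > X.xml.

```shell
# Classify a saved domain XML by which boot-ordering style it uses.
# "conflict" means both <boot dev=.../> and <boot order=.../> appear.
check_boot_style() {
    if grep -q "<boot dev=" "$1" && grep -q "<boot order=" "$1"; then
        echo "conflict"
    elif grep -q "<boot order=" "$1"; then
        echo "per-device"
    elif grep -q "<boot dev=" "$1"; then
        echo "os-level"
    else
        echo "none"
    fi
}

# Hypothetical, heavily trimmed domain XML used only to exercise the check.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<domain type='kvm'>
  <os>
    <boot dev='network'/>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <boot order='2'/>
    </disk>
  </devices>
</domain>
EOF

style=$(check_boot_style "$sample")
echo "boot style: $style"
# prints: boot style: conflict
```

A result of "conflict" indicates that the <boot dev=.../> lines in the <os> element still need to be removed before the per-device ordering can be used.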
