haldaemon fails to start on systems with a large number of disks in RHEL 5 and RHEL 6

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6

Issue

  • On server boot, or when haldaemon is started through its initscript, hald fails to start:

    # /etc/init.d/haldaemon start
    Starting HAL daemon:               FAILED
    
  • When run in the foreground, hald starts successfully:

    # hald --use-syslog --verbose=yes --daemon=no
    
  • The haldaemon service takes a long time at startup and eventually fails to start, but running hald --daemon=no manually works.

Resolution

  1. Upgrade to hal-0.5.8.1-62.el5 or later.
  2. Create the file /etc/sysconfig/haldaemon if it does not already exist, and add the following command-line argument for hald to it:

    --child-timeout=600
    
  3. Adjust the timeout value to match the maximum time the child process takes to probe all the LUNs on your system.
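
As a rough illustration of step 3, the sketch below derives a candidate --child-timeout value from a measured probe time. The helper name suggest_child_timeout, the 50% headroom and the rounding to whole minutes are assumptions made for this example, not Red Hat guidance.

```shell
#!/bin/bash
# Hypothetical helper: suggest a --child-timeout value (in seconds)
# from the probe time measured on the affected system.
# The 50% headroom and the rounding rule are illustrative assumptions.
suggest_child_timeout() {
    measured=$1                          # observed probe time in seconds
    padded=$(( measured * 3 / 2 ))       # add 50% headroom
    echo $(( (padded + 59) / 60 * 60 ))  # round up to the next full minute
}

# A probe measured at 400 seconds
suggest_child_timeout 400   # prints 600
```

Before changing the timeout, the installed package version from step 1 can be checked with rpm -q hal.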

Root Cause

  • The hald daemon is timing out waiting for the child process to probe all the devices. By default, hald waits for 250 seconds (4 minutes, 10 seconds) for its child process to complete device probing.
  • The issue seems to occur most frequently on systems with a large number of disks.
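
To get a sense of how many disks hald has to probe, the entries under /sys/block can be counted. This is a minimal sketch; the function name count_disks is hypothetical, and matching only sd* names is an assumption, since LUNs may also appear under other device names on some systems.

```shell
#!/bin/bash
# Hypothetical helper: count SCSI-style disk entries (sd*) in a sysfs
# block directory. Defaults to /sys/block; the sd* pattern is an
# assumption and will miss disks exposed under other names.
count_disks() {
    sysdir=${1:-/sys/block}
    count=0
    for dev in "$sysdir"/sd*; do
        # An unmatched glob stays literal, so verify the entry exists
        if [ -e "$dev" ]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

# Print the number of sd* disks on this system
count_disks
```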

Diagnostic Steps

  • Determine how long it takes for hald to fail to start. You can do this by:
    • Running service haldaemon restart and then timing how long hald runs before failure, or
    • Running the following command and then examining the time stamps in the system log to determine when hald started and when it emitted its last message before exiting:

    hald --use-syslog --verbose=yes
    
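
The first timing approach can be sketched as a small wrapper that reports how long a command ran. The function name elapsed_seconds is hypothetical, and the sketch assumes a date command that supports +%s, as GNU date does.

```shell
#!/bin/bash
# Hypothetical helper: run a command and print how many whole seconds
# it took. Assumes date supports the +%s format (GNU date).
elapsed_seconds() {
    start=$(date +%s)
    # The exit status is ignored on purpose: a failed start is the
    # case being measured here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $(( end - start ))
}

# Example, to be run as root on the affected system:
#   elapsed_seconds service haldaemon restart
```

The printed value approximates how long hald ran before the initscript reported failure, and can serve as a starting point when choosing a timeout.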

Component

  • hal

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
