RHEV Host cannot access one of the Storage Domains attached to the Data Center

Solution Unverified - Updated -

Issue

  • Host cannot access one of the Storage Domains attached to the Data Center.
  • Four of the six hosts became non-operational. The events showed:

    Host cannot access one of the Storage Domains attached to the Data Center. Setting Host state to Non-Operational.
    
  • Two VMs were paused and were then manually shut down. The four hosts were then rebooted. Activating these hosts in the RHEV-M GUI took much longer than usual, and at one point one of the hosts became non-operational. Eventually this host became available again, and all hosts were then up and working.
  • On two of the hosts, the /var/log/messages file contained several instances of "hung task detection", all of them related to sanlock:

    Apr 16 16:33:09 Host-3 kernel: INFO: task sanlock:4362 blocked for more than 120 seconds.
    Apr 16 16:33:09 Host-3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr 16 16:33:09 Host-3 kernel: sanlock       D 000000000000000c     0  4362      1 0x00000080
    Apr 16 16:33:09 Host-3 kernel: ffff8817c1e25eb8 0000000000000086 0000000000000000 ffff8818688cf378
    Apr 16 16:33:09 Host-3 kernel: 000000000000000e ffffea00a71ca6b8 ffff8817c1e25e78 ffffffff8114ba24
    Apr 16 16:33:09 Host-3 kernel: ffff8817c1dd1098 ffff8817c1e25fd8 000000000000fb88 ffff8817c1dd1098
    Apr 16 16:33:09 Host-3 kernel: Call Trace:
    Apr 16 16:33:09 Host-3 kernel: [<ffffffff8114ba24>] ? free_pages_and_swap_cache+0xb4/0xe0
    Apr 16 16:33:09 Host-3 kernel: [<ffffffff814ea3e3>] io_schedule+0x73/0xc0
    Apr 16 16:33:09 Host-3 kernel: [<ffffffff811be032>] wait_for_all_aios+0xd2/0x110
    Apr 16 16:33:09 Host-3 kernel: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
    Apr 16 16:33:09 Host-3 kernel: [<ffffffff811befa7>] io_destroy+0x87/0xe0
    Apr 16 16:33:09 Host-3 kernel: [<ffffffff811bf01b>] sys_io_destroy+0x1b/0x60
    Apr 16 16:33:09 Host-3 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
    
  • These were preceded by NFS timeouts such as:

    Apr 16 16:30:25 Host-3 kernel: nfs: server Host-5.x.x.x not responding, timed out
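
To check whether the same pattern appears on other hosts, the two kinds of log lines shown above can be matched against each other. The following is a minimal sketch, not a Red Hat tool: the function name correlate, the 300-second window, and the placeholder year are all assumptions made for illustration (syslog timestamps in this format carry no year). It reports each hung-task message together with any NFS timeouts logged on the same host shortly before it.

```python
import re
from datetime import datetime, timedelta

# Patterns modelled on the two log excerpts in this article.
NFS_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: nfs: server (\S+) not responding"
)
HUNG_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: INFO: task (\S+):(\d+) "
    r"blocked for more than (\d+) seconds"
)


def parse_ts(text, year):
    # Syslog lines have no year, so a caller-supplied year is prepended.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def correlate(lines, window_seconds=300, year=2000):
    """Pair each hung-task message with earlier NFS timeouts on the same host.

    window_seconds and year are illustrative defaults (assumptions),
    not values taken from the article.
    """
    window = timedelta(seconds=window_seconds)
    nfs_events = []  # (timestamp, host, nfs_server)
    findings = []
    for raw in lines:
        line = raw.strip()
        m = NFS_RE.match(line)
        if m:
            nfs_events.append((parse_ts(m.group(1), year), m.group(2), m.group(3)))
            continue
        m = HUNG_RE.match(line)
        if m:
            ts = parse_ts(m.group(1), year)
            host = m.group(2)
            # Keep NFS timeouts from the same host that precede the hung task
            # by no more than the chosen window.
            servers = [
                server
                for (nfs_ts, nfs_host, server) in nfs_events
                if nfs_host == host and timedelta(0) <= ts - nfs_ts <= window
            ]
            findings.append(
                {
                    "time": ts,
                    "host": host,
                    "task": m.group(3),
                    "pid": int(m.group(4)),
                    "blocked_seconds": int(m.group(5)),
                    "nfs_servers": servers,
                }
            )
    return findings
```

For example, passing the lines of /var/log/messages to correlate() (with open("/var/log/messages") as f: correlate(f)) returns one entry per hung-task message. An entry with an empty nfs_servers list means no NFS timeout was logged on that host within the window.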
    

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.1, 3.2
  • Red Hat Enterprise Linux (RHEL) 6.3 hosts, with:

    • 2.6.32-279.19.1 kernel
    • vdsm-4.9.6-44.3, vdsm-4.9.6-45.2
    • libvirt-0.9.10-21.el6_3.7, libvirt-0.9.10-21.el6_3.8
