Why do I observe RHEV hosts becoming non-operational and data centers becoming non responsive on HP FlexFabric?

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Virtualization.
  • HP Emulex FlexFabric CNAs (Converged Network Adapters) running firmware version 4.6.247.5 (HPE f/w package 2013.08.01) or earlier
  • HP Virtual Connect FlexFabric/Flex-10/10D running firmware version 4.10.

Issue

  • Hosts can not access storage domains and become Non-Operational:

    Host <host> cannot access one of the Storage Domains attached to the Data Center <dc>. Setting Host state to Non-Operational
    
  • Datacenters become non responsive due to RHEV-M failing to reach hosts over the network:

    Invalid status on Data Center <dc>. 
    Setting Data Center status to Non Responsive (On host <host>, Error: Network error during communication with the Host.)
    

Resolution


Disclaimer: The following information has been provided by Red Hat, but is outside the scope of the posted Service Level Agreements and support coverage as it involves software not provided by Red Hat.. The information is provided as-is and any configuration settings or installed applications made from the information in this article could make the Operating System unsupported by Red Hat Global Support Services. The intent of this article is to provide information to accomplish the system's needs. Use of the information in this article at your own risk.

Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.


 

  • Please contact HPE for the latest appropriate firmware package for Emulex FlexFabric (CNA) host bus adapters. For example:
    • HP Firmware Flash for Emulex Fibre Channel Host Bus and Converged Network Adapters - Linux (x86_64).
    • NOTE: The firmware package includes multiple pieces of firmware. For example, HPE's currently recommended firmware package version for Linux is 2014.04.01 (whereas HPE firmware package for VMware is 2016.08.01). The HPE 2014.04.1 firmware package includes:
      • Emulex FlexFabric/CNA firmware 4.9.416.0
      • Emulex 16 Gb HBA/Mezz universal boot 1.1.65.101
      • Emulex 8 Gb Gen8 Mezz (LPe1205A) firmware 2.02X6
      • Emulex 8 Gb HBA/Mezz firmware 2.01a12
      • Emulex 8 Gb HBA boot image 5.20x7 (2.12a14 BIOS, 4.13a8 UEFI)
      • Emulex 8 Gb Mezz boot image 6.10a15 (3.30a2 BIOS)
      • Emulex 4 Gb HBA/Mezz firmware 2.82x4
      • Emulex 4 Gb Mezz boot image 6.10a14 (3.30a1 BIOS)
      • Emulex 4 Gb HBA boot image 5.20a6 (2.12a14 BIOS)

Root Cause

The following fix in the newer firmware versions is suspected to be specifically relevant to this problem:

  • Firmware package 2014.01.01, Emulex CNA firmware 4.9.311.20
    • Enhancment added: "Added support for VC {Virtual Connect} 4.10."

Other, later, fixes may also be relevant. Again the vendor HPE is the best source of appropriate firmware versions to be running.

Diagnostic Steps

  • /var/log/messages shows sanlock failing to renew leases:

    Apr  3 11:11:48 host sanlock[11074]: 2014-04-03 11:11:48+0100 52109 [14808]: s1 delta_renew read rv -202 offset 0 /dev/0569e783-2a74-40e4-b5dd-5a4ce4ac2dda/ids
    Apr  3 11:11:48 host sanlock[11074]: 2014-04-03 11:11:48+0100 52109 [14808]: s1 renewal error -202 delta_length 10 last_success 52078
    Apr  3 11:11:59 host sanlock[11074]: 2014-04-03 11:11:59+0100 52120 [14808]: s1 delta_renew read rv -202 offset 0 /dev/0569e783-2a74-40e4-b5dd-5a4ce4ac2dda/ids
    Apr  3 11:11:59 host sanlock[11074]: 2014-04-03 11:11:59+0100 52120 [14808]: s1 renewal error -202 delta_length 11 last_success 52078
    Apr  3 11:12:10 host sanlock[11074]: 2014-04-03 11:12:10+0100 52131 [14808]: s1 delta_renew read rv -202 offset 0 /dev/0569e783-2a74-40e4-b5dd-5a4ce4ac2dda/ids
    Apr  3 11:12:10 host sanlock[11074]: 2014-04-03 11:12:10+0100 52131 [14808]: s1 renewal error -202 delta_length 10 last_success 52078
    Apr  3 11:12:17 host sanlock[11074]: 2014-04-03 11:12:17+0100 52138 [11074]: s1 check_our_lease warning 60 last_success 52078
    Apr  3 11:12:18 host sanlock[11074]: 2014-04-03 11:12:18+0100 52139 [11074]: s1 check_our_lease warning 61 last_success 52078
    Apr  3 11:12:19 host sanlock[11074]: 2014-04-03 11:12:19+0100 52140 [11074]: s1 check_our_lease warning 62 last_success 52078
    [...]
    Apr  3 11:12:31 host wdmd[11027]: test warning now 52152 ping 52142 close 27468 renewal 52078 expire 52158 client 11074 sanlock_0569e783-2a74-40e4-b5dd-5a4ce4ac2dda:8
    Apr  3 11:12:31 host wdmd[11027]: /dev/watchdog closed unclean
    Apr  3 11:12:31 host kernel: hpwdt: Unexpected close, not stopping watchdog!
    Apr  3 11:12:32 host sanlock[11074]: 2014-04-03 11:12:32+0100 52153 [14808]: s1 delta_renew read rv -2 offset 0 /dev/0569e783-2a74-40e4-b5dd-5a4ce4ac2dda/ids
    Apr  3 11:12:32 host sanlock[11074]: 2014-04-03 11:12:32+0100 52153 [14808]: s1 renewal error -2 delta_length 10 last_success 52078
    Apr  3 11:12:32 host sanlock[11074]: 2014-04-03 11:12:32+0100 52153 [11074]: s1 check_our_lease warning 75 last_success 52078
    Apr  3 11:12:32 host wdmd[11027]: test warning now 52153 ping 52142 close 52152 renewal 52078 expire 52158 client 11074 sanlock_0569e783-2a74-40e4-b5dd-5a4ce4ac2dda:8
    Apr  3 11:12:33 host sanlock[11074]: 2014-04-03 11:12:33+0100 52154 [11074]: s1 check_our_lease warning 76 last_success 52078
    Apr  3 11:12:33 host wdmd[11027]: test warning now 52154 ping 52142 close 52152 renewal 52078 expire 52158 client 11074 sanlock_0569e783-2a74-40e4-b5dd-5a4ce4ac2dda:8
    Apr  3 11:12:34 host sanlock[11074]: 2014-04-03 11:12:34+0100 52155 [11074]: s1 check_our_lease warning 77 last_success 52078
    Apr  3 11:12:34 host wdmd[11027]: test warning now 52155 ping 52142 close 52152 renewal 52078 expire 52158 client 11074 sanlock_0569e783-2a74-40e4-b5dd-5a4ce4ac2dda:8
    Apr  3 11:12:35 host sanlock[11074]: 2014-04-03 11:12:35+0100 52156 [11074]: s1 check_our_lease warning 78 last_success 52078
    Apr  3 11:12:35 host wdmd[11027]: test warning now 52156 ping 52142 close 52152 renewal 52078 expire 52158 client 11074 sanlock_0569e783-2a74-40e4-b5dd-5a4ce4ac2dda:8
    Apr  3 11:12:36 host sanlock[11074]: 2014-04-03 11:12:36+0100 52157 [11074]: s1 check_our_lease warning 79 last_success 52078
    Apr  3 11:12:36 host wdmd[11027]: test warning now 52157 ping 52142 close 52152 renewal 52078 expire 52158 client 11074 sanlock_0569e783-2a74-40e4-b5dd-5a4ce4ac2dda:8
    Apr  3 11:12:37 host sanlock[11074]: 2014-04-03 11:12:37+0100 52158 [11074]: s1 check_our_lease failed 80
    Apr  3 11:12:37 host wdmd[11027]: /dev/watchdog reopen
    Apr  3 11:12:37 host sanlock[11074]: 2014-04-03 11:12:37+0100 52158 [11074]: s1 all pids clear
    Apr  3 11:12:42 host sanlock[11074]: 2014-04-03 11:12:42+0100 52164 [14808]: s1 delta_renew read rv -2 offset 0 /dev/0569e783-2a74-40e4-b5dd-5a4ce4ac2dda/ids
    Apr  3 11:12:42 host sanlock[11074]: 2014-04-03 11:12:42+0100 52164 [14808]: s1 renewal error -2 delta_length 10 last_success 52078
    Apr  3 11:12:52 host sanlock[11074]: 2014-04-03 11:12:52+0100 52174 [14808]: 0569e783 close_task_aio 0 0x7f8eec0008c0 busy
    Apr  3 11:12:52 host sanlock[11074]: 2014-04-03 11:12:52+0100 52174 [14808]: 0569e783 close_task_aio 1 0x7f8eec000910 busy
    Apr  3 11:12:52 host sanlock[11074]: 2014-04-03 11:12:52+0100 52174 [14808]: 0569e783 close_task_aio 2 0x7f8eec000960 busy
    Apr  3 11:12:52 host sanlock[11074]: 2014-04-03 11:12:52+0100 52174 [14808]: 0569e783 close_task_aio 3 0x7f8eec0009b0 busy
    
  • /var/log/vdsm/vdsm.log shows failures acquiring locks:

    Thread-29621::DEBUG::2014-04-03 11:21:52,000::libvirtconnection::108::libvirtconnection::(wrapper) Unknown libvirterror: ecode: 68 edom: 10 level: 2 message: Timed out during operation: cannot acquire state change lock
    
  • sar data from the host shows the following anomalies:

    • Very high iowait on specific CPUs:

      00:00:01        CPU      %usr     %nice      %sys  *%iowait*   %steal      %irq     %soft    %guest     %idle
      [...]
      11:20:01         14      0.61      0.00      0.17     82.24      0.00      0.00      0.00      2.09     14.90
      11:20:01         15      0.08      0.00      0.15     83.33      0.00      0.00      0.00      1.00     15.43
      11:20:01         19      0.18      0.00      0.58     83.45      0.00      0.00      0.00      0.00     15.78
      [...]
      11:30:01         14      0.07      0.00      0.11     99.78      0.00      0.00      0.00      0.04      0.00
      11:30:01         15      0.05      0.00      0.04     90.21      0.00      0.00      0.00      0.01      9.70
      11:30:01         19      0.26      0.00      0.63     99.11      0.00      0.00      0.00      0.00      0.00
      [...]
      11:40:01          0      0.29      0.00      0.46     99.01      0.00      0.00      0.00      0.23      0.00
      11:40:01         14      0.08      0.00      0.16     99.71      0.00      0.00      0.00      0.05      0.00
      11:40:01         19      0.30      0.00      0.68     99.03      0.00      0.00      0.00      0.00      0.00
      [...]
      11:50:01          0      0.75      0.00      0.34     56.58      0.00      0.00      0.00      0.16     42.18
      11:50:01         12      0.39      0.00      0.44     42.55      0.00      0.00      0.00      0.15     56.47
      11:50:01         14      0.09      0.00      0.15     99.75      0.00      0.00      0.00      0.01      0.00
      11:50:01         19      0.29      0.00      0.67     99.04      0.00      0.00      0.00      0.00      0.00
      [...]
      
    • Very high load:

      00:00:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
      [...]
      11:10:01            5       848      2.07      1.68      1.45
      11:20:01            1       934    100.20     81.61     42.47
      11:30:01            1       932    100.23     97.69     69.93
      11:40:01            1       933    100.27     99.86     84.34
      11:50:01            2       933    100.22    100.10     91.83
      
    • Very high service time svctm and utilisation from multiple shared storage devices:

      00:00:01          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
      [...]
      11:20:01     dev253-2      0.24     22.76      0.01     93.07      8.21      0.65   3461.67     84.67
      11:20:01     dev253-3     20.67      4.93   1469.77     71.35      4.20      2.27     41.39     85.54
      11:20:01     dev253-4      0.19      0.09     18.82    100.79     12.05      0.83   4477.21     84.01
      11:20:01     dev253-5      3.19      0.11    809.46    253.80     45.71      6.04    265.96     84.83
      11:20:01     dev253-6      1.62      0.09     40.40     24.94      5.93      0.36    517.08     83.94
      11:20:01    dev253-51      0.02      0.13      0.00      8.00      0.85      0.70  50494.10     84.59
      11:20:01    dev253-53      0.06     27.45      0.01    455.22      6.52      1.03  13969.64     84.25
      11:20:01    dev253-57      1.63      0.00     40.66     25.02      5.93      0.36    516.54     83.94
      11:20:01    dev253-59      0.24      0.00     38.18    156.11     18.40      0.91   3434.58     84.01
      11:20:01     dev70-48      0.06      3.97      0.00     69.68      4.11      1.09  14863.79     84.67
      11:20:01     dev70-64      5.17      1.33    364.92     70.91      1.68      3.68    164.45     84.94
      11:20:01     dev70-96      0.77      0.01    180.85    233.67     22.85      6.58   1095.26     84.77
      11:20:01    dev70-112      0.40      0.03      9.68     24.03      2.92      0.40   2078.13     83.91
      11:20:01     dev70-80      0.04      0.03      0.36     10.09      5.85      1.22  21749.57     83.81
      11:20:01    dev129-64      0.06      3.97      0.00     69.68      4.11      0.85  14851.88     84.60
      11:20:01    dev129-80      5.17      1.21    381.07     74.01      2.52      4.13    165.04     85.25
      11:20:01    dev129-96      0.04      0.01      0.39     10.43      6.20      0.26  21800.39     84.00
      11:20:01    dev129-112      0.78      0.03    197.23    254.30     22.85     14.89   1092.65     84.75
      11:20:01    dev129-128      0.40      0.03     10.11     25.20      3.01      0.42   2086.77     83.90
      [...]
      11:30:01     dev253-2      0.00      0.00      0.00      0.00     10.05      0.00      0.00    100.45
      11:30:01     dev253-3      0.00      0.00      0.00      0.00      5.02      0.00      0.00    100.45
      11:30:01     dev253-4      0.00      0.00      0.01      8.00     21.04      0.00 600007.00    100.45
      11:30:01     dev253-5      0.00      0.00      0.00      0.00     54.25      0.00      0.00    100.45
      11:30:01     dev253-6      0.00      0.00      0.03      8.00      8.04      0.00 300003.50    100.45
      11:30:01    dev253-51      0.00      0.00      0.00      0.00      1.00      0.00      0.00    100.45
      11:30:01    dev253-53      0.00      0.00      0.00      0.00      8.04      0.00      0.00    100.45
      11:30:01    dev253-57      0.00      0.00      0.03      8.00      8.04      0.00 300003.50    100.45
      11:30:01    dev253-59      0.00      0.00      0.03      8.00     32.09      0.00 300003.50    100.45
      11:30:01     dev70-48      0.00      0.00      0.00      0.00      5.02      0.00      0.00    100.45
      11:30:01     dev70-64      0.00      0.00      0.00      0.00      2.01      0.00      0.00    100.45
      11:30:01     dev70-96      0.00      0.00      0.00      0.00     27.12      0.00      0.00    100.45
      11:30:01    dev70-112      0.00      0.00      0.00      0.00      4.02      0.00      0.00    100.45
      11:30:01     dev70-80      0.00      0.00      0.00      0.00     10.05      0.00      0.00    100.45
      11:30:01    dev129-64      0.00      0.00      0.00      0.00      5.02      0.00      0.00    100.45
      11:30:01    dev129-80      0.00      0.00      0.00      0.00      3.01      0.00      0.00    100.45
      11:30:01    dev129-96      0.00      0.00      0.00      0.00     10.99      0.00      0.00    100.45
      11:30:01    dev129-112      0.00      0.00      0.00      0.00     27.12      0.00      0.00    100.45
      11:30:01    dev129-128      0.00      0.00      0.00      0.00      4.02      0.00      0.00    100.45
      11:30:01    dev253-63      0.00      0.00      0.00      0.00      5.02      0.00      0.00    100.45
      11:30:01    dev253-65      0.00      0.00      0.00      0.00     74.34      0.00      0.00    100.45
      [...]
      11:40:01     dev253-2      0.00      0.00      0.00      0.00     10.05      0.00      0.00    100.54
      11:40:01     dev253-3      0.00      0.00      0.00      0.00      5.03      0.00      0.00    100.54
      11:40:01     dev253-4      0.00      0.00      0.00      0.00     21.11      0.00      0.00    100.54
      11:40:01     dev253-5      0.00      0.00      0.00      0.00     54.29      0.00      0.00    100.54
      11:40:01     dev253-6      0.00      0.00      0.00      0.00      8.04      0.00      0.00    100.54
      11:40:01    dev253-51      0.00      0.00      0.00      0.00      1.01      0.00      0.00    100.54
      11:40:01    dev253-53      0.00      0.00      0.00      0.00      8.04      0.00      0.00    100.54
      11:40:01    dev253-57      0.00      0.00      0.00      0.00      8.04      0.00      0.00    100.54
      11:40:01    dev253-59      0.00      0.00      0.00      0.00     32.17      0.00      0.00    100.54
      11:40:01     dev70-48      0.00      0.00      0.00      0.00      5.03      0.00      0.00    100.54
      11:40:01     dev70-64      0.00      0.00      0.00      0.00      2.01      0.00      0.00    100.54
      11:40:01     dev70-96      0.00      0.00      0.00      0.00     27.15      0.00      0.00    100.54
      11:40:01    dev70-112      0.00      0.00      0.00      0.00      4.02      0.00      0.00    100.54
      11:40:01     dev70-80      0.00      0.00      0.00      0.00     10.05      0.00      0.00    100.54
      11:40:01    dev129-64      0.00      0.00      0.00      0.00      5.03      0.00      0.00    100.54
      11:40:01    dev129-80      0.00      0.00      0.00      0.00      3.02      0.00      0.00    100.54
      11:40:01    dev129-96      0.00      0.00      0.00      0.00     11.06      0.00      0.00    100.54
      11:40:01    dev129-112      0.00      0.00      0.00      0.00     27.15      0.00      0.00    100.54
      11:40:01    dev129-128      0.00      0.00      0.00      0.00      4.02      0.00      0.00    100.54
      11:40:01    dev253-63      0.00      0.00      0.00      0.00      5.03      0.00      0.00    100.54
      11:40:01    dev253-65      0.00      0.00      0.00      0.00     74.40      0.00      0.00    100.54
      [...]
      
  • Several storage related processes are reported in D state (uninterruptible sleep, usually IO):

    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root      4435  0.0  0.0   4168   620 pts/1    D    14:29   0:00 /sbin/hdparm -g /dev/sdcw
    root      4495  0.0  0.0  23772  2620 pts/1    D    17:03   0:00 /sbin/lvm vgs -v -o +tags
    root      5117  0.0  0.0   4168   620 pts/1    D    14:30   0:00 /sbin/hdparm -g /dev/sdcw
    root      5678  0.0  0.0   4168   620 pts/1    D    14:31   0:00 /sbin/hdparm -g /dev/sdcw
    root      5697  0.0  0.0  23772  2624 pts/1    D    17:05   0:00 /sbin/lvm vgs -v -o +tags
    root      5939  0.0  0.0   4168   620 pts/1    D    14:31   0:00 /sbin/hdparm -g /dev/sdcw
    root      6312  0.0  0.0   4168   620 pts/1    D    14:32   0:00 /sbin/hdparm -g /dev/sdcw
    root      6519  0.0  0.0   4168   620 pts/1    D    14:32   0:00 /sbin/hdparm -g /dev/sdcw
    root      6979  0.0  0.0   4168   616 pts/1    D    14:33   0:00 /sbin/hdparm -g /dev/sdcw
    root      7081  0.0  0.0  23772  2624 pts/1    D    17:06   0:00 /sbin/lvm vgs -v -o +tags
    root      7315  0.0  0.0   4168   620 pts/1    D    14:33   0:00 /sbin/hdparm -g /dev/sdcw
    root      7949  0.0  0.0 199532  2376 ?        Ss   Apr02   0:01 /usr/sbin/sssd -f -D
    root      7950  7.7  0.0 294460 40812 ?        S    Apr02  94:58 /usr/libexec/sssd/sssd_be --domain LDAP --debug-to-files
    root      8478  0.0  0.0   4168   620 pts/1    D    14:36   0:00 /sbin/hdparm -g /dev/sdcw
    root      9059  0.0  0.0   4168   616 pts/1    D    14:37   0:00 /sbin/hdparm -g /dev/sdcw
    root     11482  0.0  0.0   4168   620 pts/1    D    14:42   0:00 /sbin/hdparm -g /dev/sdcw
    root     11944  0.0  0.0 120260  2416 ?        D    14:43   0:00 vgs --noheadings -o name
    root     15541  0.0  0.0   4168   620 pts/0    D    12:01   0:00 /sbin/hdparm -g /dev/sdcw
    root     28900  0.0  0.0   4168   620 pts/1    D    15:30   0:00 /sbin/hdparm -g /dev/sdcw
    root     29516  0.0  0.0   4168   624 pts/1    D    15:31   0:00 /sbin/hdparm -g /dev/sdcw
    root     41747  0.0  0.0   4168   620 pts/1    D    15:58   0:00 /sbin/hdparm -g /dev/sdcw
    root     41894  0.0  0.0  23756  2596 pts/1    D    15:59   0:00 /sbin/vgdisplay -vv
    root     42925  0.0  0.0  23772  2624 pts/1    D    15:59   0:00 /sbin/lvm vgs -v -o +tags
    root     45008  0.0  0.0  23772  2620 pts/1    D    16:03   0:00 /sbin/lvm vgs -v -o +tags
    root     47121  0.0  0.0  23772  2624 pts/1    D    16:06   0:00 /sbin/lvm vgs -v -o +tags
    root     58070  0.0  0.0  23772  2624 pts/1    D    16:34   0:00 /sbin/lvm vgs -v -o +tags
    root     62031  0.0  0.0  23892  2792 ?        D<   11:11   0:00 /sbin/lvm lvchange --config  devices { preferred_names = ["^/dev/mapper/"] 
    [...]
    vdsm     62033  0.0  0.0   8344   596 ?        D<   11:11   0:00 /bin/dd iflag=direct if=/dev/0569e783-2a74-40e4-b5dd-5a4ce4ac2dda/metadata bs=4096 count=1
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Close

Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.