Azure VMs abruptly enter hang state after encountering "Data path switched from VF" instances

Solution In Progress - Updated -

Issue

  • Application is hosted on Azure VM and is accessing NFS fileshares. The Azure VM abruptly enters a freeze/hang state which is affecting the application using the NFS shares.
[1490379.290487] mlx4_en: enP1p0s2: Close port called             < 
[1490379.415167] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched from VF: enP1p0s2          <<
[1490379.503217] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF unregistering: enP1p0s2
[1490389.059115] hv_pci 7609dc51-cbc9-4b0b-8d50-9711c3549793: PCI VMBus probing: Using version 0x10002
[1490389.217677] ACPI: \: failed to evaluate _DSM (0x1001)
[1490389.364197] hv_pci 7609dc51-cbc9-4b0b-8d50-9711c3549793: PCI host bridge to bus 0001:00
[1490389.520723] pci_bus 0001:00: root bus resource [mem 0xfe0800000-0xfe0ffffff window]
[1490389.680730] pci 0001:00:02.0: [15b3:1004] type 00 class 0x020000
[1490389.735429] pci 0001:00:02.0: reg 0x18: [mem 0xfe0800000-0xfe0ffffff 64bit pref]
[1490389.798560] pci 0001:00:02.0: BAR 2: assigned [mem 0xfe0800000-0xfe0ffffff 64bit pref]
[1490389.969388] mlx4_core: Initializing 0001:00:02.0
[1490390.129549] mlx4_core 0001:00:02.0: Detected virtual function - running in slave mode
[1490390.293467] mlx4_core 0001:00:02.0: Sending reset                                         < 

[1490391.007207] mlx4_en 0001:00:02.0: Activating port:1
[1490391.268283] mlx4_en: 0001:00:02.0: Port 1: Using 32 TX rings
[1490391.424273] mlx4_en: 0001:00:02.0: Port 1: Using 16 RX rings
[1490391.576392] mlx4_en: 0001:00:02.0: Port 1: Initializing port
[1490391.728499] mlx4_core 0001:00:02.0 eth1: joined to eth0
[1490391.886106] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF registering: eth1
[1490392.077952] mlx4_en: eth1: Steering Mode 2 
[1490392.304254] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched to VF: eth1        << 
[1490392.332503] mlx4_en: eth1: Link Up

[1154892.255641] mlx4_en: enP1p0s2: Close port called
[1154892.358289] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched from VF: enP1p0s2   <<
[1154892.371302] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF unregistering: enP1p0s2             <<
...

[1154901.058073] mlx4_core 0001:00:02.0 eth1: joined to eth0
[1154901.066190] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF registering: eth1             <<
[1154901.179496] mlx4_en: enP1p0s2: Steering Mode 2
[1154901.276867] mlx4_en: enP1p0s2: Link Up
[1154901.276880] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched to VF: enP1p0s2    << 
[1212613.298023] ------------[ cut here ]------------
[1212613.417225] WARNING: CPU: 39 PID: 44585 at net/sunrpc/sched.c:716 rpc_exit_task+0x8e/0x90 [sunrpc]
[1212613.543461] Modules linked in: unix_diag binfmt_misc nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner iptable_security xt_conntrack nf_conntrack ip_tables ext4 mbcache jbd2 ip_set nfnetlink bridge stp llc dm_mirror dm_region_hash dm_log raid0 dm_raid mlx4_en nvme raid456 async_raid6_recov async_memcpy nvme_core mlx4_core devlink async_pq raid6_pq async_xor xor crc32_pclmul async_tx ghash_clmulni_intel aesni_intel lrw joydev dm_mod gf128mul glue_helper ablk_helper i2c_piix4 hv_utils cryptd ptp sg i2c_core pcspkr hyperv_keyboard hyperv_fb pps_core hv_balloon hid_hyperv pci_hyperv xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic pata_acpi hv_storvsc scsi_transport_fc hv_netvsc scsi_tgt ata_piix
[1212614.303637]  crct10dif_pclmul crct10dif_common libata crc32c_intel floppy serio_raw hv_vmbus [last unloaded: ip_tables]
[1212614.584837] CPU: 39 PID: 44585 Comm: kworker/39:2 Not tainted 3.10.0-862.43.1.el7.x86_64 #1
[1212614.730275] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[1212614.887222] Workqueue: rpciod rpc_async_schedule [sunrpc]
[1212615.036347] Call Trace:
[1212615.176413]  [<ffffffffbdd14a74>] dump_stack+0x19/0x1b
[1212615.321122]  [<ffffffffbd6946d8>] __warn+0xd8/0x100
[1212615.465768]  [<ffffffffc091ad20>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
[1212615.616655]  [<ffffffffc091ad20>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
[1212615.765975]  [<ffffffffbd69481d>] warn_slowpath_null+0x1d/0x20
[1212615.913481]  [<ffffffffc091adae>] rpc_exit_task+0x8e/0x90 [sunrpc]
[1212616.061444]  [<ffffffffc091c369>] __rpc_execute+0x99/0x420 [sunrpc]
[1212616.210286]  [<ffffffffbdd19b18>] ? __schedule+0x448/0x9d0
[1212616.357362]  [<ffffffffc091c702>] rpc_async_schedule+0x12/0x20 [sunrpc]
[1212616.503574]  [<ffffffffbd6b628f>] process_one_work+0x17f/0x440
[1212616.650741]  [<ffffffffbd6b7326>] worker_thread+0x126/0x3c0
[1212616.796469]  [<ffffffffbd6b7200>] ? manage_workers.isra.24+0x2a0/0x2a0
[1212616.948826]  [<ffffffffbd6be071>] kthread+0xd1/0xe0
[1212617.097640]  [<ffffffffbd6bdfa0>] ? insert_kthread_work+0x40/0x40
[1212617.248813]  [<ffffffffbdd26624>] ret_from_fork_nospec_begin+0xe/0x21
[1212617.403469]  [<ffffffffbd6bdfa0>] ? insert_kthread_work+0x40/0x40
[1212617.555151] ---[ end trace 60a10a7573284b71 ]---
[1212626.108576] nr_pdflush_threads exported in /proc is scheduled for removal
[1212635.203303] MACsec IEEE 802.1AE
[1212649.950599] traps: cpupower[37896] trap divide error ip:4024f0 sp:7ffe9d952bf0 error:0 in cpupower[400000+f000]

.....

Environment

  • RHEL 7.7
    kernel 3.10.0-862.43.1.el7.x86_64
  • Azure VMs
  • NFS

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase of over 48,000 articles and solutions.

Current Customers and Partners

Log in for full access

Log In