Azure VMs abruptly enter hang state after encountering "Data path switched from VF" instances

Solution In Progress - Updated -

Issue

  • Application is hosted on Azure VM and is accessing NFS fileshares. The Azure VM abruptly enters a freeze/hang state which is affecting the application using the NFS shares.
[1490379.290487] mlx4_en: enP1p0s2: Close port called             < 
[1490379.415167] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched from VF: enP1p0s2          <<
[1490379.503217] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF unregistering: enP1p0s2
[1490389.059115] hv_pci 7609dc51-cbc9-4b0b-8d50-9711c3549793: PCI VMBus probing: Using version 0x10002
[1490389.217677] ACPI: \: failed to evaluate _DSM (0x1001)
[1490389.364197] hv_pci 7609dc51-cbc9-4b0b-8d50-9711c3549793: PCI host bridge to bus 0001:00
[1490389.520723] pci_bus 0001:00: root bus resource [mem 0xfe0800000-0xfe0ffffff window]
[1490389.680730] pci 0001:00:02.0: [15b3:1004] type 00 class 0x020000
[1490389.735429] pci 0001:00:02.0: reg 0x18: [mem 0xfe0800000-0xfe0ffffff 64bit pref]
[1490389.798560] pci 0001:00:02.0: BAR 2: assigned [mem 0xfe0800000-0xfe0ffffff 64bit pref]
[1490389.969388] mlx4_core: Initializing 0001:00:02.0
[1490390.129549] mlx4_core 0001:00:02.0: Detected virtual function - running in slave mode
[1490390.293467] mlx4_core 0001:00:02.0: Sending reset                                         < 

[1490391.007207] mlx4_en 0001:00:02.0: Activating port:1
[1490391.268283] mlx4_en: 0001:00:02.0: Port 1: Using 32 TX rings
[1490391.424273] mlx4_en: 0001:00:02.0: Port 1: Using 16 RX rings
[1490391.576392] mlx4_en: 0001:00:02.0: Port 1: Initializing port
[1490391.728499] mlx4_core 0001:00:02.0 eth1: joined to eth0
[1490391.886106] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF registering: eth1
[1490392.077952] mlx4_en: eth1: Steering Mode 2 
[1490392.304254] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched to VF: eth1        << 
[1490392.332503] mlx4_en: eth1: Link Up

[1154892.255641] mlx4_en: enP1p0s2: Close port called
[1154892.358289] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched from VF: enP1p0s2   <<
[1154892.371302] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF unregistering: enP1p0s2             <<
...

[1154901.058073] mlx4_core 0001:00:02.0 eth1: joined to eth0
[1154901.066190] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: VF registering: eth1             <<
[1154901.179496] mlx4_en: enP1p0s2: Steering Mode 2
[1154901.276867] mlx4_en: enP1p0s2: Link Up
[1154901.276880] hv_netvsc 000d3a9d-a1e0-000d-3a9d-a1e0000d3a9d eth0: Data path switched to VF: enP1p0s2    << 
[1212613.298023] ------------[ cut here ]------------
[1212613.417225] WARNING: CPU: 39 PID: 44585 at net/sunrpc/sched.c:716 rpc_exit_task+0x8e/0x90 [sunrpc]
[1212613.543461] Modules linked in: unix_diag binfmt_misc nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner iptable_security xt_conntrack nf_conntrack ip_tables ext4 mbcache jbd2 ip_set nfnetlink bridge stp llc dm_mirror dm_region_hash dm_log raid0 dm_raid mlx4_en nvme raid456 async_raid6_recov async_memcpy nvme_core mlx4_core devlink async_pq raid6_pq async_xor xor crc32_pclmul async_tx ghash_clmulni_intel aesni_intel lrw joydev dm_mod gf128mul glue_helper ablk_helper i2c_piix4 hv_utils cryptd ptp sg i2c_core pcspkr hyperv_keyboard hyperv_fb pps_core hv_balloon hid_hyperv pci_hyperv xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic pata_acpi hv_storvsc scsi_transport_fc hv_netvsc scsi_tgt ata_piix
[1212614.303637]  crct10dif_pclmul crct10dif_common libata crc32c_intel floppy serio_raw hv_vmbus [last unloaded: ip_tables]
[1212614.584837] CPU: 39 PID: 44585 Comm: kworker/39:2 Not tainted 3.10.0-862.43.1.el7.x86_64 #1
[1212614.730275] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[1212614.887222] Workqueue: rpciod rpc_async_schedule [sunrpc]
[1212615.036347] Call Trace:
[1212615.176413]  [<ffffffffbdd14a74>] dump_stack+0x19/0x1b
[1212615.321122]  [<ffffffffbd6946d8>] __warn+0xd8/0x100
[1212615.465768]  [<ffffffffc091ad20>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
[1212615.616655]  [<ffffffffc091ad20>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
[1212615.765975]  [<ffffffffbd69481d>] warn_slowpath_null+0x1d/0x20
[1212615.913481]  [<ffffffffc091adae>] rpc_exit_task+0x8e/0x90 [sunrpc]
[1212616.061444]  [<ffffffffc091c369>] __rpc_execute+0x99/0x420 [sunrpc]
[1212616.210286]  [<ffffffffbdd19b18>] ? __schedule+0x448/0x9d0
[1212616.357362]  [<ffffffffc091c702>] rpc_async_schedule+0x12/0x20 [sunrpc]
[1212616.503574]  [<ffffffffbd6b628f>] process_one_work+0x17f/0x440
[1212616.650741]  [<ffffffffbd6b7326>] worker_thread+0x126/0x3c0
[1212616.796469]  [<ffffffffbd6b7200>] ? manage_workers.isra.24+0x2a0/0x2a0
[1212616.948826]  [<ffffffffbd6be071>] kthread+0xd1/0xe0
[1212617.097640]  [<ffffffffbd6bdfa0>] ? insert_kthread_work+0x40/0x40
[1212617.248813]  [<ffffffffbdd26624>] ret_from_fork_nospec_begin+0xe/0x21
[1212617.403469]  [<ffffffffbd6bdfa0>] ? insert_kthread_work+0x40/0x40
[1212617.555151] ---[ end trace 60a10a7573284b71 ]---
[1212626.108576] nr_pdflush_threads exported in /proc is scheduled for removal
[1212635.203303] MACsec IEEE 802.1AE
[1212649.950599] traps: cpupower[37896] trap divide error ip:4024f0 sp:7ffe9d952bf0 error:0 in cpupower[400000+f000]

.....

Environment

  • RHEL 7.7
    kernel 3.10.0-862.43.1.el7.x86_64
  • Azure VMs
  • NFS

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content