Transmission of packets on an ESX guest is delayed due to a race condition.

Solution Verified

Environment

  • Red Hat Enterprise Linux 7.2
  • kernel-3.10.0-327.49.2.el7.x86_64
  • vmxnet3: 1.3.5.0-k-NAPI
  • ESXi 6.0 u2/6.5.0

Issue

  • When network coalescing is disabled for the virtual NIC, a race condition
    can occur under heavy load and cause transmit (xmit) hangs in the guest.
    The affected configuration contains the following setting:

    ethernet0.coalescingScheme = "disabled"

Resolution

  • Update to one of the following kernels to avoid this issue:

    RHEL Release     Errata           Kernel version
    RHEL 7.6         RHSA-2018:3083   kernel-3.10.0-957.el7
    RHEL 7.5.z       RHSA-2018:1965   kernel-3.10.0-862.6.3.el7
    RHEL 7.4.z EUS   RHSA-2018:1738   kernel-3.10.0-693.33.1.el7
    RHEL 7.3.z EUS   RHSA-2018:1737   kernel-3.10.0-514.51.1.el7
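
One way to check a kernel string against the kernel versions listed above is sketched below. This is a minimal illustration, not a Red Hat tool: the helper names (release_tuple, has_fix) are hypothetical, and the sketch assumes that a kernel in a z-stream not listed above carries the fix only if its release number is 957 or higher.

```python
import re

# Fixed kernel releases taken from the errata list, keyed by the first
# number of the release field (514 = 7.3.z, 693 = 7.4.z, 862 = 7.5.z).
FIXED_BY_STREAM = {
    514: (514, 51, 1),
    693: (693, 33, 1),
    862: (862, 6, 3),
}
# First GA kernel that ships the fix (RHEL 7.6).
FIRST_FIXED_GA = 957


def release_tuple(kernel):
    """Extract the numeric release from a string such as
    '3.10.0-327.49.2.el7.x86_64' and return it as a tuple of ints."""
    match = re.search(r"3\.10\.0-(\d+(?:\.\d+)*)", kernel)
    if match is None:
        raise ValueError("not a RHEL 7 3.10.0 kernel string: %r" % kernel)
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(kernel):
    """Return True if the kernel is at or above the fixed release for its
    stream. Assumption: unlisted streams are fixed only from 957 onward."""
    release = release_tuple(kernel)
    fixed = FIXED_BY_STREAM.get(release[0])
    if fixed is not None:
        return release >= fixed
    return release[0] >= FIRST_FIXED_GA
```

On a running guest, the string printed by `uname -r` can be passed to has_fix. For the kernel listed in the Environment section, has_fix("3.10.0-327.49.2.el7.x86_64") returns False.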

Root Cause

  • The field txNumDeferred is used by the driver to keep track of the number
    of packets it has pushed to the emulation. The driver increments it on
    pushing the packet to the emulation and the emulation resets it to 0 at
    the end of the transmit.
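  • Because both the driver and the emulation write to this field, they can
    race when either (a) the ESX host is under heavy load or (b) the workload
    inside the VM has a low packet rate. With network coalescing disabled,
    this race results in xmit hangs.
  • The fix creates a local copy of txNumDeferred in the driver and uses that
    copy to perform ring arithmetic.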
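
The effect of the local copy can be illustrated with a small toy model. This is a sketch under stated assumptions, not the driver code: it assumes the driver compares the deferred count against a threshold to decide whether to notify the emulation, and the names SharedTxQueue, txThreshold, xmit_racy, xmit_fixed and the emulation_resets flag are illustrative only. The flag stands in for the emulation resetting the counter between the driver's increment and its next read.

```python
class SharedTxQueue:
    """Toy stand-in for the state shared by the driver and the emulation."""

    def __init__(self, threshold):
        self.txNumDeferred = 0
        self.txThreshold = threshold  # assumed notification threshold


def xmit_racy(shared, num_pkts, emulation_resets):
    """Decide on the shared counter, re-read after the increment."""
    shared.txNumDeferred += num_pkts
    if emulation_resets:
        # The emulation finishes a transmit and resets the counter
        # before the driver reads it again.
        shared.txNumDeferred = 0
    if shared.txNumDeferred >= shared.txThreshold:
        shared.txNumDeferred = 0
        return True   # driver notifies the emulation
    return False      # driver skips the notification


def xmit_fixed(shared, num_pkts, emulation_resets):
    """Decide on a local copy that the emulation cannot modify."""
    deferred = shared.txNumDeferred  # local copy taken up front
    shared.txNumDeferred += num_pkts
    deferred += num_pkts
    if emulation_resets:
        shared.txNumDeferred = 0     # the reset no longer affects 'deferred'
    if deferred >= shared.txThreshold:
        shared.txNumDeferred = 0
        return True
    return False
```

In this model, with a threshold of 1 and a reset landing in the window, xmit_racy skips the notification while xmit_fixed still sends it. Without the reset, both variants behave the same.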

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
