Transmission of packets on an ESX guest is delayed due to a race condition.
Environment
- Red Hat Enterprise Linux 7.2
- kernel-3.10.0-327.49.2.el7.x86_64
- vmxnet3: 1.3.5.0-k-NAPI
- ESXi 6.0 u2/6.5.0
Issue
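- When network coalescing is disabled, a race condition can occur under heavy
load, causing transmit (xmit) hangs on the vmxnet3 interface. Coalescing is
disabled by the following setting in the virtual machine's .vmx configuration:
ethernet0.coalescingScheme = "disabled"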
Resolution
- Update to one of the following kernels to avoid this issue:
RHEL Release | Errata | Kernel version |
---|---|---|
RHEL 7.6 | RHSA-2018:3083 | kernel-3.10.0-957.el7 |
RHEL 7.5.z | RHSA-2018:1965 | kernel-3.10.0-862.6.3.el7 |
RHEL 7.4.z EUS | RHSA-2018:1738 | kernel-3.10.0-693.33.1.el7 |
RHEL 7.3.z EUS | RHSA-2018:1737 | kernel-3.10.0-514.51.1.el7 |
Root Cause
- The driver uses the field txNumDeferred to track the number of packets it
has pushed to the emulation. The driver increments the field when it pushes
a packet to the emulation, and the emulation resets it to 0 at the end of
the transmit.
- A race on this field is possible when either (a) ESX is under heavy load or
(b) the workload inside the VM has a low packet rate.
- The fix creates a local copy of txNumDeferred and uses that copy to perform
ring arithmetic.
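The general pattern behind the fix can be illustrated with a toy model. The sketch below is not the vmxnet3 source: the struct tx_shared, the hook type, and the functions xmit_racy and xmit_local_copy are hypothetical names invented for this example, and the interleaving shown (the emulation resetting the counter between the driver's increment and its threshold check) is only one plausible way such a race could play out. It shows why a decision based on a local snapshot is not affected by a concurrent reset of the shared field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for memory shared by the driver and the emulation. */
struct tx_shared {
    volatile uint32_t txNumDeferred; /* incremented by driver, reset to 0 by emulation */
    uint32_t txThreshold;            /* notify the emulation at or above this count */
};

/* Hypothetical hook: simulates the emulation running between two driver steps. */
typedef void (*emulation_hook)(struct tx_shared *);

static void emulation_completes_tx(struct tx_shared *s) { s->txNumDeferred = 0; }
static void emulation_idle(struct tx_shared *s) { (void)s; }

/* Racy variant: the notify decision re-reads the shared field, so a reset
 * by the emulation in between makes the driver skip the notification. */
static bool xmit_racy(struct tx_shared *s, uint32_t pkts, emulation_hook hook)
{
    s->txNumDeferred += pkts;
    hook(s); /* the emulation may reset the counter here */
    return s->txNumDeferred >= s->txThreshold;
}

/* Local-copy variant: the decision uses a private snapshot that the
 * emulation cannot modify. */
static bool xmit_local_copy(struct tx_shared *s, uint32_t pkts, emulation_hook hook)
{
    uint32_t deferred = s->txNumDeferred; /* snapshot taken once */

    s->txNumDeferred += pkts;
    deferred += pkts;
    hook(s); /* a reset here no longer affects the decision */
    return deferred >= s->txThreshold;
}
```

With txThreshold set to 1 and a single packet queued, xmit_racy returns false when emulation_completes_tx runs in between, so in this model the packet would wait without a notification, while xmit_local_copy still returns true. With emulation_idle, both variants return true.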