RHEL7 HOST VLAN Bridge slow network performance / ~5 MB/s over Gigabit interface

Dear RHEL and KVM Experts,

I have an issue whose cause I cannot pin down, and analyzing it here could be helpful to others.

As a reference condition I have a system as:

1) An HPE Synergy blade HOST server installed as a virtualization environment with RHEL 7.8 and KVM/QEMU (not yet in production, so essentially no traffic)
2) The network is configured using bond interfaces in active/backup mode, which is the recommended mode for virtualization
3) The Synergy 12000 frame has two external 10 Gbps interfaces; even though these run active/active, they are trunk ports carrying 4 VLANs, and the external switch is responsible for the L2/L3 interconnection
4) The HOST uses bridge interfaces on top of VLANs, with the HOST IPs assigned to the bridge VLAN interfaces (a sketch of typical ifcfg files follows after this list):
br0 -> bond0 -> int1+int2 (backup access port interface)
br1823-|
br1824-| -> br1 -> bond1 -> int3+int4 (application on trunk port interface)
5) NetworkManager is disabled for all network configuration; everything is handled only by the legacy network service scripts
6) The bridges have no bridge options set, just the default configuration; everything works fine (no connectivity problems, just some strange speed changes over time)
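
As a reference for how this stack is typically defined with the network scripts, here is a minimal sketch of the ifcfg files (illustrative only: the bond1.1823 layering follows the common bond -> VLAN -> bridge pattern, which differs slightly from the br1 layering shown above, and the physical slaves additionally need MASTER=bond1 / SLAVE=yes entries):

# /etc/sysconfig/network-scripts/ifcfg-bond1  (bond in active-backup mode)
DEVICE=bond1
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=active-backup miimon=100"
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no

# /etc/sysconfig/network-scripts/ifcfg-bond1.1823  (VLAN 823 on top of the bond, enslaved to the bridge)
DEVICE=bond1.1823
VLAN=yes
BRIDGE=br1823
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no

# /etc/sysconfig/network-scripts/ifcfg-br1823  (bridge carrying the HOST IP)
DEVICE=br1823
TYPE=Bridge
IPADDR=192.168.23.180
PREFIX=24
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no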

The HOST IPs are on br0, br1823 and br1824; it is done like that so the VLANs can also be used directly on the HOST. Guests can transfer at higher speeds than the HOST. Guests are using virtio and vnet interfaces, and the relevant modules are loaded:

vhost_net 22693 0
vhost 48851 1 vhost_net
macvtap 22757 1 vhost_net
tun 36164 2 vhost_net
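
On the GUEST side the NICs are defined in the usual way for a bridged virtio interface; in the domain XML it looks roughly like this (the guest name below is just a placeholder, and the snippet is trimmed to the relevant elements):

virsh dumpxml <guest> | grep -A 4 "interface type"

<interface type='bridge'>
  <source bridge='br1823'/>
  <model type='virtio'/>
</interface>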

With the bond interface on the HOST in mode 1 (active-backup), the expected throughput is around 1 Gbps, i.e. roughly 100 MB/s (1 Gbit/s / 8 ≈ 125 MB/s before protocol overhead).

On the HOST, right after the network is started, transferring large files for testing reaches this speed.

But after some time this drops considerably to much lower speeds. See below:

From HOST to an external server:
large.file 1% 24MB 5.0MB/s 07:40 ETA

From external server connecting on HOST and downloading:
large.file 17% 116MB 24.0MB/s 00:22 ETA

But if I just restart the network on the HOST, I get the expected speeds again, around 100 MB/s.

After some time passes, the speed drops again to the values above. This is very odd, and I cannot see any reason for it.
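
When it is slow again I can capture the bond and link state for comparison with the fast state, using standard tools, e.g.:

cat /proc/net/bonding/bond1   # which slave is currently active and each slave's link status
ethtool int3                  # negotiated speed/duplex of the active slave (int3/int4 as above)
ethtool -k bond1              # offload settings (tso/gso/gro) currently applied to the bond
ip -s link show br1823        # RX/TX error and drop counters on the bridge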

Some technical information regarding VLAN 823, the forwarding table of br1823:

port no mac addr is local? ageing timer
1 00:1d:70:c4:f4:e6 no 0.97
1 08:f1:ea:6f:d6:c9 no 250.45
1 08:f1:ea:70:39:f1 no 0.00
1 12:d2:a6:f0:00:1f no 195.99
1 12:d2:a6:f0:00:25 no 216.92
1 16:1e:b1:80:00:12 no 65.19
1 16:1e:b1:80:00:13 no 59.18
1 16:1e:b1:80:00:1e no 36.31
1 16:1e:b1:80:00:24 yes 0.00
1 16:1e:b1:80:00:24 yes 0.00
1 2c:76:8a:55:e8:c5 no 244.12
1 2c:76:8a:56:40:15 no 120.21
1 b4:99:ba:06:71:7c no 147.64

33: br1823: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 16:1e:b1:80:00:24 brd ff:ff:ff:ff:ff:ff
inet 192.168.23.180/24 brd 192.168.23.255 scope global br1823
valid_lft forever preferred_lft forever

I am not willing to set ageing to 0 on the bridge, since I believe the bridge should keep learning MAC addresses rather than flooding every frame.
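
For reference, the ageing time here is simply the default; it can be inspected (and changed, should that ever be needed) with:

cat /sys/class/net/br1823/bridge/ageing_time   # 30000 = 300 seconds (value is in hundredths of a second)
brctl setageing br1823 300                     # explicit 300 s; setting 0 would make the bridge flood like a hub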

Does anyone have a recommendation on what to check to find out why this behavior happens? So far I have never left the GUEST running for a long time, but its speed on the same VLAN is always as expected, around 100 MB/s.

Just to reinforce: I do not have any connectivity issue; only the HOST speed is below what is expected for this type of connectivity.

Responses

It's a bit tricky to say without a lot more info. It could be something like incorrect CPU isolation, or perhaps some storage causing a bottleneck.
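
To rule those out quickly, you could check something along these lines (iostat comes from the sysstat package):

grep -o 'isolcpus=[^ ]*' /proc/cmdline   # shows whether any CPU isolation is configured at boot
iostat -x 1 5                            # disk utilization/await while a slow transfer is running
top -H                                   # look for a single thread pinned at 100% CPU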

Try testing the network speed with iperf to rule out the file transfer tool itself.
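
For example (iperf3 shown here, classic iperf uses the same -s/-c options; replace <external-server-ip> with your test server):

iperf3 -s                                  # on the external server
iperf3 -c <external-server-ip> -t 30       # on the HOST: 30-second TCP test
iperf3 -c <external-server-ip> -t 30 -R    # same test in the reverse direction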

If that performs well, then try a different file transfer method; for example, if you are using SSH/SCP, try FTP or NFS.
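
A quick way to take the SSH cipher overhead out of the picture without setting up FTP/NFS is a raw netcat stream (port 5001 is arbitrary, and some netcat variants want -l -p 5001 instead):

nc -l 5001 > /dev/null                                      # on the receiving side
dd if=/dev/zero bs=1M count=2000 | nc <receiver-ip> 5001    # on the sender; dd prints a rate when it finishes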

Hi Jamie,

The odd thing is that if I just do a systemctl restart network, it works as expected.

I have all the proper sysctl parameters set up, increasing queue sizes for gigabit and so on.

My impression is that something in TCP ends up with wrong values, decreasing throughput, but I cannot be sure what or why.
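
Next time it is slow I will also capture the live TCP state of a transfer with ss, which shows cwnd, ssthresh, rtt and retransmissions per connection:

ss -ti dst 192.168.23.0/24   # TCP internals for connections towards the test subnet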

Remember this is happening on HOST, not on GUEST.

I will try iperf, but I am almost sure I will see similar results.

Hi Jamie and all,

Adding some more details.

I actually have one GUEST running RHEL 7.8 with the proper sysctl adjustments for gigabit interfaces, similar to its HOST. The VMs are connected through the bridges, and the subnet 192.168.23.0/24 runs over VLAN 823.

The HOST, however, has a much slower transfer rate over exactly the same physical infrastructure. This only happens some time after the network comes up; right after a network restart the performance behaves as it should.

See below, HOST to Baremetal server on same L2 subnet:
large.file 16% 111MB 22.7MB/s 00:24 ETA ^CKilled by signal 2.

Actual values, optimized for gigabit, on the HOST:
net.ipv4.tcp_window_scaling = 1
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Baremetal to GUEST:
large.file 100% 2320MB 112.0MB/s 00:20

GUEST to Baremetal:
large.file 100% 2320MB 111.9MB/s 00:20

The same tuning is in place on the GUEST:
net.ipv4.tcp_window_scaling = 1
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

On the GUEST:
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:cd:ed:26 brd ff:ff:ff:ff:ff:ff
inet 192.168.23.161/24 brd 192.168.23.255 scope global eth1
valid_lft forever preferred_lft forever

On the HOST we also have:
57: br1823: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 16:1e:b1:80:00:30 brd ff:ff:ff:ff:ff:ff
inet 192.168.23.182/24 brd 192.168.23.255 scope global br1823
valid_lft forever preferred_lft forever

68: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 1000
link/ether fe:54:00:4e:06:8f brd ff:ff:ff:ff:ff:ff
69: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br1823 state UNKNOWN group default qlen 1000
link/ether fe:54:00:cd:ed:26 brd ff:ff:ff:ff:ff:ff
70: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br1824 state UNKNOWN group default qlen 1000
link/ether fe:54:00:97:50:ec brd ff:ff:ff:ff:ff:ff

Here we can see that vnet1 is attached to br1823.
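
This can also be cross-checked from the bridge side:

brctl show br1823              # lists the ports of the bridge, vnet1 among them
bridge link show | grep br1823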

Then on the HOST I run: systemctl restart network.service

HOST to Baremetal server on same L2 subnet:
large.file 100% 656MB 109.3MB/s 00:06

As you can see, it then works as expected again.

I do not see any TCP issues, like buffer or window problems. I assume the bridge is fine, since if it were not, the GUEST would show the same behavior.

My suspicion is some TCP or other sysctl setting, but I cannot understand why this happens.
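
One idea I have (not tried yet) is to snapshot the kernel state while it is slow and again right after the restart, then diff the two, including the cached per-destination TCP metrics:

sysctl -a > /tmp/sysctl.slow 2>/dev/null
ip tcp_metrics show > /tmp/tcp_metrics.slow
systemctl restart network.service
sysctl -a > /tmp/sysctl.fast 2>/dev/null
ip tcp_metrics show > /tmp/tcp_metrics.fast
diff /tmp/sysctl.slow /tmp/sysctl.fast
diff /tmp/tcp_metrics.slow /tmp/tcp_metrics.fast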

Does anyone have any directions for further checks?

Both HOST and GUEST are running RHEL 7.8.