Question on Network Performance Optimization

Right now, I'm trying to set up a fast network backup solution. I've got a couple of backup servers configured to use a nearline disk storage array via NFS.


The backup servers have a 10Gbps failover interface-group (bonding mode 1) to handle client data ingest. They also have four 1Gbps interfaces - currently configured into a LACP interface-group (bonding mode 4) - connected to the nearline storage device's network. That is, ingest is on one network segment and backup storage is on a different segment, with the backup servers bridging the two networks.
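If it helps to sanity-check what each bond is actually doing before changing anything, here's a minimal Python sketch that reads the Linux bonding driver's sysfs attributes; the interface names bond0/bond1 are assumptions, so substitute whatever your bonds are actually called.

# Minimal sketch: confirm each bond's configured mode via the bonding driver's
# sysfs attributes. "bond0" (10Gbps failover) and "bond1" (4x1Gbps LACP) are
# assumed names; substitute the interface names from your own configuration.
from pathlib import Path

def bond_mode(bond: str) -> str:
    # Returns e.g. "active-backup 1" or "802.3ad 4"
    return Path(f"/sys/class/net/{bond}/bonding/mode").read_text().strip()

for bond in ("bond0", "bond1"):
    print(f"{bond}: {bond_mode(bond)}")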


My nearline storage device has two 10Gbps interfaces. They're currently configured as a failover interface-group. I could conceivably reconfigure them as two independent NICs (each with its own IP, but on the same network segment) and then use a director application to handle individual path failures.


What I need to sort out is:

  • is LACP my best option for the backup servers' 1Gbps interface-grouping?
  • is active-passive failover my best option, performance-wise, for my nearline device?

Additional data points:

  • the NFS network, while using 10Gbps switching infrastructure, has been rate-limited to 5Gbps.
  • I've got two backup servers with identical (asymmetrical) network configurations.

Any practical experience anyone can bring to bear, or am I stuck twiddling knobs to find the optimal data-flow configuration?

Responses

Is the connectivity between the backup servers and the nearline storage device(s) via a switch, or direct? If the links are (or can be) direct, you will see better performance with bonding mode=0 (balance-rr, pure round-robin). LACP with layer 2+3 hashing forces all traffic between a given MAC/IP pair onto a single interface, which makes the extra interfaces far less useful when the backup server only talks to a small number of hosts.
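To make that concrete, here's a simplified Python sketch of the documented layer2+3 transmit hash (not the kernel's exact code, and the MACs/IPs are made-up placeholders). With only one or two peer addresses on the storage segment, a four-slave bond only ever uses one or two of its links:

# Simplified sketch of the bonding driver's layer2+3 xmit_hash_policy.
# Addresses are placeholders, not anyone's real configuration.
from ipaddress import IPv4Address

def layer2_3_slave(src_mac, dst_mac, src_ip, dst_ip, slave_count):
    # layer2 component: XOR of the low MAC bytes (packet type ID omitted here)
    h = int(src_mac.split(":")[-1], 16) ^ int(dst_mac.split(":")[-1], 16)
    # layer3 component: fold in the IPv4 addresses, then the documented bit-folding
    h ^= int(IPv4Address(src_ip)) ^ int(IPv4Address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % slave_count

# One backup server talking to one or two NAS addresses over a 4-slave LACP bond:
for nas_ip in ("192.168.10.20", "192.168.10.21"):
    slave = layer2_3_slave("00:11:22:33:44:55", "66:77:88:99:aa:bb",
                           "192.168.10.10", nas_ip, 4)
    print(f"{nas_ip} -> slave {slave}")
# Every packet of a given flow gets the same answer, so a single NFS stream
# never uses more than one 1Gbps link.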

It's worth reading about the xmit_hash_policy option in detail if you will be using LACP for load balancing in production.

--snip--
From: http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding
Slave selection for outgoing traffic is done according to the transmit hash policy, which may be changed from the default simple XOR policy via the xmit_hash_policy option, documented below. Note that not all transmit policies may be 802.3ad compliant, particularly in regards to the packet mis-ordering requirements of section 43.2.4 of the 802.3ad standard. Differing peer implementations will have varying tolerances for noncompliance.
--snip--

Ultimately (we're a year+ on from when I made the original post), on our 10Gbps-enabled networks we opted to do 10/1 A/P pairs for client data-ingest and backup image data-writes. The configurational complexity of LACP, weighed against both its theoretical and practical throughput gains, just made it not worth the effort in a switched network.


For our 1Gbps networks, we could have opted to use LACP for the client-facing interface(s), but given the configurational realities of the links between the backup servers and the NAS devices, it would only have created a situation where client ingest happened faster than the array could write the data out.
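As a rough back-of-the-envelope (the four-link client-facing bond and the single-stream NAS write below are illustrative assumptions, not our actual numbers):

# Illustrative only: a hypothetical 4x1Gbps client-facing LACP bond with many
# client flows spread across its slaves, versus NAS writes effectively pinned
# to one slave per NFS stream.
client_slaves, slave_gbps = 4, 1.0
ingest_gbps = client_slaves * slave_gbps   # best case across many client flows
array_write_gbps = 1 * slave_gbps          # one NFS stream -> one 1Gbps slave

print(f"ingest up to {ingest_gbps:.0f} Gbps vs. ~{array_write_gbps:.0f} Gbps of array writes "
      f"(~{ingest_gbps / array_write_gbps:.0f}x more than the array path can absorb)")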

+1 that round-robin will probably provide better throughput in a single host-to-host scenario like this.


Another thing to keep in mind with LACP is that the balancing profile on the switch side also differs from device to device. For example, an old Cisco 2950 can only do Layer 2 hashing, while newer and more powerful switches can incorporate Layer 3 and/or Layer 4 into the load-balancing algorithm.
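As a toy comparison of the two switch-side behaviours (the hash functions below are simplifications, not any vendor's actual algorithm, and the addresses/ports are placeholders):

# Layer 2 hashing pins everything between one MAC pair to a single aggregation
# member; a Layer 3+4 hash can spread distinct TCP flows (different source
# ports) across members.
from ipaddress import IPv4Address

def l2_member(src_mac, dst_mac, members):
    return (int(src_mac.replace(":", ""), 16) ^ int(dst_mac.replace(":", ""), 16)) % members

def l3_4_member(src_ip, dst_ip, src_port, dst_port, members):
    return (int(IPv4Address(src_ip)) ^ int(IPv4Address(dst_ip)) ^ src_port ^ dst_port) % members

# Four NFS-ish flows between the same two hosts, differing only in source port:
flows = [("10.0.0.1", "10.0.0.2", sport, 2049) for sport in (40001, 40002, 40003, 40004)]
print("L2 hashing   ->", {l2_member("00:11:22:33:44:55", "66:77:88:99:aa:bb", 4) for _ in flows})
print("L3+4 hashing ->", {l3_4_member(s, d, sp, dp, 4) for s, d, sp, dp in flows})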