What value should I set for the tcp_retries2 parameter?

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 5
  • Database connection or other internal-only network

Issue

The /proc/sys/net/ipv4/tcp_retries2 setting controls how many times TCP retransmits an unacknowledged data segment on an established connection before giving up. Some guides suggest changing it to a much lower value than the default. Why should I do this?

Resolution

In a High Availability (HA) configuration, consider decreasing the setting to 3.
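
As an illustration of inspecting and changing the setting, here is a minimal Python sketch. It assumes a Python 3 interpreter is available and that the caller has root privileges for writing. The function names are hypothetical and the range check is the sketch's own guard, not a kernel rule.

```python
from pathlib import Path

# Path named in this article; it exists only on Linux.
TCP_RETRIES2_PATH = Path("/proc/sys/net/ipv4/tcp_retries2")


def read_tcp_retries2(path=TCP_RETRIES2_PATH):
    """Return the current retry count as an integer."""
    return int(Path(path).read_text().strip())


def write_tcp_retries2(value, path=TCP_RETRIES2_PATH):
    """Set the retry count for the running kernel (requires root).

    The change is lost at reboot unless it is also recorded in
    /etc/sysctl.conf as: net.ipv4.tcp_retries2 = <value>
    """
    # Sanity check chosen for this sketch, not enforced by the kernel.
    if not isinstance(value, int) or value < 1:
        raise ValueError("tcp_retries2 must be a positive integer")
    Path(path).write_text(f"{value}\n")
```

Calling read_tcp_retries2() shows the value in effect, and write_tcp_retries2(3) applies the HA suggestion above to the running system only. The path argument is there so the functions can be tried against an ordinary file first.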

Root Cause

The value for this parameter is a balance between detecting a dead connection quickly (which may be of value in an HA setting) and reliability in the face of a lossy network (e.g. an overloaded switch, or the wider Internet). No one value is perfect for all situations. HA configurations with well-provisioned networks AND systems, and only internal traffic, may well choose lower values. General-purpose systems and configurations with only one server (non-HA) should not. Communications on extremely lossy or overloaded networks may benefit from a higher value than the Red Hat default.

The parameter controls the total time before a connection failure is declared. It is difficult to calculate the time corresponding to a given retry count, because the interval between retries depends on the retransmission timeout (RTO), which is derived from the round-trip time measured dynamically for each TCP connection.
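
Although the exact time cannot be known in advance, a rough best-case figure can be sketched. The Python example below assumes the RTO starts at the Linux minimum of 200 ms (TCP_RTO_MIN), doubles after every unanswered transmission, and is capped at 120 seconds (TCP_RTO_MAX). A real connection with a longer measured round-trip time will take longer than this estimate. The function name is hypothetical.

```python
# Assumed Linux constants: TCP_RTO_MIN (200 ms) and TCP_RTO_MAX (120 s).
RTO_MIN_MS = 200
RTO_MAX_MS = 120_000


def estimate_timeout_ms(retries, initial_rto_ms=RTO_MIN_MS, max_rto_ms=RTO_MAX_MS):
    """Estimate the time until the connection is declared failed.

    Assumes the RTO starts at initial_rto_ms, doubles after every
    unanswered transmission, and is capped at max_rto_ms. The original
    transmission plus `retries` retransmissions gives retries + 1 waits.
    """
    total = 0
    rto = initial_rto_ms
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, max_rto_ms)
    return total


for n in (3, 8, 15):
    print(f"tcp_retries2={n}: about {estimate_timeout_ms(n) / 1000:.1f} s")
```

Under these assumptions a value of 3 gives about 3.0 seconds, 8 gives about 102.2 seconds, and the Linux default of 15 gives about 924.6 seconds. Integer milliseconds are used so the sums stay exact.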

Too low a setting will result in TCP connections apparently failing during load peaks.

There is no point in setting a low value unless TCP connection failure is a primary means of declaring a cluster node failure. In most non-HA situations a low value brings no benefit.

RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8.
Oracle suggests a value of 3 for a RAC configuration.
