Oracle 2-node RAC private network send and receive packet counts don't match

We have a 2-node Oracle RAC database cluster running RHEL 9.4. The private cluster interconnect network should only carry packets sent to and received from the partner node, yet the send and receive counts on the two nodes don't match. We run the command 'ip -s a s ' every 30 minutes and, from output like the following,

RX: bytes          packets     errors  dropped  missed   mcast
    6710076132263  8274774952  0       0        0        33480
TX: bytes          packets     errors  dropped  carrier  collsns
    5043237313859  7350801074  0       0        0        0

we record the RX (receive) and TX (transmit or send) packets in a log.
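
As an illustration of that logging step, here is a minimal Python sketch that pulls the two packet counts out of the statistics lines and formats a log line like the ones below. The function names `parse_packet_counts` and `format_log_line` are made up for this example, and it assumes the counters appear as a header line starting with RX: or TX: followed by a line of values, as in the output above; it is not the script we actually run.

```python
from datetime import datetime


def parse_packet_counts(ip_stats_text):
    """Return (rx_packets, tx_packets) from the statistics lines of `ip -s` output.

    Assumes each "RX:"/"TX:" header line is immediately followed by its values.
    """
    lines = [ln.split() for ln in ip_stats_text.strip().splitlines()]
    counts = {}
    for header, values in zip(lines, lines[1:]):
        if header and header[0] in ("RX:", "TX:"):
            # Column names come after the label; values sit on the next line.
            names = header[1:]
            counts[header[0]] = int(values[names.index("packets")])
    return counts["RX:"], counts["TX:"]


def format_log_line(timestamp, rx_packets, tx_packets):
    """Format one log entry as 'YYYYMMDD-HHMMSS: RX TX'."""
    return f"{timestamp:%Y%m%d-%H%M%S}: {rx_packets} {tx_packets}"


# The sample output quoted above.
SAMPLE = """\
RX: bytes packets errors dropped missed mcast
6710076132263 8274774952 0 0 0 33480
TX: bytes packets errors dropped carrier collsns
5043237313859 7350801074 0 0 0 0
"""

rx, tx = parse_packet_counts(SAMPLE)
print(format_log_line(datetime.now(), rx, tx))
```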

Here's node A's log (the middle column is RX and the last column TX):

20240926-020001: 27604459098 30630728317
20240926-023001: 27655053622 30658003361
20240926-030001: 27676057477 30699619452

and node B's log:

20240926-020001: 8059447742 7184664847
20240926-023001: 8086722684 7235249553
20240926-030001: 8128331509 7256266139

From 2 AM to 3 AM, node A's RX increased by 27676057477 - 27604459098 = 71598379, and node B's TX increased by 7256266139 - 7184664847 = 71601292. The difference, 71601292 - 71598379 = 2913, is the number of packets node B transmitted but node A did not receive. During the same one-hour period, node A's TX increased by 30699619452 - 30630728317 = 68891135, and node B's RX increased by 8128331509 - 8059447742 = 68883767. So there are 68891135 - 68883767 = 7368 packets that node A transmitted but node B didn't receive.

Is this simply because the command 'ip -s' (or the obsolete 'ifconfig') counts TX packets as sent regardless of whether the other end receives them? Since the command on our servers always shows 0 errors, 0 dropped, and 0 missed, maybe we can use the magnitude of the difference between one node's TX and the partner's RX as a metric to gauge the quality of the network interface on a 2-node cluster. Any comments are welcome.
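
For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the two gaps from the log lines quoted above. The names `deltas` and `tx_rx_gaps` are invented for this example, and it assumes the counters did not wrap or reset during the interval.

```python
# (timestamp, RX packets, TX packets), copied from the two logs in this post.
NODE_A = [
    ("20240926-020001", 27604459098, 30630728317),
    ("20240926-023001", 27655053622, 30658003361),
    ("20240926-030001", 27676057477, 30699619452),
]
NODE_B = [
    ("20240926-020001", 8059447742, 7184664847),
    ("20240926-023001", 8086722684, 7235249553),
    ("20240926-030001", 8128331509, 7256266139),
]


def deltas(samples):
    """Return (rx_increase, tx_increase) between the first and last sample."""
    (_, rx0, tx0), (_, rx1, tx1) = samples[0], samples[-1]
    return rx1 - rx0, tx1 - tx0


def tx_rx_gaps(node_a, node_b):
    """Packets one node counted as sent minus packets the partner counted as received."""
    a_rx, a_tx = deltas(node_a)
    b_rx, b_tx = deltas(node_b)
    return {"B_to_A": b_tx - a_rx, "A_to_B": a_tx - b_rx}


gaps = tx_rx_gaps(NODE_A, NODE_B)
print(gaps)  # {'B_to_A': 2913, 'A_to_B': 7368}
```

Passing two consecutive samples instead of the whole list, for example `NODE_A[0:2]` and `NODE_B[0:2]`, gives the same comparison for a single 30-minute interval.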

Responses