Why do I see Receive Packet Steering (RPS) queue errors when using a Mellanox ConnectX EN 10 Gigabit Ethernet card?

Issue

Why am I seeing "ethX received packet on queue Y, but number of RX queues is Z" error messages in my /var/log/messages file?

Environment

  • Red Hat Enterprise Linux
  • A Mellanox ConnectX EN 10 Gigabit Ethernet card

Resolution

This error is not fatal and does not pose a risk to the operation of your system. It is caused by a bug in the Mellanox driver. A fix for the issue is being investigated for inclusion in a future update release of Red Hat Enterprise Linux 6. You can continue using your system even with this message present.

Background Information

A system using a Mellanox ConnectX EN 10 Gigabit Ethernet card might occasionally log the following warning in the /var/log/messages file:

ethX received packet on queue Y, but number of RX queues is Z

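Here X, Y and Z are integer values: X is the interface number, Y is the queue on which the packet was recorded, and Z is the number of receive queues registered with the operating system. The warning is informational and in no way fatal to system functionality. Contact support if you have questions about it.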
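To check how often the warning occurs and which queue numbers are involved, the message format above can be matched with a regular expression. The following is a minimal sketch in Python; the function name `find_rps_warnings` and the group names are illustrative assumptions, not part of any Red Hat or Mellanox tooling.

```python
import re

# Pattern for the warning text quoted above; the interface name (ethX),
# Y and Z are captured as named groups.
RPS_WARNING = re.compile(
    r"(?P<iface>\S+) received packet on queue (?P<queue>\d+), "
    r"but number of RX queues is (?P<rx_queues>\d+)"
)


def find_rps_warnings(lines):
    """Return a list of (interface, queue, rx_queues) tuples, one per matching line.

    `lines` is any iterable of log lines, for example an open file object.
    """
    hits = []
    for line in lines:
        match = RPS_WARNING.search(line)
        if match:
            hits.append(
                (
                    match.group("iface"),
                    int(match.group("queue")),
                    int(match.group("rx_queues")),
                )
            )
    return hits
```

For example, passing an open handle on /var/log/messages (which normally requires root privileges to read) returns one tuple per logged warning.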

This warning stems from the use of Receive Packet Steering (RPS). With RPS in use, the warning is generated when a network card with multiple receive queues records that a network buffer was received on queue Y, but only Z queues have been registered with the operating system. As a result, RPS does not work for that packet, but the frame is still received normally, so the warning is not cause for immediate alarm.
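The condition described above can be modelled in a few lines. This is a simplified Python illustration of the behaviour as described in this article, not the kernel source; the function `select_rx_queue` and its parameters are hypothetical names chosen for the sketch.

```python
def select_rx_queue(recorded_queue, num_rx_queues, dev_name="ethX"):
    """Model the sanity check applied to the queue a packet was recorded on.

    Returns (queue_index, warning). When the recorded queue is outside the
    range of registered queues, queue_index is None (steering is skipped
    for this packet) and warning holds the message text. Otherwise the
    recorded queue is used and warning is None.
    """
    if recorded_queue >= num_rx_queues:
        warning = "%s received packet on queue %d, but number of RX queues is %d" % (
            dev_name,
            recorded_queue,
            num_rx_queues,
        )
        # The packet is still delivered; only the steering step is skipped.
        return None, warning
    return recorded_queue, None
```

With the values from the example trace in this article (queue 11 recorded, 8 queues registered), the sketch returns no queue index and produces the warning text, which mirrors the case where the packet is received without RPS.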

The warning indicates a bug in the driver for the Mellanox NIC hardware. Please contact support if you have any questions. Again, this warning is not fatal and does not pose a risk to the operation of your system, and a fix for the issue is being investigated for inclusion in a future update release of Red Hat Enterprise Linux 6.

This warning can taint the kernel (taint value 512). The value 512 indicates only that a kernel warning has occurred; in this case, an infrequently used portion of code was called.
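The taint value is a bitmask, and 512 corresponds to bit 9 (2 to the power of 9), the flag set when a kernel warning has been issued. The following Python sketch shows one way to test for that flag; the helper names are illustrative assumptions, and the sketch assumes the current value is read from /proc/sys/kernel/tainted.

```python
TAINT_WARN = 1 << 9  # 512: a kernel warning has occurred


def has_warn_taint(taint_value):
    """Return True if the warning bit (512) is set in the taint bitmask."""
    return bool(taint_value & TAINT_WARN)


def set_taint_bits(taint_value):
    """Return the positions of all bits set in the taint bitmask."""
    return [
        bit
        for bit in range(taint_value.bit_length())
        if taint_value & (1 << bit)
    ]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value of the running kernel as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())
```

On an affected system, `has_warn_taint(read_taint())` would be expected to return True once the warning has been logged. Other bits may also be set for unrelated reasons, which `set_taint_bits` makes visible.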

Here's an example of a full error message:

------------[ cut here ]------------

WARNING: at net/core/dev.c:2157 get_rps_cpu+0x140/0x3b0() (Not tainted)
Hardware name: Testbox
eth0 received packet on queue 11, but number of RX queues is 8
Modules linked in: mlx4_en ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat nf_nat nf_co
Pid: 0, comm: swapper Not tainted 2.6.32-131.0.15.el6.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff81067137>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81067226>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff8141d760>] ? get_rps_cpu+0x140/0x3b0
 [<ffffffff81420019>] ? netif_receive_skb+0x29/0x60
 [<ffffffffa021a8ca>] ? mlx4_en_process_rx_cq+0x3ca/0x830 [mlx4_en]
 [<ffffffffa021ad6f>] ? mlx4_en_poll_rx_cq+0x3f/0x80 [mlx4_en]
 [<ffffffffa01813a2>] ? mlx4_cq_completion+0x42/0x80 [mlx4_core]
 [<ffffffff814225a3>] ? net_rx_action+0x103/0x2f0
 [<ffffffff8106f717>] ? __do_softirq+0xb7/0x1e0
 [<ffffffff810d6940>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
 [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
 [<ffffffff8106f505>] ? irq_exit+0x85/0x90
 [<ffffffff814e3505>] ? do_IRQ+0x75/0xf0
 [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff810362ab>] ? native_safe_halt+0xb/0x10
 [<ffffffff810142fd>] ? default_idle+0x4d/0xb0
 [<ffffffff81009e96>] ? cpu_idle+0xb6/0x110
 [<ffffffff814c376a>] ? rest_init+0x7a/0x80
 [<ffffffff81bbdf28>] ? start_kernel+0x41d/0x429
 [<ffffffff81bbd33a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81bbd438>] ? x86_64_start_kernel+0xfa/0x109
---[ end trace 64ee7ed6359f4c81 ]---
