hw csum failure on Cisco UCS using Open vSwitch

Solution Verified - Updated -

Issue

  • We have an OpenStack HA installation, and whichever controller nodes currently host the Neutron services log the following errors to /var/log/messages at an alarming rate:
kernel: qg-f7cc81ab-90: hw csum failure
kernel: tap22fa8278-8c: hw csum failure

The traces seen in /var/log/messages are:

Nov 24 14:08:13 controller1 kernel: tap22fa8278-8c: hw csum failure
Nov 24 14:08:13 controller1 kernel: CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.0-123.9.2.el7.x86_64 #1
Nov 24 14:08:13 controller1 kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
Nov 24 14:08:13 controller1 kernel: ffff881044760000 20b9a7e010e08381 ffff88085fd23b10 ffffffff815e241b
Nov 24 14:08:13 controller1 kernel: ffff88085fd23b28 ffffffff814cd56a ffff880852cd0100 ffff88085fd23b48
Nov 24 14:08:13 controller1 kernel: ffffffff814c62d2 000000005fd23b80 ffff880852cd0100 ffff88085fd23b58
Nov 24 14:08:13 controller1 kernel: Call Trace:
Nov 24 14:08:13 controller1 kernel: <IRQ>  [<ffffffff815e241b>] dump_stack+0x19/0x1b
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cd56a>] netdev_rx_csum_fault+0x3a/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62d2>] __skb_checksum_complete_head+0x62/0x70
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62f1>] __skb_checksum_complete+0x11/0x20
Nov 24 14:08:13 controller1 kernel: [<ffffffff8155a2ac>] nf_ip_checksum+0xcc/0x100
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0351b43>] udp_error+0x103/0x200 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffff810977f2>] ? default_wake_function+0x12/0x20
Nov 24 14:08:13 controller1 kernel: [<ffffffffa034b330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffff810fe958>] ? __call_rcu_nocb_enqueue+0xa8/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0368302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
Nov 24 14:08:13 controller1 kernel: [<ffffffff8150066a>] nf_iterate+0xaa/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81500704>] nf_hook_slow+0x84/0x140
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509e74>] ip_rcv+0x344/0x380
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cffa6>] __netif_receive_skb_core+0x676/0x870
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d01b8>] __netif_receive_skb+0x18/0x60
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d0d7e>] process_backlog+0xae/0x180
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d060a>] net_rx_action+0x15a/0x250
Nov 24 14:08:13 controller1 kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f445c>] call_softirq+0x1c/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
Nov 24 14:08:13 controller1 kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f4d58>] do_IRQ+0x58/0xf0
Nov 24 14:08:13 controller1 kernel: [<ffffffff815e9ead>] common_interrupt+0x6d/0x6d
Nov 24 14:08:13 controller1 kernel: <EOI>  [<ffffffff814835af>] ? cpuidle_enter_state+0x4f/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff814836e5>] cpuidle_idle_call+0xc5/0x200
Nov 24 14:08:13 controller1 kernel: [<ffffffff8101bcae>] arch_cpu_idle+0xe/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815d0271>] start_secondary+0x265/0x27b

and

Nov 24 14:08:13 controller1 kernel: qg-f7cc81ab-90: hw csum failure
Nov 24 14:08:13 controller1 kernel: CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.0-123.9.2.el7.x86_64 #1
Nov 24 14:08:13 controller1 kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
Nov 24 14:08:13 controller1 kernel: ffff88104f68c000 20b9a7e010e08381 ffff88085fd23b10 ffffffff815e241b
Nov 24 14:08:13 controller1 kernel: ffff88085fd23b28 ffffffff814cd56a ffff8807908b6f00 ffff88085fd23b48
Nov 24 14:08:13 controller1 kernel: ffffffff814c62d2 0000000010e08381 ffff8807908b6f00 ffff88085fd23b58
Nov 24 14:08:13 controller1 kernel: Call Trace:
Nov 24 14:08:13 controller1 kernel: <IRQ>  [<ffffffff815e241b>] dump_stack+0x19/0x1b
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cd56a>] netdev_rx_csum_fault+0x3a/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62d2>] __skb_checksum_complete_head+0x62/0x70
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62f1>] __skb_checksum_complete+0x11/0x20
Nov 24 14:08:13 controller1 kernel: [<ffffffff8155a2ac>] nf_ip_checksum+0xcc/0x100
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0351b43>] udp_error+0x103/0x200 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffffa03c6ffd>] ? ovs_vport_send+0x1d/0x80 [openvswitch]
Nov 24 14:08:13 controller1 kernel: [<ffffffffa034b330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0368302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
Nov 24 14:08:13 controller1 kernel: [<ffffffff8150066a>] nf_iterate+0xaa/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81500704>] nf_hook_slow+0x84/0x140
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509e74>] ip_rcv+0x344/0x380
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cffa6>] __netif_receive_skb_core+0x676/0x870
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d01b8>] __netif_receive_skb+0x18/0x60
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d0d7e>] process_backlog+0xae/0x180
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d060a>] net_rx_action+0x15a/0x250
Nov 24 14:08:13 controller1 kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f445c>] call_softirq+0x1c/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
Nov 24 14:08:13 controller1 kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f4d58>] do_IRQ+0x58/0xf0
Nov 24 14:08:13 controller1 kernel: [<ffffffff815e9ead>] common_interrupt+0x6d/0x6d
Nov 24 14:08:13 controller1 kernel: <EOI>  [<ffffffff814835af>] ? cpuidle_enter_state+0x4f/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff814836e5>] cpuidle_idle_call+0xc5/0x200
Nov 24 14:08:13 controller1 kernel: [<ffffffff8101bcae>] arch_cpu_idle+0xe/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815d0271>] start_secondary+0x265/0x27b
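
To put a number on "an alarming rate", the failures can be tallied per interface. The following is a minimal sketch, assuming the syslog line format shown in the excerpts above; the function names count_csum_failures and print_summary are hypothetical helpers introduced here for illustration and are not part of any Red Hat or OpenStack tooling.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Nov 24 14:08:13 controller1 kernel: tap22fa8278-8c: hw csum failure
# and captures the interface name that precedes the message.
CSUM_RE = re.compile(r"kernel: (\S+): hw csum failure")


def count_csum_failures(lines):
    """Return a Counter mapping interface name to the number of
    'hw csum failure' lines seen for that interface."""
    counts = Counter()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def print_summary(path="/var/log/messages"):
    """Print one 'interface<TAB>count' line per affected interface,
    most frequent first. The default path is an assumption based on
    the log location named in this article."""
    with open(path, errors="replace") as log:
        for iface, n in count_csum_failures(log).most_common():
            print(f"{iface}\t{n}")
```

Calling print_summary() as root on an affected controller lists each interface together with its failure count, which shows whether the messages are concentrated on a few ports or spread across all of them.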

Environment

  • Red Hat Enterprise Linux OpenStack Platform 5.0 on RHEL 7.0
  • Cisco UCS B200M3 blades
