hw csum failure on Cisco UCS using Open vSwitch


Issue

  • We have an OpenStack high-availability (HA) deployment, and whichever control node currently hosts the Neutron services writes the following errors to /var/log/messages at an alarming rate:
kernel: qg-f7cc81ab-90: hw csum failure
kernel: tap22fa8278-8c: hw csum failure
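To put a number on "an alarming rate", the messages can be tallied per interface. The following is a minimal Python sketch, not a supported tool: it assumes the syslog line shape quoted in this article (`kernel: <interface>: hw csum failure`), and the helper name `count_csum_failures` is hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Assumed line shape, copied from the messages quoted above:
#   "Nov 24 14:08:13 controller1 kernel: tap22fa8278-8c: hw csum failure"
CSUM_FAILURE_RE = re.compile(r"kernel: (?P<iface>\S+): hw csum failure\s*$")


def count_csum_failures(lines):
    """Map each interface name to the number of 'hw csum failure' lines it logged.

    Hypothetical helper; it only matches the exact message text shown above,
    so stack-trace lines and other kernel messages are ignored.
    """
    counts = Counter()
    for line in lines:
        match = CSUM_FAILURE_RE.search(line)
        if match:
            counts[match.group("iface")] += 1
    return counts


if __name__ == "__main__":
    # Sample lines taken from this article; a real run would read the log file.
    sample = [
        "Nov 24 14:08:13 controller1 kernel: tap22fa8278-8c: hw csum failure",
        "Nov 24 14:08:13 controller1 kernel: CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.0-123.9.2.el7.x86_64 #1",
        "Nov 24 14:08:13 controller1 kernel: qg-f7cc81ab-90: hw csum failure",
        "Nov 24 14:08:13 controller1 kernel: tap22fa8278-8c: hw csum failure",
    ]
    for iface, n in count_csum_failures(sample).most_common():
        print(f"{iface}: {n}")
```

On a controller node, the same function can be given the open log file directly (for example inside `with open("/var/log/messages") as log:`), which normally requires root privileges.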

The traces seen in /var/log/messages are:

Nov 24 14:08:13 controller1 kernel: tap22fa8278-8c: hw csum failure
Nov 24 14:08:13 controller1 kernel: CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.0-123.9.2.el7.x86_64 #1
Nov 24 14:08:13 controller1 kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
Nov 24 14:08:13 controller1 kernel: ffff881044760000 20b9a7e010e08381 ffff88085fd23b10 ffffffff815e241b
Nov 24 14:08:13 controller1 kernel: ffff88085fd23b28 ffffffff814cd56a ffff880852cd0100 ffff88085fd23b48
Nov 24 14:08:13 controller1 kernel: ffffffff814c62d2 000000005fd23b80 ffff880852cd0100 ffff88085fd23b58
Nov 24 14:08:13 controller1 kernel: Call Trace:
Nov 24 14:08:13 controller1 kernel: <IRQ>  [<ffffffff815e241b>] dump_stack+0x19/0x1b
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cd56a>] netdev_rx_csum_fault+0x3a/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62d2>] __skb_checksum_complete_head+0x62/0x70
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62f1>] __skb_checksum_complete+0x11/0x20
Nov 24 14:08:13 controller1 kernel: [<ffffffff8155a2ac>] nf_ip_checksum+0xcc/0x100
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0351b43>] udp_error+0x103/0x200 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffff810977f2>] ? default_wake_function+0x12/0x20
Nov 24 14:08:13 controller1 kernel: [<ffffffffa034b330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffff810fe958>] ? __call_rcu_nocb_enqueue+0xa8/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0368302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
Nov 24 14:08:13 controller1 kernel: [<ffffffff8150066a>] nf_iterate+0xaa/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81500704>] nf_hook_slow+0x84/0x140
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509e74>] ip_rcv+0x344/0x380
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cffa6>] __netif_receive_skb_core+0x676/0x870
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d01b8>] __netif_receive_skb+0x18/0x60
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d0d7e>] process_backlog+0xae/0x180
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d060a>] net_rx_action+0x15a/0x250
Nov 24 14:08:13 controller1 kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f445c>] call_softirq+0x1c/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
Nov 24 14:08:13 controller1 kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f4d58>] do_IRQ+0x58/0xf0
Nov 24 14:08:13 controller1 kernel: [<ffffffff815e9ead>] common_interrupt+0x6d/0x6d
Nov 24 14:08:13 controller1 kernel: <EOI>  [<ffffffff814835af>] ? cpuidle_enter_state+0x4f/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff814836e5>] cpuidle_idle_call+0xc5/0x200
Nov 24 14:08:13 controller1 kernel: [<ffffffff8101bcae>] arch_cpu_idle+0xe/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815d0271>] start_secondary+0x265/0x27b

and

Nov 24 14:08:13 controller1 kernel: qg-f7cc81ab-90: hw csum failure
Nov 24 14:08:13 controller1 kernel: CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.0-123.9.2.el7.x86_64 #1
Nov 24 14:08:13 controller1 kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
Nov 24 14:08:13 controller1 kernel: ffff88104f68c000 20b9a7e010e08381 ffff88085fd23b10 ffffffff815e241b
Nov 24 14:08:13 controller1 kernel: ffff88085fd23b28 ffffffff814cd56a ffff8807908b6f00 ffff88085fd23b48
Nov 24 14:08:13 controller1 kernel: ffffffff814c62d2 0000000010e08381 ffff8807908b6f00 ffff88085fd23b58
Nov 24 14:08:13 controller1 kernel: Call Trace:
Nov 24 14:08:13 controller1 kernel: <IRQ>  [<ffffffff815e241b>] dump_stack+0x19/0x1b
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cd56a>] netdev_rx_csum_fault+0x3a/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62d2>] __skb_checksum_complete_head+0x62/0x70
Nov 24 14:08:13 controller1 kernel: [<ffffffff814c62f1>] __skb_checksum_complete+0x11/0x20
Nov 24 14:08:13 controller1 kernel: [<ffffffff8155a2ac>] nf_ip_checksum+0xcc/0x100
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0351b43>] udp_error+0x103/0x200 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffffa03c6ffd>] ? ovs_vport_send+0x1d/0x80 [openvswitch]
Nov 24 14:08:13 controller1 kernel: [<ffffffffa034b330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffffa0368302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
Nov 24 14:08:13 controller1 kernel: [<ffffffff8150066a>] nf_iterate+0xaa/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81500704>] nf_hook_slow+0x84/0x140
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509420>] ? inet_del_offload+0x40/0x40
Nov 24 14:08:13 controller1 kernel: [<ffffffff81509e74>] ip_rcv+0x344/0x380
Nov 24 14:08:13 controller1 kernel: [<ffffffff814cffa6>] __netif_receive_skb_core+0x676/0x870
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d01b8>] __netif_receive_skb+0x18/0x60
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d0d7e>] process_backlog+0xae/0x180
Nov 24 14:08:13 controller1 kernel: [<ffffffff814d060a>] net_rx_action+0x15a/0x250
Nov 24 14:08:13 controller1 kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f445c>] call_softirq+0x1c/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
Nov 24 14:08:13 controller1 kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
Nov 24 14:08:13 controller1 kernel: [<ffffffff815f4d58>] do_IRQ+0x58/0xf0
Nov 24 14:08:13 controller1 kernel: [<ffffffff815e9ead>] common_interrupt+0x6d/0x6d
Nov 24 14:08:13 controller1 kernel: <EOI>  [<ffffffff814835af>] ? cpuidle_enter_state+0x4f/0xc0
Nov 24 14:08:13 controller1 kernel: [<ffffffff814836e5>] cpuidle_idle_call+0xc5/0x200
Nov 24 14:08:13 controller1 kernel: [<ffffffff8101bcae>] arch_cpu_idle+0xe/0x30
Nov 24 14:08:13 controller1 kernel: [<ffffffff810b47e5>] cpu_startup_entry+0xf5/0x290
Nov 24 14:08:13 controller1 kernel: [<ffffffff815d0271>] start_secondary+0x265/0x27b
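Both traces fail at the same point: nf_conntrack's udp_error() asks nf_ip_checksum() to validate the UDP checksum, which ends in __skb_checksum_complete_head() and then netdev_rx_csum_fault(), the function that prints "hw csum failure". In kernels of this era, that path is taken when a packet arrives marked CHECKSUM_COMPLETE (a device supplied a precomputed ones'-complement sum of the packet), that value fails to verify, and a full recomputation in software then shows the packet is in fact valid; the kernel therefore blames the supplied checksum value rather than the packet. The interface named in the message is the one the packet was on when the check ran, which is not necessarily the device that computed the value. The arithmetic being checked is the Internet checksum from RFC 1071. The minimal Python sketch below illustrates only that arithmetic; the function names are illustrative, and the UDP pseudo-header that the kernel also folds in is deliberately left out.

```python
def ones_complement_sum16(data: bytes, initial: int = 0) -> int:
    """16-bit ones'-complement sum of big-endian words, with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # an odd trailing byte is padded with zero
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back into the low 16 bits
    return total


def internet_checksum(data: bytes) -> int:
    """Value a sender would place in the checksum field: the complement of the sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def verifies(data_with_checksum: bytes) -> bool:
    """Receiver-side check: data plus its checksum must sum to 0xFFFF, so the
    complemented result is 0. This mirrors the 'folded sum is zero' test in
    __skb_checksum_complete_head(), minus the pseudo-header."""
    return internet_checksum(data_with_checksum) == 0


if __name__ == "__main__":
    payload = b"hw csum demo"  # even length, so the appended checksum stays word-aligned
    packet = payload + internet_checksum(payload).to_bytes(2, "big")
    print("intact packet verifies:", verifies(packet))
    corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]  # flip one bit
    print("corrupted packet verifies:", verifies(corrupted))
```

As a sanity check, feeding the RFC 1071 sample bytes `00 01 f2 03 f4 f5 f6 f7` to `ones_complement_sum16` yields 0xDDF2, matching the worked example in that RFC.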

Environment

  • Red Hat Enterprise Linux OpenStack Platform 5.0 on RHEL 7.0
  • Cisco UCS B200M3 blades
