RHEL7 - virtio_net + docker bridges cause skb_warn_bad_offload warnings/errors

Solution Verified

Environment

  • Red Hat Enterprise Linux Server release 7.4
  • 3.10.0-693.11.1.el7.x86_64
  • Hypervisor : Nutanix
  • Openshift Container Platform 3.6

Issue

  • The kernel logs a skb_warn_bad_offload warning when an OpenShift pod (container) starts:
kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 3 PID: 2696 at net/core/dev.c:2496 skb_warn_bad_offload+0xcd/0xda
kernel: : caps=(0x0000362007db58e9, 0x0000000000000000) len=2342 data_len=2214 gso_size=1398 gso_type=5 ip_summed=1
kernel: Modules linked in: nfsv3 twofish_generic twofish_x86_64 twofish_common serpent_avx_x86_64 serpent_sse2_x86_64 xts serpent_generic xt_statistic rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache veth nf_conntrack_netlink nfnetlink xt_nat xt_recent xt_mark xt_comment br_netfilter bridge stp llc vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd sg joydev parport_pc virtio_balloon i2c_piix4 parport pcspkr nfsd auth_rpcgss
kernel:  nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_net virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect ata_piix sysimgblt fb_sys_fops ttm drm libata crct10dif_pclmul crct10dif_common i2c_core crc32c_intel virtio_pci floppy serio_raw virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 3 PID: 2696 Comm: openshift Tainted: G        W      ------------   3.10.0-693.11.1.el7.x86_64 #1
kernel: Hardware name: Red Hat KVM, BIOS seabios-1.7.5-8.el6 04/01/2014
kernel:  ffff88023fd839a0 000000008499f94c ffff88023fd83950 ffffffff816a3e61
kernel:  ffff88023fd83990 ffffffff810879d8 000009c01c6f14b2 ffff88009cf6f500
kernel:  ffff8800bae58000 0000000000000005 0000000000000001 0000000000000006
kernel: Call Trace:
kernel:  <IRQ>  [<ffffffff816a3e61>] dump_stack+0x19/0x1b
kernel:  [<ffffffff810879d8>] __warn+0xd8/0x100
kernel:  [<ffffffff81087a5f>] warn_slowpath_fmt+0x5f/0x80
kernel:  [<ffffffff81329183>] ? ___ratelimit+0x93/0x100
kernel:  [<ffffffff816a60d4>] skb_warn_bad_offload+0xcd/0xda
kernel:  [<ffffffff81588c25>] __skb_gso_segment+0x105/0x150
kernel:  [<ffffffff81589025>] validate_xmit_skb.isra.102.part.103+0x135/0x2e0
kernel:  [<ffffffff815897f0>] __dev_queue_xmit+0x4b0/0x550
kernel:  [<ffffffff815898a0>] dev_queue_xmit+0x10/0x20
kernel:  [<ffffffff815cfdae>] ip_finish_output+0x52e/0x780
kernel:  [<ffffffff815d0303>] ip_output+0x73/0xe0
kernel:  [<ffffffff815cf880>] ? __ip_append_data.isra.50+0xa50/0xa50
kernel:  [<ffffffff815cbd16>] ip_forward_finish+0x66/0x80
kernel:  [<ffffffff815cc0ac>] ip_forward+0x37c/0x480
kernel:  [<ffffffff815cbcb0>] ? ip_frag_mem+0x40/0x40
kernel:  [<ffffffff815c9cfa>] ip_rcv_finish+0x8a/0x350
kernel:  [<ffffffff815ca686>] ip_rcv+0x2b6/0x410
kernel:  [<ffffffff815c9c70>] ? inet_del_offload+0x40/0x40
kernel:  [<ffffffff81586f22>] __netif_receive_skb_core+0x572/0x7c0
kernel:  [<ffffffff81587188>] __netif_receive_skb+0x18/0x60
kernel:  [<ffffffff81587210>] netif_receive_skb_internal+0x40/0xc0
kernel:  [<ffffffff81588318>] napi_gro_receive+0xd8/0x130
kernel:  [<ffffffffc005e505>] virtnet_poll+0x265/0x750 [virtio_net]
kernel:  [<ffffffff8158799d>] net_rx_action+0x16d/0x380
kernel:  [<ffffffff81090b4f>] __do_softirq+0xef/0x280
kernel:  [<ffffffff816b6b1c>] call_softirq+0x1c/0x30
kernel:  [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
kernel:  [<ffffffff81090ed5>] irq_exit+0x105/0x110
kernel:  [<ffffffff816b76b6>] do_IRQ+0x56/0xe0
kernel:  [<ffffffff816ac2ad>] common_interrupt+0x6d/0x6d
kernel:  <EOI>
kernel: ---[ end trace bc1fd42939b400c6 ]---
kernel: ------------[ cut here ]------------
  • Disabling offloads in the VM with ethtool -K did not appear to help.
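
The numeric fields on the warning line can be decoded to see what kind of packet triggered the check. The following is a minimal sketch, not an official tool: the function name decode_bad_offload is hypothetical, and the gso_type flag values are assumed to match include/linux/skbuff.h in the 3.10 kernel series (newer kernels number some flags differently).

```python
import re

# Assumed flag values for the 3.10 kernel series (include/linux/skbuff.h).
# Only the low bits are listed; unknown bits are ignored by this sketch.
GSO_FLAGS = {
    0x01: "SKB_GSO_TCPV4",
    0x02: "SKB_GSO_UDP",
    0x04: "SKB_GSO_DODGY",
    0x08: "SKB_GSO_TCP_ECN",
    0x10: "SKB_GSO_TCPV6",
}

# skb->ip_summed checksum states.
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

FIELD = re.compile(r"\b(len|data_len|gso_size|gso_type|ip_summed)=(\d+)")


def decode_bad_offload(line):
    """Parse the 'caps=... len=... gso_type=...' line of the warning."""
    info = {name: int(value) for name, value in FIELD.findall(line)}
    gso_type = info.get("gso_type", 0)
    info["gso_flags"] = [name for bit, name in GSO_FLAGS.items() if gso_type & bit]
    info["ip_summed_name"] = IP_SUMMED.get(info.get("ip_summed"), "unknown")
    return info
```

Under these assumptions, the line from the trace above (gso_type=5 ip_summed=1) decodes to a TCPv4 GSO packet flagged SKB_GSO_DODGY whose checksum state is CHECKSUM_UNNECESSARY, which is the kind of packet the upstream commit quoted under Root Cause describes.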

Resolution

Update to RHEL 7.6 (kernel-3.10.0-957, released via RHSA-2018:3083) or later.
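
To check whether a host already runs a kernel at or beyond the fixed build, the release string reported by uname -r can be compared against 3.10.0-957. This is a minimal sketch with hypothetical helper names; it only compares version numbers, so it is meaningful for RHEL 7 (3.10.0-based) kernels and says nothing about other distributions.

```python
import re

# First fixed build according to the Resolution: kernel-3.10.0-957.
FIXED = (3, 10, 0, 957)

RELEASE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)")


def kernel_tuple(release):
    """Turn '3.10.0-693.11.1.el7.x86_64' into (3, 10, 0, 693, 11, 1)."""
    m = RELEASE.match(release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    base = tuple(int(part) for part in m.group(1, 2, 3))
    build = tuple(int(part) for part in m.group(4).split("."))
    return base + build


def has_fix(release):
    """True if the release is numerically at or after 3.10.0-957."""
    return kernel_tuple(release) >= FIXED
```

For example, has_fix("3.10.0-693.11.1.el7.x86_64") is False for the kernel in the Environment section, while has_fix("3.10.0-957.el7.x86_64") is True. On a live system the argument could come from platform.release().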

Workaround

There is no workaround for this issue. However, the message is harmless and can be ignored; no network traffic is lost.
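
The harmless case described here is the one whose call trace shows skb_warn_bad_offload called from __skb_gso_segment, as in the Issue section. The following minimal sketch, with hypothetical helper names, finds the caller in a logged trace so that it can be compared. It assumes the trace format shown above, where frames marked with "?" are unreliable and are skipped.

```python
import re

# Matches a stack frame such as "[<ffffffff816a60d4>] name+0xcd/0xda".
# Group 1 is set for unreliable frames ("? name+..."), group 2 is the symbol.
FRAME = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def bad_offload_caller(trace_lines):
    """Return the function that called skb_warn_bad_offload, or None."""
    seen_warn = False
    for line in trace_lines:
        m = FRAME.search(line)
        if m is None or m.group(1):
            continue  # not a frame, or an unreliable "?" frame
        if seen_warn:
            return m.group(2)  # the frame listed next is the caller
        if m.group(2) == "skb_warn_bad_offload":
            seen_warn = True
    return None


def matches_this_solution(trace_lines):
    """True when the warning was raised from __skb_gso_segment."""
    return bad_offload_caller(trace_lines) == "__skb_gso_segment"
```

If matches_this_solution() returns False, the warning came from a different code path and the statement about no traffic loss may not apply.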

Root Cause

Upstream commit which resolves this issue: https://patchwork.kernel.org/patch/9882493/

commit b2504a5dbef3305ef41988ad270b0e8ec289331c upstream.

Dmitry reported warnings occurring in __skb_gso_segment() [1]

All SKB_GSO_DODGY producers can allow user space to feed
packets that trigger the current check.

We could prevent them from doing so, rejecting packets, but
this might add regressions to existing programs.

It turns out our SKB_GSO_DODGY handlers properly set up checksum
information that is needed anyway when packets needs to be segmented.

By checking again skb_needs_check() after skb_mac_gso_segment(),
we should remove these pesky warnings, at a very minor cost.
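
The reordering described in the commit message can be illustrated with a toy model. This is a simplified sketch in Python, not the kernel code: the Skb class and the two gso_segment_* functions are hypothetical, skb_needs_check() is modeled for the transmit path only, and mac_gso_segment() stands in for skb_mac_gso_segment() by assuming that the handler leaves the packet in the CHECKSUM_PARTIAL state.

```python
CHECKSUM_NONE, CHECKSUM_UNNECESSARY, CHECKSUM_COMPLETE, CHECKSUM_PARTIAL = range(4)


class Skb:
    """Toy packet buffer carrying only the checksum state."""

    def __init__(self, ip_summed):
        self.ip_summed = ip_summed


def skb_needs_check(skb):
    # Assumed transmit-path rule: anything other than CHECKSUM_PARTIAL
    # is considered suspicious.
    return skb.ip_summed != CHECKSUM_PARTIAL


def mac_gso_segment(skb):
    # Stand-in for skb_mac_gso_segment(): per the commit message, the
    # SKB_GSO_DODGY handlers set up the checksum information.
    skb.ip_summed = CHECKSUM_PARTIAL
    return [skb]


def gso_segment_before(skb, warnings):
    # Old order: check first, so packets from DODGY producers warn.
    if skb_needs_check(skb):
        warnings.append("skb_warn_bad_offload")
    return mac_gso_segment(skb)


def gso_segment_after(skb, warnings):
    # New order: segment first, then check again, so a packet whose
    # checksum state was fixed up by the handler no longer warns.
    segs = mac_gso_segment(skb)
    if skb_needs_check(skb):
        warnings.append("skb_warn_bad_offload")
    return segs
```

In this model a packet arriving with ip_summed=1 (CHECKSUM_UNNECESSARY), as in the trace above, produces a warning with gso_segment_before() and none with gso_segment_after(), while the packet is segmented in both cases.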

This issue is addressed by Red Hat under Red Hat Private Bug 1544920 - virtio_net + docker bridges cause skb_warn_bad_offload warnings/errors.


3 Comments

Hello, we are encountering the same issue. We run a K8s (flannel) cluster in a KVM (bridge) environment and get the same error message in /var/log/messages. When an ingress packet is larger than the MTU, "skb_warn_bad_offload" is printed, and the application never receives the packet; it is lost.
I tried disabling all the NIC offloads, but it did not seem to work.

There are two ways that skb_warn_bad_offload() can be called: one results in packet loss, and the other (from __skb_gso_segment() above) does not. It's possible you are hitting the packet-loss path or some other problem.

It's a bit complex to troubleshoot such a thing from comments on a knowledgebase solution; please do open a support case if you'd like to investigate further.

Hello Jamie Bainbridge, here is the call trace from my environment. I think it is the same as this issue.

Jul 21 00:30:09 10-10-99-144 kernel: ------------[ cut here ]------------
Jul 21 00:30:09 10-10-99-144 kernel: WARNING: CPU: 1 PID: 0 at net/core/dev.c:2496 skb_warn_bad_offload+0xcd/0xda
Jul 21 00:30:09 10-10-99-144 kernel: : caps=(0x00003021001b5889, 0x0000000000000000) len=1464 data_len=1336 gso_size=1398 gso_type=5 ip_summed=1
Jul 21 00:30:09 10-10-99-144 kernel: Modules linked in: nf_log_ipv4 nf_log_common xt_TRACE xt_LOG iptable_raw xt_statistic binfmt_misc nfnetlink_queue nfnetlink_log bluetooth loop cfg80211 rfkill nf_conntrack_netlink veth vxlan ip6_udp_tunnel udp_tunnel xt_nat xt_recent ipt_REJECT nf_reject_ipv4 ip_set nfnetlink xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay(T) sunrpc joydev sg ppdev virtio_balloon parport_pc i2c_piix4 parport pcspkr ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_net virtio_console virtio_scsi virtio_blk cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata virtio_pci virtio_ring serio_raw
Jul 21 00:30:09 10-10-99-144 kernel: floppy i2c_core virtio dm_mirror dm_region_hash dm_log dm_mod
Jul 21 00:30:09 10-10-99-144 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W ------------ T 3.10.0-693.el7.x86_64 #1
Jul 21 00:30:09 10-10-99-144 kernel: Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.el7 04/01/2014
Jul 21 00:30:09 10-10-99-144 kernel: ffff88043fc439a0 a3eaaf6a30ee44fa ffff88043fc43950 ffffffff816a3d91
Jul 21 00:30:09 10-10-99-144 kernel: ffff88043fc43990 ffffffff810879c8 000009c0a6fd2cb2 ffff8803e6346100
Jul 21 00:30:09 10-10-99-144 kernel: ffff88041e2fa000 0000000000000005 0000000000000001 0000000000000004
Jul 21 00:30:09 10-10-99-144 kernel: Call Trace:
Jul 21 00:30:09 10-10-99-144 kernel: [] dump_stack+0x19/0x1b
Jul 21 00:30:09 10-10-99-144 kernel: [] __warn+0xd8/0x100
Jul 21 00:30:09 10-10-99-144 kernel: [] warn_slowpath_fmt+0x5f/0x80
Jul 21 00:30:09 10-10-99-144 kernel: [] ? ___ratelimit+0x93/0x100
Jul 21 00:30:09 10-10-99-144 kernel: [] skb_warn_bad_offload+0xcd/0xda
Jul 21 00:30:09 10-10-99-144 kernel: [] __skb_gso_segment+0x105/0x150
Jul 21 00:30:09 10-10-99-144 kernel: [] validate_xmit_skb.isra.102.part.103+0x135/0x2e0
Jul 21 00:30:09 10-10-99-144 kernel: [] __dev_queue_xmit+0x4b0/0x550
Jul 21 00:30:09 10-10-99-144 kernel: [] dev_queue_xmit+0x10/0x20
Jul 21 00:30:09 10-10-99-144 kernel: [] ip_finish_output+0x52e/0x780
Jul 21 00:30:09 10-10-99-144 kernel: [] ip_output+0x73/0xe0
Jul 21 00:30:09 10-10-99-144 kernel: [] ? __ip_append_data.isra.48+0xa00/0xa00
Jul 21 00:30:09 10-10-99-144 kernel: [] ip_forward_finish+0x66/0x80
Jul 21 00:30:09 10-10-99-144 kernel: [] ip_forward+0x37c/0x480
Jul 21 00:30:09 10-10-99-144 kernel: [] ? ip_frag_mem+0x40/0x40
Jul 21 00:30:09 10-10-99-144 kernel: [] ip_rcv_finish+0x8a/0x350
Jul 21 00:30:09 10-10-99-144 kernel: [] ip_rcv+0x2b6/0x410
Jul 21 00:30:09 10-10-99-144 kernel: [] ? inet_del_offload+0x40/0x40
Jul 21 00:30:09 10-10-99-144 kernel: [] __netif_receive_skb_core+0x572/0x7c0
Jul 21 00:30:09 10-10-99-144 kernel: [] ? __getnstimeofday64+0x3a/0xd0
Jul 21 00:30:09 10-10-99-144 kernel: [] __netif_receive_skb+0x18/0x60
Jul 21 00:30:09 10-10-99-144 kernel: [] netif_receive_skb_internal+0x40/0xc0
Jul 21 00:30:09 10-10-99-144 kernel: [] napi_gro_receive+0xd8/0x130
Jul 21 00:30:09 10-10-99-144 kernel: [] virtnet_poll+0x265/0x750 [virtio_net]
Jul 21 00:30:09 10-10-99-144 kernel: [] net_rx_action+0x16d/0x380
Jul 21 00:30:09 10-10-99-144 kernel: [] __do_softirq+0xef/0x280
Jul 21 00:30:09 10-10-99-144 kernel: [] call_softirq+0x1c/0x30
Jul 21 00:30:09 10-10-99-144 kernel: [] do_softirq+0x65/0xa0
Jul 21 00:30:09 10-10-99-144 kernel: [] irq_exit+0x105/0x110
Jul 21 00:30:09 10-10-99-144 kernel: [] do_IRQ+0x56/0xe0
Jul 21 00:30:09 10-10-99-144 kernel: [] common_interrupt+0x6d/0x6d
Jul 21 00:30:09 10-10-99-144 kernel: [] ? native_safe_halt+0x6/0x10
Jul 21 00:30:09 10-10-99-144 kernel: [] default_idle+0x1e/0xc0
Jul 21 00:30:09 10-10-99-144 kernel: [] arch_cpu_idle+0x26/0x30
Jul 21 00:30:09 10-10-99-144 kernel: [] cpu_startup_entry+0x14a/0x1c0
Jul 21 00:30:09 10-10-99-144 kernel: [] start_secondary+0x1b6/0x230
Jul 21 00:30:09 10-10-99-144 kernel: ---[ end trace 1d38ab13aa011722 ]---