RHEL instance runs out of memory when hot-plugging a secondary NIC on Alibaba Cloud
Issue
- When a RHEL 7.7 instance with a large number of CPUs and a large amount of memory is running on Alibaba Cloud, hot-plugging a secondary NIC causes an out-of-memory (OOM) panic with the following error:
~~~
[ 125.114234] pci 0000:00:07.0: [1af4:1000] type 00 class 0x020000
[ 125.114310] pci 0000:00:07.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit pref]
[ 125.114357] pci 0000:00:07.0: reg 0x18: [mem 0x00000000-0x00000fff 64bit pref]
[ 125.115037] pci 0000:00:07.0: BAR 0: assigned [mem 0xc0000000-0xc0000fff 64bit pref]
[ 125.115735] pci 0000:00:07.0: BAR 2: assigned [mem 0xc0001000-0xc0001fff 64bit pref]
[ 125.116415] pci 0000:00:1f.0: PCI bridge to [bus 01]
[ 125.116833] pci 0000:00:1f.0: bridge window [io 0xc000-0xcfff]
[ 125.122859] pci 0000:00:1f.0: bridge window [mem 0xfe800000-0xfe9fffff]
[ 125.127015] pci 0000:00:1f.0: bridge window [mem 0xfe000000-0xfe1fffff 64bit pref]
[ 125.134848] virtio-pci 0000:00:07.0: enabling device (0000 -> 0002)
[ 125.139508] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
[ 125.144592] virtio-pci 0000:00:07.0: irq 98 for MSI/MSI-X
[ 125.144607] virtio-pci 0000:00:07.0: irq 99 for MSI/MSI-X
[ 125.144621] virtio-pci 0000:00:07.0: irq 100 for MSI/MSI-X
[ 125.144635] virtio-pci 0000:00:07.0: irq 101 for MSI/MSI-X
......
[ 125.145406] virtio-pci 0000:00:07.0: irq 157 for MSI/MSI-X
[ 125.145419] virtio-pci 0000:00:07.0: irq 158 for MSI/MSI-X
[ 125.145433] virtio-pci 0000:00:07.0: irq 159 for MSI/MSI-X
[ 125.145446] virtio-pci 0000:00:07.0: irq 160 for MSI/MSI-X
[ 125.145461] virtio-pci 0000:00:07.0: irq 161 for MSI/MSI-X
[ 125.145474] virtio-pci 0000:00:07.0: irq 162 for MSI/MSI-X
[ 125.260426] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 128.893942] kworker/u104:0 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[ 128.894635] kworker/u104:0 cpuset=/ mems_allowed=0
[ 128.895032] CPU: 14 PID: 5 Comm: kworker/u104:0 Kdump: loaded Not tainted 3.10.0-1062.el7.x86_64 #1
[ 128.895763] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS XXXX <mon>/<date/<year>
[ 128.896383] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 128.896836] Call Trace:
[ 128.897045] [<ffffffffa1579262>] dump_stack+0x19/0x1b
[ 128.897466] [<ffffffffa1573c04>] dump_header+0x90/0x229
[ 128.897901] [<ffffffffa110825b>] ? cred_has_capability+0x6b/0x120
[ 128.898403] [<ffffffffa0fbfd74>] oom_kill_process+0x254/0x3e0
[ 128.898882] [<ffffffffa110833e>] ? selinux_capable+0x2e/0x40
[ 128.899365] [<ffffffffa0fc05c6>] out_of_memory+0x4b6/0x4f0
[ 128.899818] [<ffffffffa157471c>] __alloc_pages_slowpath+0x5d6/0x724
[ 128.900333] [<ffffffffa0fc6b84>] __alloc_pages_nodemask+0x404/0x420
[ 128.900859] [<ffffffffa1014c68>] alloc_pages_current+0x98/0x110
[ 128.901369] [<ffffffffc04e6f05>] try_fill_recv+0x4e5/0x570 [virtio_net]
[ 128.901914] [<ffffffffc04e882b>] virtnet_probe+0x6db/0x868 [virtio_net]
[ 128.902459] [<ffffffffc042792f>] virtio_dev_probe+0x1cf/0x2d0 [virtio]
[ 128.902995] [<ffffffffa12b4205>] driver_probe_device+0xc5/0x3e0
[ 128.903482] [<ffffffffa12b4520>] ? driver_probe_device+0x3e0/0x3e0
[ 128.903990] [<ffffffffa12b4563>] __device_attach+0x43/0x50
[ 128.904445] [<ffffffffa12b1e85>] bus_for_each_drv+0x75/0xc0
[ 128.904904] [<ffffffffa12b4040>] device_attach+0x90/0xb0
[ 128.905344] [<ffffffffa12b3268>] bus_probe_device+0x98/0xd0
[ 128.905803] [<ffffffffa12b0b0f>] device_add+0x4ef/0x7b0
[ 128.906234] [<ffffffffc0583520>] ? vp_finalize_features+0x40/0x40 [virtio_pci]
[ 128.906827] [<ffffffffc0583520>] ? vp_finalize_features+0x40/0x40 [virtio_pci]
[ 128.907421] [<ffffffffa12b0dea>] device_register+0x1a/0x20
[ 128.907875] [<ffffffffc04273c9>] register_virtio_device+0xb9/0x100 [virtio]
[ 128.908445] [<ffffffffc0582a67>] virtio_pci_probe+0xb7/0x140 [virtio_pci]
[ 128.909018] [<ffffffffa11cf97a>] local_pci_probe+0x4a/0xb0
[ 128.909483] [<ffffffffa11d10c9>] pci_device_probe+0x109/0x160
[ 128.909958] [<ffffffffa12b4205>] driver_probe_device+0xc5/0x3e0
[ 128.910445] [<ffffffffa12b4520>] ? driver_probe_device+0x3e0/0x3e0
[ 128.910951] [<ffffffffa12b4563>] __device_attach+0x43/0x50
[ 128.911719] [<ffffffffa12b1e85>] bus_for_each_drv+0x75/0xc0
[ 128.912471] [<ffffffffa12b4040>] device_attach+0x90/0xb0
[ 128.913202] [<ffffffffa11c461f>] pci_bus_add_device+0x4f/0xa0
[ 128.913965] [<ffffffffa11c46a9>] pci_bus_add_devices+0x39/0x80
[ 128.914725] [<ffffffffa11ed719>] enable_slot+0x239/0x4a0
[ 128.915428] [<ffffffffa11ec938>] ? get_slot_status+0xa8/0x110
[ 128.916163] [<ffffffffa11eda87>] acpiphp_check_bridge.part.9+0x107/0x140
[ 128.916980] [<ffffffffa11edd07>] acpiphp_hotplug_notify+0x177/0x210
[ 128.917761] [<ffffffffa11edb90>] ? acpiphp_post_dock_fixup+0xd0/0xd0
[ 128.918546] [<ffffffffa1213fce>] acpi_device_hotplug+0x3b7/0x41a
[ 128.919303] [<ffffffffa120d097>] acpi_hotplug_work_fn+0x1e/0x29
[ 128.920058] [<ffffffffa0ebd0ff>] process_one_work+0x17f/0x440
[ 128.920795] [<ffffffffa0ebe216>] worker_thread+0x126/0x3c0
[ 128.921504] [<ffffffffa0ebe0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
[ 128.922285] [<ffffffffa0ec50d1>] kthread+0xd1/0xe0
[ 128.922937] [<ffffffffa0ec5000>] ? insert_kthread_work+0x40/0x40
[ 128.923677] [<ffffffffa158bd37>] ret_from_fork_nospec_begin+0x21/0x21
[ 128.924443] [<ffffffffa0ec5000>] ? insert_kthread_work+0x40/0x40
[ 128.925175] Mem-Info:
[ 128.925610] active_anon:12213 inactive_anon:4216 isolated_anon:0 active_file:0 inactive_file:44 isolated_file:0 unevictable:0 dirty:5 writeback:0 unstable:0 slab_reclaimable:6307 slab_unreclaimable:18914 mapped:894 shmem:4262 pagetables:934 bounce:0 free:111695 free_pcp:330 free_cma:0
[ 128.929625] Node 0 DMA free:15908kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 128.935532] lowmem_reserve[]: 0 2808 93453 93453
~~~
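To check whether a captured kernel log shows the same failure, one option is to look for the oom-killer report and the frames that characterize this trace. The following is a minimal sketch, not a Red Hat tool: the function name `matches_hotplug_oom` and the choice of which frames to treat as the signature are assumptions made for illustration, based only on the trace shown above.

```python
# Frames taken from the call trace in this article. Treating these three
# as "the signature" is an assumption for illustration, not an official rule.
OOM_MARKER = "invoked oom-killer"
SIGNATURE_FRAMES = ("try_fill_recv", "virtnet_probe", "acpi_hotplug_work_fn")


def matches_hotplug_oom(log_text: str) -> bool:
    """Return True if log_text contains an oom-killer report that is
    followed by every frame listed in SIGNATURE_FRAMES."""
    start = log_text.find(OOM_MARKER)
    if start == -1:
        # No oom-killer invocation was logged at all.
        return False
    # Only inspect text from the first OOM report onward, so that
    # unrelated earlier messages cannot satisfy the check.
    tail = log_text[start:]
    return all(frame in tail for frame in SIGNATURE_FRAMES)
```

The function works on plain text, so it could be fed the saved output of `dmesg` or the contents of a `vmcore-dmesg.txt` file. A match only indicates that the trace resembles the one in this article; it does not confirm the root cause.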
Environment
- Red Hat Enterprise Linux 7.x instances
- ecs.c6.8xlarge or larger instance type on Alibaba Cloud
- NetworkManager service
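
The conditions listed above can be sketched as a small check. This is an illustrative sketch only: the function names are hypothetical, and the default threshold of 32 vCPUs is an assumption based on the published size of ecs.c6.8xlarge, so adjust it if your instance family differs.

```python
import re
from typing import Optional


def rhel_major_version(release_text: str) -> Optional[int]:
    """Extract the RHEL major version from /etc/redhat-release style text.

    Returns None when the text does not describe a RHEL release.
    """
    match = re.search(r"Red Hat Enterprise Linux.*?release (\d+)", release_text)
    return int(match.group(1)) if match else None


def matches_environment(release_text: str, cpu_count: int,
                        nm_active: bool, min_cpus: int = 32) -> bool:
    """Return True when all three conditions from the Environment list hold.

    min_cpus=32 is an assumed stand-in for "ecs.c6.8xlarge or larger".
    """
    return bool(
        rhel_major_version(release_text) == 7
        and cpu_count >= min_cpus
        and nm_active
    )
```

On a live system the inputs could come from the contents of `/etc/redhat-release`, from `os.cpu_count()`, and from the exit status of `systemctl is-active NetworkManager`.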