RHEL instance runs out of memory (OOM) when a secondary NIC is hotplugged on Alibaba Cloud

Solution In Progress - Updated -

Issue

  • When a secondary NIC is hotplugged into a running RHEL 7.7 instance with a large number of CPUs and a large amount of memory on Alibaba Cloud, the instance hits an Out-Of-Memory (OOM) panic with the error below:

    [  125.114234] pci 0000:00:07.0: [1af4:1000] type 00 class 0x020000
    [  125.114310] pci 0000:00:07.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit pref]
    [  125.114357] pci 0000:00:07.0: reg 0x18: [mem 0x00000000-0x00000fff 64bit pref]
    [  125.115037] pci 0000:00:07.0: BAR 0: assigned [mem 0xc0000000-0xc0000fff 64bit pref]
    [  125.115735] pci 0000:00:07.0: BAR 2: assigned [mem 0xc0001000-0xc0001fff 64bit pref]
    [  125.116415] pci 0000:00:1f.0: PCI bridge to [bus 01]
    [  125.116833] pci 0000:00:1f.0:   bridge window [io  0xc000-0xcfff]
    [  125.122859] pci 0000:00:1f.0:   bridge window [mem 0xfe800000-0xfe9fffff]
    [  125.127015] pci 0000:00:1f.0:   bridge window [mem 0xfe000000-0xfe1fffff 64bit pref]
    [  125.134848] virtio-pci 0000:00:07.0: enabling device (0000 -> 0002)
    [  125.139508] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
    [  125.144592] virtio-pci 0000:00:07.0: irq 98 for MSI/MSI-X
    [  125.144607] virtio-pci 0000:00:07.0: irq 99 for MSI/MSI-X
    [  125.144621] virtio-pci 0000:00:07.0: irq 100 for MSI/MSI-X
    [  125.144635] virtio-pci 0000:00:07.0: irq 101 for MSI/MSI-X
    ......
    [  125.145406] virtio-pci 0000:00:07.0: irq 157 for MSI/MSI-X
    [  125.145419] virtio-pci 0000:00:07.0: irq 158 for MSI/MSI-X
    [  125.145433] virtio-pci 0000:00:07.0: irq 159 for MSI/MSI-X
    [  125.145446] virtio-pci 0000:00:07.0: irq 160 for MSI/MSI-X
    [  125.145461] virtio-pci 0000:00:07.0: irq 161 for MSI/MSI-X
    [  125.145474] virtio-pci 0000:00:07.0: irq 162 for MSI/MSI-X
    [  125.260426] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
    [  128.893942] kworker/u104:0 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
    [  128.894635] kworker/u104:0 cpuset=/ mems_allowed=0
    [  128.895032] CPU: 14 PID: 5 Comm: kworker/u104:0 Kdump: loaded Not tainted 3.10.0-1062.el7.x86_64 #1
    [  128.895763] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS XXXX <mon>/<date>/<year>
    [  128.896383] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    [  128.896836] Call Trace:
    [  128.897045]  [<ffffffffa1579262>] dump_stack+0x19/0x1b
    [  128.897466]  [<ffffffffa1573c04>] dump_header+0x90/0x229
    [  128.897901]  [<ffffffffa110825b>] ? cred_has_capability+0x6b/0x120
    [  128.898403]  [<ffffffffa0fbfd74>] oom_kill_process+0x254/0x3e0
    [  128.898882]  [<ffffffffa110833e>] ? selinux_capable+0x2e/0x40
    [  128.899365]  [<ffffffffa0fc05c6>] out_of_memory+0x4b6/0x4f0
    [  128.899818]  [<ffffffffa157471c>] __alloc_pages_slowpath+0x5d6/0x724
    [  128.900333]  [<ffffffffa0fc6b84>] __alloc_pages_nodemask+0x404/0x420
    [  128.900859]  [<ffffffffa1014c68>] alloc_pages_current+0x98/0x110
    [  128.901369]  [<ffffffffc04e6f05>] try_fill_recv+0x4e5/0x570 [virtio_net]
    [  128.901914]  [<ffffffffc04e882b>] virtnet_probe+0x6db/0x868 [virtio_net]
    [  128.902459]  [<ffffffffc042792f>] virtio_dev_probe+0x1cf/0x2d0 [virtio]
    [  128.902995]  [<ffffffffa12b4205>] driver_probe_device+0xc5/0x3e0
    [  128.903482]  [<ffffffffa12b4520>] ? driver_probe_device+0x3e0/0x3e0
    [  128.903990]  [<ffffffffa12b4563>] __device_attach+0x43/0x50
    [  128.904445]  [<ffffffffa12b1e85>] bus_for_each_drv+0x75/0xc0
    [  128.904904]  [<ffffffffa12b4040>] device_attach+0x90/0xb0
    [  128.905344]  [<ffffffffa12b3268>] bus_probe_device+0x98/0xd0
    [  128.905803]  [<ffffffffa12b0b0f>] device_add+0x4ef/0x7b0
    [  128.906234]  [<ffffffffc0583520>] ? vp_finalize_features+0x40/0x40 [virtio_pci]
    [  128.906827]  [<ffffffffc0583520>] ? vp_finalize_features+0x40/0x40 [virtio_pci]
    [  128.907421]  [<ffffffffa12b0dea>] device_register+0x1a/0x20
    [  128.907875]  [<ffffffffc04273c9>] register_virtio_device+0xb9/0x100 [virtio]
    [  128.908445]  [<ffffffffc0582a67>] virtio_pci_probe+0xb7/0x140 [virtio_pci]
    [  128.909018]  [<ffffffffa11cf97a>] local_pci_probe+0x4a/0xb0
    [  128.909483]  [<ffffffffa11d10c9>] pci_device_probe+0x109/0x160
    [  128.909958]  [<ffffffffa12b4205>] driver_probe_device+0xc5/0x3e0
    [  128.910445]  [<ffffffffa12b4520>] ? driver_probe_device+0x3e0/0x3e0
    [  128.910951]  [<ffffffffa12b4563>] __device_attach+0x43/0x50
    [  128.911719]  [<ffffffffa12b1e85>] bus_for_each_drv+0x75/0xc0
    [  128.912471]  [<ffffffffa12b4040>] device_attach+0x90/0xb0
    [  128.913202]  [<ffffffffa11c461f>] pci_bus_add_device+0x4f/0xa0
    [  128.913965]  [<ffffffffa11c46a9>] pci_bus_add_devices+0x39/0x80
    [  128.914725]  [<ffffffffa11ed719>] enable_slot+0x239/0x4a0
    [  128.915428]  [<ffffffffa11ec938>] ? get_slot_status+0xa8/0x110
    [  128.916163]  [<ffffffffa11eda87>] acpiphp_check_bridge.part.9+0x107/0x140
    [  128.916980]  [<ffffffffa11edd07>] acpiphp_hotplug_notify+0x177/0x210
    [  128.917761]  [<ffffffffa11edb90>] ? acpiphp_post_dock_fixup+0xd0/0xd0
    [  128.918546]  [<ffffffffa1213fce>] acpi_device_hotplug+0x3b7/0x41a
    [  128.919303]  [<ffffffffa120d097>] acpi_hotplug_work_fn+0x1e/0x29
    [  128.920058]  [<ffffffffa0ebd0ff>] process_one_work+0x17f/0x440
    [  128.920795]  [<ffffffffa0ebe216>] worker_thread+0x126/0x3c0
    [  128.921504]  [<ffffffffa0ebe0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
    [  128.922285]  [<ffffffffa0ec50d1>] kthread+0xd1/0xe0
    [  128.922937]  [<ffffffffa0ec5000>] ? insert_kthread_work+0x40/0x40
    [  128.923677]  [<ffffffffa158bd37>] ret_from_fork_nospec_begin+0x21/0x21
    [  128.924443]  [<ffffffffa0ec5000>] ? insert_kthread_work+0x40/0x40
    [  128.925175] Mem-Info:
    [  128.925610] active_anon:12213 inactive_anon:4216 isolated_anon:0
     active_file:0 inactive_file:44 isolated_file:0
     unevictable:0 dirty:5 writeback:0 unstable:0
     slab_reclaimable:6307 slab_unreclaimable:18914
     mapped:894 shmem:4262 pagetables:934 bounce:0
     free:111695 free_pcp:330 free_cma:0
    [  128.929625] Node 0 DMA free:15908kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    [  128.935532] lowmem_reserve[]: 0 2808 93453 93453
    


Environment

  • Red Hat Enterprise Linux 7.x instances
  • ecs.c6.8xlarge or larger instance type on Alibaba Cloud
  • NetworkManager service
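
To check whether a captured kernel log shows the same symptom as the trace in the Issue section, the markers from that trace can be searched for programmatically. The following is a minimal sketch, not a Red Hat diagnostic tool: the function names `matches_oom_signature` and `count_msix_vectors` are hypothetical, and the choice of markers (the `invoked oom-killer` message, the `kacpi_hotplug` workqueue, and `try_fill_recv` from `virtio_net`) is an assumption based only on the log above.

```python
import re

# Markers taken from the kernel log in the Issue section. Treating these
# three strings as the "signature" of this issue is an assumption.
OOM_MARKERS = (
    "invoked oom-killer",
    "Workqueue: kacpi_hotplug",
    "try_fill_recv",
)


def matches_oom_signature(log_text: str) -> bool:
    """Return True if every marker from the Issue trace appears in log_text."""
    return all(marker in log_text for marker in OOM_MARKERS)


def count_msix_vectors(log_text: str, device: str) -> int:
    """Count 'irq N for MSI/MSI-X' lines logged by virtio-pci for one PCI device."""
    pattern = re.compile(
        r"virtio-pci " + re.escape(device) + r": irq \d+ for MSI/MSI-X"
    )
    return len(pattern.findall(log_text))


# A short excerpt of the log from the Issue section, used as sample input.
SAMPLE_LOG = """\
[  125.144592] virtio-pci 0000:00:07.0: irq 98 for MSI/MSI-X
[  125.144607] virtio-pci 0000:00:07.0: irq 99 for MSI/MSI-X
[  128.893942] kworker/u104:0 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[  128.896383] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  128.901369]  [<ffffffffc04e6f05>] try_fill_recv+0x4e5/0x570 [virtio_net]
"""

if __name__ == "__main__":
    print("signature found:", matches_oom_signature(SAMPLE_LOG))
    print("MSI-X vectors:", count_msix_vectors(SAMPLE_LOG, "0000:00:07.0"))
```

The functions take the log as plain text, so they can be applied to output saved from `dmesg` or to a console log collected after the panic. A match only indicates that the log resembles the trace above; it does not confirm the root cause.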
