RHEL8: Large number of refcount_t overflow messages that are sometimes followed by a crash
Issue
- A large number of refcount_t overflow messages are logged, sometimes followed by a crash:
[971170.499671] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in su[3087833], uid/euid: 0/0
[975665.799431] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in IndexerTPoolWor[3781778], uid/euid: 30872/30872
[985670.343013] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[988278.646874] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[991565.632746] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[993986.287413] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in StreamSearch[1253844], uid/euid: 30872/30872
[1007486.127435] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1016494.384002] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in systemd[1], uid/euid: 0/0
[1041958.213026] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1061178.942438] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1078277.567973] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1097773.613079] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1098081.078401] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in systemd[1], uid/euid: 0/0
[1104678.755488] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1121170.122385] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1128375.667499] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1140071.991238] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in polkitd[1617], uid/euid: 999/999
[1142511.714927] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1142511.714921] ------------[ cut here ]------------
[1142511.714927] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[1142511.714935] WARNING: CPU: 4 PID: 239 at kernel/panic.c:703 refcount_error_report+0x98/0x9d
[1142511.714940] Modules linked in: [...]
[1142511.714977] Red Hat flags: eBPF/rawtrace
[1142511.714979] CPU: 4 PID: 239 Comm: kswapd0 Kdump: loaded Tainted: G W L --------- - - 4.18.0-425.10.1.el8_7.x86_64 #1
[1142511.714982] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[1142511.714983] RIP: 0010:refcount_error_report+0x98/0x9d
[1142511.714986] Code: 8b 84 24 00 09 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 e0 0a 00 00 41 55 41 89 c1 48 89 de 48 c7 c7 78 e9 ad 8b e8 05 00 00 00 <0f> 0b 58 eb 8b 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 c7 c7 d8
[1142511.714988] RSP: 0018:ffffb0a5c6f538f0 EFLAGS: 00010086
[1142511.714990] RAX: 0000000000000000 RBX: ffffffff8baef958 RCX: 0000000000000027
[1142511.714992] RDX: 0000000000000027 RSI: 00000000ffff7fff RDI: ffff9cf67df16690
[1142511.714993] RBP: ffffb0a5c6f539a8 R08: 0000000000000000 R09: c0000000ffff7fff
[1142511.714995] R10: 0000000000000001 R11: ffffb0a5c6f53708 R12: ffff9ce7b9be8000
[1142511.714996] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb0a5c6f539a8
[1142511.714997] FS: 0000000000000000(0000) GS:ffff9cf67df00000(0000) knlGS:0000000000000000
[1142511.714999] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1142511.715000] CR2: 00005564d5cce400 CR3: 0000000919e10004 CR4: 00000000007706e0
[1142511.715042] PKRU: 55555554
[1142511.715044] Call Trace:
[1142511.715048] ex_handler_refcount+0x4e/0x80
[1142511.715053] fixup_exception+0x33/0x46
[1142511.715055] do_trap+0x4c/0x110
[1142511.715060] ? __noinstr_text_end+0x8c3/0x2bd9
[1142511.715065] do_invalid_op+0x36/0x40
[1142511.715066] ? __noinstr_text_end+0x8c3/0x2bd9
[1142511.715068] invalid_op+0x14/0x20
[1142511.715071] RIP: 0010:mem_cgroup_id_get_online+0x7a/0xa0
[1142511.715073] Code: 48 0f 44 f8 eb af 85 c0 74 d7 89 c2 8d 48 01 c1 e8 1f 81 fa ff ff ff 7f 41 0f 94 c0 41 08 c0 75 04 39 d1 7d b0 e9 db 7e 6a 00 <48> 89 f8 e9 2e 95 8c 00 0f 0b 48 89 f8 e9 24 95 8c 00 48 89 c7 e9
[1142511.715075] RSP: 0018:ffffb0a5c6f53a58 EFLAGS: 00010812
[1142511.715077] RAX: ffff9ce6d5846000 RBX: ffffee3ff7883840 RCX: ffff9ce6d5846134
[1142511.715078] RDX: 00000000c0000000 RSI: ffff9ce6d5846134 RDI: ffff9ce6d5846000
[1142511.715079] RBP: 00000000003895cc R08: 0000000000000246 R09: 0000000000000011
[1142511.715080] R10: ffff9ce78a64b7a0 R11: 0000000000000000 R12: ffff9ce6d5846000
[1142511.715081] R13: 0000000000000000 R14: ffff9ce78a64b7a8 R15: 00000000003895cc
[1142511.715084] mem_cgroup_swapout+0x4f/0x170
[1142511.715088] __remove_mapping+0x158/0x220
[1142511.715092] shrink_page_list+0x91c/0xca0
[1142511.715095] shrink_inactive_list+0x19e/0x3e0
[1142511.715097] shrink_lruvec+0x474/0x6c0
[1142511.715100] shrink_node+0x22e/0x700
[1142511.715102] balance_pgdat+0x2d7/0x550
[1142511.715104] kswapd+0x201/0x3c0
[1142511.715106] ? finish_wait+0x80/0x80
[1142511.715109] ? balance_pgdat+0x550/0x550
[1142511.715111] kthread+0x10b/0x130
[1142511.715115] ? set_kthread_struct+0x50/0x50
[1142511.715117] ret_from_fork+0x1f/0x40
[1142511.715123] ---[ end trace deb8deeac20add33 ]---
- A general protection fault (GPF) in memcg_flush_lruvec_page_state() sometimes follows the overflow messages:
[1161523.637331] general protection fault: 0000 [#1] SMP NOPTI
[1161523.637339] CPU: 11 PID: 1830388 Comm: kworker/11:0 Kdump: loaded Tainted: G W L --------- - - 4.18.0-425.10.1.el8_7.x86_64 #1
[1161523.637342] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[1161523.637344] Workqueue: events percpu_stats_free_rwork_fn
[1161523.637352] RIP: 0010:memcg_flush_lruvec_page_state.part.55+0x7d/0x140
[1161523.637356] Code: 00 48 c7 c5 40 c8 ba 8b 4c 8d ac 24 38 01 00 00 4c 63 c7 48 89 e2 4b 8b b4 c4 98 10 00 00 48 8b 86 90 00 00 00 48 03 44 dd 00 <48> 63 08 48 83 c2 08 c7 00 00 00 00 00 48 83 c0 04 48 89 4a f8 4c
[1161523.637358] RSP: 0018:ffffb0a5e82bbd08 EFLAGS: 00010087
[1161523.637360] RAX: ffff39e992ee6488 RBX: 0000000000000000 RCX: 0000000000000000
[1161523.637362] RDX: ffffb0a5e82bbd08 RSI: ffff9cf3150e6400 RDI: 0000000000000000
[1161523.637363] RBP: ffffffff8bbac840 R08: 0000000000000000 R09: 0000000000000000
[1161523.637364] R10: 8080808080808080 R11: 0000000000000001 R12: ffff9ce688e4c000
[1161523.637366] R13: ffffb0a5e82bbe40 R14: ffff9ce812616000 R15: ffff9ce688e4c588
[1161523.637367] FS: 0000000000000000(0000) GS:ffff9cf67e0c0000(0000) knlGS:0000000000000000
[1161523.637369] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1161523.637370] CR2: 000000c0001e1000 CR3: 0000000919e10003 CR4: 00000000007706e0
[1161523.637399] PKRU: 55555554
[1161523.637400] Call Trace:
[1161523.637408] ? update_load_avg+0x7e/0x710
[1161523.637414] ? update_load_avg+0x7e/0x710
[1161523.637416] ? set_next_entity+0xb5/0x1e0
[1161523.637418] ? cpumask_next+0x17/0x20
[1161523.637423] ? cgroup_rstat_flush_locked+0x2f/0x280
[1161523.637427] percpu_stats_free_rwork_fn+0x6b/0x130
[1161523.637429] process_one_work+0x1a7/0x360
[1161523.637435] ? create_worker+0x1a0/0x1a0
[1161523.637436] worker_thread+0x30/0x390
[1161523.637438] ? create_worker+0x1a0/0x1a0
[1161523.637440] kthread+0x10b/0x130
[1161523.637444] ? set_kthread_struct+0x50/0x50
[1161523.637447] ret_from_fork+0x1f/0x40
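To gauge how often the overflow message appears and which tasks trigger it, the kernel log can be tallied with a short script. The following is a minimal sketch, not a supported tool: it assumes the log has been saved as plain text (for example from dmesg output) and that the message format matches the lines shown above. The name summarize_overflows and the regular expression are illustrative assumptions.

```python
import re
from collections import Counter

# Matches lines such as:
# [971170.499671] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in su[3087833], uid/euid: 0/0
# The pattern is an assumption based on the messages quoted in this article.
OVERFLOW_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+refcount_t overflow at "
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ in "
    r"(?P<comm>.+?)\[(?P<pid>\d+)\], uid/euid: (?P<uid>\d+)/(?P<euid>\d+)"
)


def summarize_overflows(log_text):
    """Count overflow messages per kernel symbol and per (task name, PID)."""
    by_symbol = Counter()
    by_task = Counter()
    for line in log_text.splitlines():
        match = OVERFLOW_RE.search(line)
        if match is None:
            continue  # not a refcount_t overflow line
        by_symbol[match["symbol"]] += 1
        by_task[(match["comm"], int(match["pid"]))] += 1
    return by_symbol, by_task


# Three lines copied from the log excerpt in this article.
sample = """\
[971170.499671] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in su[3087833], uid/euid: 0/0
[985670.343013] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
[988278.646874] refcount_t overflow at mem_cgroup_id_get_online+0x7a/0xa0 in kswapd0[239], uid/euid: 0/0
"""

by_symbol, by_task = summarize_overflows(sample)
for symbol, count in by_symbol.most_common():
    print(f"{symbol}: {count}")
for (comm, pid), count in by_task.most_common():
    print(f"{comm}[{pid}]: {count}")
```

When every message points at mem_cgroup_id_get_online, as in the excerpt above, the per-task counts mainly show which processes happened to hit the already saturated counter. They do not by themselves identify the cause.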
Environment
- Red Hat Enterprise Linux 8.7 GA kernel-4.18.0-425.3.1.el8 and onwards
- Red Hat Enterprise Linux 8.6.z kernel-4.18.0-372.26.1.el8_6 and onwards
- Red Hat Enterprise Linux 8.4.z kernel-4.18.0-305.62.1.el8_4 and onwards
- cgroups