The kernel hangs when a task loops indefinitely in throttle_direct_reclaim()
Issue
- A large number of blocked tasks appear:
INFO: task python3:1670971 blocked for more than 120 seconds.
Tainted: G OE -------- - - 4.18.0-553.el8_10.aarch64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:python3 state:D stack:0 pid:1670971 ppid:1667117 flags:0x00800080
Call trace:
__switch_to+0xd0/0x120
__schedule+0x340/0xac8
schedule+0x68/0x118
schedule_preempt_disabled+0x18/0x20
rwsem_down_read_slowpath+0x2d4/0x4b8
down_read+0x50/0xc0
do_page_fault+0x120/0x3f0
do_translation_fault+0xa0/0xb0
do_mem_abort+0x54/0xb0
el0_da+0x40/0x78
el0t_64_sync_handler+0x60/0xb0
el0t_64_sync+0x148/0x14c
- All of these blocked tasks have been in the TASK_UNINTERRUPTIBLE state for a long time, waiting to acquire mm->mmap_sem.
- The mmap_sem owner is looping indefinitely in throttle_direct_reclaim(), which keeps the many waiting tasks blocked and hangs the system:
PID: 1667291 TASK: ffff00901a361100 CPU: 12 COMMAND: "python3"
#0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
#1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
#2 [ffff80002cb6f990] schedule at ffff800008abc50c
#3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
#4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
#5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
#6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
#7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
#8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
#9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4
...
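To gauge how many tasks are affected, the hung-task messages shown above can be tallied from a saved kernel log. The following is a minimal sketch, not part of the original report: the function names parse_hung_tasks and count_by_command are hypothetical, and it assumes the log text (for example, saved dmesg output) uses the same "INFO: task <comm>:<pid> blocked for more than <N> seconds." header format quoted in this article.

```python
import re
from collections import Counter

# Matches the hung-task header line quoted in this article, e.g.
# "INFO: task python3:1670971 blocked for more than 120 seconds."
# The greedy comm group keeps command names that contain colons intact.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def parse_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) tuples found in log_text.

    Hypothetical helper: log_text is assumed to be kernel log output
    that was saved to a string beforehand.
    """
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports


def count_by_command(reports):
    """Count hung-task reports per command name (hypothetical helper)."""
    return Counter(comm for comm, _pid, _secs in reports)


if __name__ == "__main__":
    # Sample line taken from the log excerpt in this article.
    sample = "INFO: task python3:1670971 blocked for more than 120 seconds.\n"
    reports = parse_hung_tasks(sample)
    print(reports)
    print(count_by_command(reports))
```

A large count concentrated on one command name, as with python3 here, suggests that the blocked tasks share the same mm and are waiting on the same mmap_sem. Confirming the owner still requires a vmcore or live analysis such as the backtrace above.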
Environment
- Red Hat Enterprise Linux for ARM64 8.10 - 4.18.0-553.el8_10
- The issue was reported on RHEL for ARM64, but in principle it can also occur on other architectures, including x86_64.