RHEL7: deadlock between uart port lock and tasklist_lock

Solution Unverified - Updated -

Issue

On a RHEL 7.6 system with a UPS connected to a serial port, we see hard lockup panics. This has happened multiple times. In both dumps the scenario is similar:

  • one CPU in the 8250 rx path, trying to get the uart port lock
  • one CPU in the 8250 tx path, holding the port lock and calling send_sigio(), which wants tasklist_lock for read
  • one CPU in an exit path, wanting tasklist_lock for write
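
To make the dependency between the three CPUs explicit, here is a minimal sketch that models them as a "blocked by" graph and walks it looking for a cycle. The CPU names and the find_cycle() helper are hypothetical, made up for illustration. Two edges are assumptions that the dumps summarized above do not state directly: that the reader in the tx path queues behind the pending writer (my understanding of how queued rwlocks treat readers in process context), and that the current reader of tasklist_lock is the CPU stuck in the rx path.

```python
# Who is each CPU waiting on? Names are illustrative, not taken from the dump.
BLOCKED_BY = {
    # rx path spins on the uart port lock, which the tx path holds (from the dumps).
    "cpu_rx": "cpu_tx",
    # ASSUMPTION: the tx-path reader queues behind the writer already waiting
    # on tasklist_lock.
    "cpu_tx": "cpu_exit",
    # ASSUMPTION: the writer waits for an existing reader, and that reader is
    # the CPU stuck in the rx path. The dumps above do not say who it is.
    "cpu_exit": "cpu_rx",
}


def find_cycle(blocked_by, start):
    """Follow blocked-by edges from start; return the cycle as a list, or None."""
    path = []
    position = {}
    node = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = blocked_by.get(node)
    if node is None:
        return None  # the chain ends at a CPU that is not blocked
    return path[position[node]:]


if __name__ == "__main__":
    print(find_cycle(BLOCKED_BY, "cpu_rx"))
```

If any one edge is removed from the mapping, for example because the tx path no longer needs tasklist_lock, the walk reaches an unblocked CPU and find_cycle() returns None.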

I've been digging and see two possible fixes. The direct one is to remove the unnecessary unlock/lock pair at the end of serial8250_rx_chars(). The indirect one is to pick up the newer send_sigio() code, which has an optimization that avoids taking tasklist_lock in some cases (and would avoid it in our case too). The first change is present in RHEL 8.0 and above, the second in RHEL 8.1 and above.
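
As a rough illustration of the second option, the sketch below models which lock the signal lookup would need, depending on the owner type set on the file. This is a simplified model based on my reading of the upstream send_sigio(), not the RHEL source: the PidType enum mirrors the ordering of upstream's enum pid_type, and sigio_lookup_lock() and its has_fast_path flag are hypothetical names introduced only for this example.

```python
from enum import IntEnum


class PidType(IntEnum):
    # ASSUMPTION: ordering follows upstream's enum pid_type; the RHEL 7
    # kernel has no TGID entry.
    PID = 0
    TGID = 1
    PGID = 2
    SID = 3


def sigio_lookup_lock(owner_type, has_fast_path):
    """Return the lock the lookup would take in this simplified model.

    has_fast_path=False models the older code, which always takes
    tasklist_lock for read. has_fast_path=True models the newer code, which
    looks up a single-task owner under RCU and only takes tasklist_lock when
    it has to walk a process group or session.
    """
    if has_fast_path and owner_type <= PidType.TGID:
        return "rcu_read_lock"
    return "tasklist_lock (read)"


if __name__ == "__main__":
    for fast in (False, True):
        for owner in PidType:
            print(fast, owner.name, sigio_lookup_lock(owner, fast))
```

In this model, a file whose owner is a single process never touches tasklist_lock once the fast path is available, which is the case that matters for the tx path here.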

I'm checking whether the customer could try RHEL 8, but it's unlikely. I have a question outstanding about which RHEL 7 release they would need the fix for, as RHEL 7.6 is EUS only.

crash> bt
PID: 5666   TASK: ffff8c54b082a080  CPU: 1   COMMAND: "systemd-cgroups"
 #0 [ffff8c54bec889f0] machine_kexec at ffffffffa2a63674
 #1 [ffff8c54bec88a50] __crash_kexec at ffffffffa2b1ce12
 #2 [ffff8c54bec88b20] panic at ffffffffa315b4db
 #3 [ffff8c54bec88ba0] nmi_panic at ffffffffa2a9739f
 #4 [ffff8c54bec88bb0] watchdog_overflow_callback at ffffffffa2b49241
 #5 [ffff8c54bec88bc8] __perf_event_overflow at ffffffffa2ba1027
 #6 [ffff8c54bec88c00] perf_event_overflow at ffffffffa2baa694
 #7 [ffff8c54bec88c10] intel_pmu_handle_irq at ffffffffa2a0a6b0
 #8 [ffff8c54bec88e38] perf_event_nmi_handler at ffffffffa316b031
 #9 [ffff8c54bec88e58] nmi_handle at ffffffffa316c8fc
#10 [ffff8c54bec88eb0] do_nmi at ffffffffa316cbd8
#11 [ffff8c54bec88ef0] end_repeat_nmi at ffffffffa316bd69
    [exception RIP: native_queued_spin_lock_slowpath+290]
    RIP: ffffffffa2b12102  RSP: ffff8c5417a8be20  RFLAGS: 00000046
    RAX: 0000000000000000  RBX: ffffffffa3607080  RCX: 0000000000090000
    RDX: ffff8c54bed9b780  RSI: 0000000000190100  RDI: ffffffffa3607084
    RBP: ffff8c5417a8be20   R8: ffff8c54bec9b780   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffffa3607084
    R13: ffff8c54197bd1b8  R14: 0000000000000000  R15: ffff8c54b082a080
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#12 [ffff8c5417a8be20] native_queued_spin_lock_slowpath at ffffffffa2b12102
#13 [ffff8c5417a8be28] queued_spin_lock_slowpath at ffffffffa315bf5a
#14 [ffff8c5417a8be38] queued_write_lock_slowpath at ffffffffa2b1236b
#15 [ffff8c5417a8be58] _raw_qwrite_lock at ffffffffa316a601
#16 [ffff8c5417a8be68] tasklist_write_lock_irq at ffffffffa2a93beb
#17 [ffff8c5417a8be78] do_exit at ffffffffa2a9dcb5
#18 [ffff8c5417a8bf10] do_group_exit at ffffffffa2a9e44f
#19 [ffff8c5417a8bf40] sys_exit_group at ffffffffa2a9e4c4
#20 [ffff8c5417a8bf50] system_call_fastpath at ffffffffa3174ddb
    RIP: 00007fbc78ed81d9  RSP: 00007ffd60f96808  RFLAGS: 00010206
    RAX: 00000000000000e7  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: 00007fbc791d5838   R8: 000000000000003c   R9: 00000000000000e7
    R10: ffffffffffffff60  R11: 0000000000000246  R12: 00007fbc791d5838
    R13: 00007fbc791dae80  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 00000000000000e7  CS: 0033  SS: 002b
crash> 
Kernel: 3.10.0-957.el7.x86_64

Environment

  • Red Hat Enterprise Linux (RHEL) 7
  • seen on kernel 3.10.0-957.el7.x86_64 (RHEL-7.6)
  • crash
