kernel:BUG: soft lockup - CPU stuck for 68s! in sctp_assoc_update_retran_path


Environment

  • Red Hat Enterprise Linux (RHEL) 6.6 or earlier
  • Stream Control Transmission Protocol (SCTP) association with multi-homed endpoints
  • SCTP transport failover between endpoints

Issue

  • kernel:BUG: soft lockup - CPU stuck for 68s! in sctp_assoc_update_retran_path
  • The system hangs or panics on a soft lockup with a backtrace similar to:

    RIP: 0010:[<ffffffffa031c64d>]  [<ffffffffa031c64d>] sctp_assoc_update_retran_path+0x6d/0xa0 [sctp]
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880fe5781000
    
    Call Trace:
     <IRQ> 
     [<ffffffffa03266ac>] ? sctp_retransmit+0x1dc/0x1f0 [sctp]
     [<ffffffffa031a423>] ? sctp_do_sm+0xab3/0x1210 [sctp]
     [<ffffffff81496d1d>] ? ip_rcv_finish+0x12d/0x440
     [<ffffffff814972a5>] ? ip_rcv+0x275/0x350
     [<ffffffffa031aeb0>] ? sctp_generate_t3_rtx_event+0x0/0xd0 [sctp]
     [<ffffffffa031af31>] ? sctp_generate_t3_rtx_event+0x81/0xd0 [sctp]
    

Resolution

  • Update to the RHEL 6.7 kernel package (kernel-2.6.32-573.el6) or later
  • If it is necessary to remain on the RHEL 6.6 kernel, update the kernel package to kernel-2.6.32-504.46.1.el6 or later. Please note that RHEL 6.6 is no longer receiving updates, as per the Extended Update Support section of the Red Hat Enterprise Linux Life Cycle page.
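
To check a running system against the two fixed builds above, the kernel release string from `uname -r` can be compared numerically. The following is only a sketch: the function name kernel_has_retran_path_fix() is hypothetical, it covers only the 2.6.32-504 and 2.6.32-573 or later streams named in this section, and any other build is conservatively treated as unfixed.

```c
#include <stdio.h>

/* Hypothetical helper, not part of any Red Hat tool.
 * Returns 1 if the release string is at or above one of the fixed
 * builds named in the Resolution, 0 if it is older (or a build this
 * sketch does not cover), and -1 if it is not a 2.6.32 kernel. */
int kernel_has_retran_path_fix(const char *release)
{
    int build = 0, z1 = 0, z2 = 0;

    /* "2.6.32-504.46.1.el6.x86_64" -> build=504, z1=46, z2=1
     * "2.6.32-573.el6.x86_64"      -> build=573, z1=0,  z2=0 */
    if (sscanf(release, "2.6.32-%d.%d.%d", &build, &z1, &z2) < 1)
        return -1;

    if (build >= 573)
        return 1;   /* RHEL 6.7 kernel or later */
    if (build == 504)
        return z1 > 46 || (z1 == 46 && z2 >= 1);   /* RHEL 6.6 stream */
    return 0;       /* assumption: everything else is treated as unfixed */
}
```

For the kernel in the vmcore analysed below (2.6.32-504.30.3.el6.x86_64) this returns 0, and for 2.6.32-504.46.1.el6.x86_64 it returns 1.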

Root Cause

SCTP was retransmitting a packet after the retransmission timeout (RTO) timer expired.

All transports were in either the "unconfirmed" or the "inactive" state.

The SCTP code in RHEL 6.6 could get into a loop here while trying to pick the better of the "inactive" transports. Neither scored higher than the other, so the selection ended in a tie and the CPU hung.

This is resolved by upstream commit a7288c4, which applies the relevant rules from the RFC to break the tie.

The fix was backported to RHEL 6 under Private Bug 1090561 and released in the RHEL 6.7 kernel package kernel-2.6.32-573.el6 with Errata RHSA-2015:1272. It was also backported to RHEL 6.6 under Private Bug 1306565 and released in kernel package kernel-2.6.32-504.46.1.el6 with Errata RHSA-2016:0617.

Diagnostic Steps

BUG: soft lockup - CPU#0 stuck for 67s! [swapper:0]
Pid: 0, comm: swapper Tainted: P           ---------------    2.6.32-504.30.3.el6.x86_64 #1 HP ProLiant BL460c Gen9
RIP: 0010:[<ffffffffa031c64d>]  [<ffffffffa031c64d>] sctp_assoc_update_retran_path+0x6d/0xa0 [sctp]
RSP: 0018:ffff880063203c00  EFLAGS: 00000286
RAX: ffff88106511b800 RBX: ffff880063203c00 RCX: ffff88106511b800
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880fe5781000
RBP: ffffffff8100bc13 R08: ffff880fe5781148 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88103feec800 R12: ffff880063203b80
R13: ffff880fe57816e0 R14: ffff880063203b70 R15: ffffffff81533d55
FS:  0000000000000000(0000) GS:ffff880063200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fc4f143b000 CR3: 0000002065720000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
Stack:
 ffff880063203c20 ffffffffa03266ac ffff880fe5781000 ffff880063203c80
<d> ffff880063203e10 ffffffffa031a423 ffff880063203c70 ffffffff81496d1d
<d> ffff88107fcd0340 ffff88102996d8c0 ffff88102996d8c0 ffff881062400020
Call Trace:
 <IRQ> 
 [<ffffffffa03266ac>] ? sctp_retransmit+0x1dc/0x1f0 [sctp]
 [<ffffffffa031a423>] ? sctp_do_sm+0xab3/0x1210 [sctp]
 [<ffffffff81496d1d>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff814972a5>] ? ip_rcv+0x275/0x350
 [<ffffffffa031aeb0>] ? sctp_generate_t3_rtx_event+0x0/0xd0 [sctp]
 [<ffffffffa031af31>] ? sctp_generate_t3_rtx_event+0x81/0xd0 [sctp]
 [<ffffffff81087e07>] ? run_timer_softirq+0x197/0x340
 [<ffffffff810b03c5>] ? tick_dev_program_event+0x65/0xc0
 [<ffffffff8107d901>] ? __do_softirq+0xc1/0x1e0
 [<ffffffff810b049a>] ? tick_program_event+0x2a/0x30
 [<ffffffff8100c38c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100fbd5>] ? do_softirq+0x65/0xa0
 [<ffffffff8107d7b5>] ? irq_exit+0x85/0x90
 [<ffffffff81533d5a>] ? smp_apic_timer_interrupt+0x4a/0x60
 [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
 <EOI> 
 [<ffffffff810166d7>] ? mwait_idle+0x77/0xd0
 [<ffffffff8153022a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
 [<ffffffff8151075a>] ? rest_init+0x7a/0x80
 [<ffffffff81c29f8f>] ? start_kernel+0x424/0x430
 [<ffffffff81c2933a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81c29453>] ? x86_64_start_kernel+0x115/0x124
Code: 74 3d 48 8b 00 4c 39 c0 74 f8 8b 90 d4 00 00 00 83 fa 03 74 ed 48 85 c9 74 1d 8b b1 d4 00 00 00 4c 63 d2 45 0f b6 92 00 7f 33 a0 <4c> 63 ce 45 3a 91 00 7f 33 a0 76 bf 83 fa 02 48 89 c1 75 be 48 
crash> dis -l sctp_assoc_update_retran_path+0x6d
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1344
0xffffffffa031c64d <sctp_assoc_update_retran_path+109>: movslq %esi,%r9
net/sctp/associola.c
1335 static const u8 sctp_trans_state_to_prio_map[] = {   <---- step 2: array
1336         [SCTP_ACTIVE]   = 3,    /* best case */
1337         [SCTP_UNKNOWN]  = 2,
1338         [SCTP_PF]       = 1,
1339         [SCTP_INACTIVE] = 0,    /* worst case */
1340 };
1341 
1342 static u8 sctp_trans_score(const struct sctp_transport *trans)
1343 {
1344         return sctp_trans_state_to_prio_map[trans->state];   <---- step 1: RIP is here, indexing this array. An array lookup cannot hang by itself, so a caller is probably looping
1345 }
1346 
1347 static struct sctp_transport *sctp_trans_elect_best(struct sctp_transport *curr,
1348                                                     struct sctp_transport *best)
1349 {
1350         if (best == NULL)
1351                 return curr;
1352 
1353         return sctp_trans_score(curr) > sctp_trans_score(best) ? curr : best;    <---- step 3: caller
1354 }
1355 
1356 void sctp_assoc_update_retran_path(struct sctp_association *asoc)
1357 {                        
1358         struct sctp_transport *trans = asoc->peer.retran_path;
1359         struct sctp_transport *trans_next = NULL;
1360 
1361         /* We're done as we only have the one and only path. */
1362         if (asoc->peer.transport_count == 1)
1363                 return;
1364         /* If active_path and retran_path are the same and active, 
1365          * then this is the only active path. Use it.
1366          */                                
1367         if (asoc->peer.active_path == asoc->peer.retran_path &&
1368             asoc->peer.active_path->state == SCTP_ACTIVE)
1369                 return;
1370 
1371         /* Iterate from retran_path's successor back to retran_path. */
1372         for (trans = list_next_entry(trans, transports); 1;
1373              trans = list_next_entry(trans, transports)) {
1374                 /* Manually skip the head element. */ 
1375                 if (&trans->transports == &asoc->peer.transport_addr_list)
1376                         continue;
1377                 if (trans->state == SCTP_UNCONFIRMED)
1378                         continue;
1379                 trans_next = sctp_trans_elect_best(trans, trans_next);    <--- step 4: caller

net/sctp/outqueue.c
 479 /* Mark all the eligible packets on a transport for retransmission and force
 480  * one packet out.
 481  */     
 482 void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport,
 483                      sctp_retransmit_reason_t reason)
 484 {
 485         int error = 0;
 486         
 487         switch(reason) {
 488         case SCTP_RTXR_T3_RTX:
 489                 SCTP_INC_STATS(SCTP_MIB_T3_RETRANSMITS);
 490                 sctp_transport_lower_cwnd(transport, SCTP_LOWER_CWND_T3_RTX);
 491                 /* Update the retran path if the T3-rtx timer has expired for
 492                  * the current retran path. 
 493                  */
 494                 if (transport == transport->asoc->peer.retran_path)
 495                         sctp_assoc_update_retran_path(transport->asoc);    <---- step 5: caller
 496                 transport->asoc->rtx_data_chunks +=
 497                         transport->asoc->unack_data;
 498                 break;

SCTP_RTXR_T3_RTX means the packet is being retransmitted because the T3-rtx (RTO) timer expired. Disassembling the hung function up to the instruction at RIP:

crash> dis -lr sctp_assoc_update_retran_path+0x6d
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1357
0xffffffffa031c5e0 <sctp_assoc_update_retran_path>:     push   %rbp
0xffffffffa031c5e1 <sctp_assoc_update_retran_path+1>:   mov    %rsp,%rbp
0xffffffffa031c5e4 <sctp_assoc_update_retran_path+4>:   nopl   0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1362
0xffffffffa031c5e9 <sctp_assoc_update_retran_path+9>:   cmpw   $0x1,0x158(%rdi)
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1358
0xffffffffa031c5f1 <sctp_assoc_update_retran_path+17>:  mov    0x190(%rdi),%r11
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1362
0xffffffffa031c5f8 <sctp_assoc_update_retran_path+24>:  je     0xffffffffa031c668 <sctp_assoc_update_retran_path+136>
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1367
0xffffffffa031c5fa <sctp_assoc_update_retran_path+26>:  cmp    0x188(%rdi),%r11
0xffffffffa031c601 <sctp_assoc_update_retran_path+33>:  je     0xffffffffa031c66a <sctp_assoc_update_retran_path+138>
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1372
0xffffffffa031c603 <sctp_assoc_update_retran_path+35>:  mov    (%r11),%rax
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1375
0xffffffffa031c606 <sctp_assoc_update_retran_path+38>:  lea    0x148(%rdi),%r8
0xffffffffa031c60d <sctp_assoc_update_retran_path+45>:  xor    %ecx,%ecx
0xffffffffa031c60f <sctp_assoc_update_retran_path+47>:  jmp    0xffffffffa031c627 <sctp_assoc_update_retran_path+71>
0xffffffffa031c611 <sctp_assoc_update_retran_path+49>:  nopl   0x0(%rax)
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1353
0xffffffffa031c618 <sctp_assoc_update_retran_path+56>:  mov    %esi,%edx
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1381
0xffffffffa031c61a <sctp_assoc_update_retran_path+58>:  cmp    $0x2,%edx
0xffffffffa031c61d <sctp_assoc_update_retran_path+61>:  je     0xffffffffa031c661 <sctp_assoc_update_retran_path+129>
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1384
0xffffffffa031c61f <sctp_assoc_update_retran_path+63>:  cmp    %r11,%rax
0xffffffffa031c622 <sctp_assoc_update_retran_path+66>:  je     0xffffffffa031c661 <sctp_assoc_update_retran_path+129>
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1373
0xffffffffa031c624 <sctp_assoc_update_retran_path+68>:  mov    (%rax),%rax
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1375
0xffffffffa031c627 <sctp_assoc_update_retran_path+71>:  cmp    %r8,%rax
0xffffffffa031c62a <sctp_assoc_update_retran_path+74>:  je     0xffffffffa031c624 <sctp_assoc_update_retran_path+68>
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1377
0xffffffffa031c62c <sctp_assoc_update_retran_path+76>:  mov    0xd4(%rax),%edx
0xffffffffa031c632 <sctp_assoc_update_retran_path+82>:  cmp    $0x3,%edx
0xffffffffa031c635 <sctp_assoc_update_retran_path+85>:  je     0xffffffffa031c624 <sctp_assoc_update_retran_path+68>
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1350
0xffffffffa031c637 <sctp_assoc_update_retran_path+87>:  test   %rcx,%rcx
0xffffffffa031c63a <sctp_assoc_update_retran_path+90>:  je     0xffffffffa031c659 <sctp_assoc_update_retran_path+121>
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1344
0xffffffffa031c63c <sctp_assoc_update_retran_path+92>:  mov    0xd4(%rcx),%esi
0xffffffffa031c642 <sctp_assoc_update_retran_path+98>:  movslq %edx,%r10
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1353
0xffffffffa031c645 <sctp_assoc_update_retran_path+101>: movzbl -0x5fcc8100(%r10),%r10d
/usr/src/debug/kernel-2.6.32-504.30.3.el6/linux-2.6.32-504.30.3.el6.x86_64/net/sctp/associola.c: 1344
0xffffffffa031c64d <sctp_assoc_update_retran_path+109>: movslq %esi,%r9
crash> dis -lr sctp_assoc_update_retran_path+0x6d | egrep "di$"
crash> 

No instruction in the hung function writes to %rdi (the egrep above matches nothing), so the asoc argument is still in %rdi:

RDI: ffff880fe5781000
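
Since the association pointer is still in %rdi, the offsets used in the disassembly can be turned into addresses. The sketch below only does that arithmetic. The offsets are read from the disassembly of this particular kernel build and will differ on other builds, and the names OFF_* and asoc_field_addr() are hypothetical.

```c
#include <stdint.h>

/* Offsets into struct sctp_association as read from the disassembly
 * above. They apply to this kernel build only. */
enum {
    OFF_TRANSPORT_ADDR_LIST = 0x148, /* lea  0x148(%rdi),%r8   (line 1375) */
    OFF_TRANSPORT_COUNT     = 0x158, /* cmpw $0x1,0x158(%rdi)  (line 1362) */
    OFF_ACTIVE_PATH         = 0x188, /* cmp  0x188(%rdi),%r11  (line 1367) */
    OFF_RETRAN_PATH         = 0x190  /* mov  0x190(%rdi),%r11  (line 1358) */
};

/* Hypothetical helper: address of a field given the association base. */
uint64_t asoc_field_addr(uint64_t asoc, uint64_t offset)
{
    return asoc + offset;
}
```

With asoc = 0xffff880fe5781000, the list head peer.transport_addr_list is at 0xffff880fe5781148, which is the value in R08 in the register dump.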

crash> struct sctp_association ffff880fe5781000

crash> struct sctp_association.peer.transport_count,peer.active_path,peer.retran_path ffff880fe5781000
  peer.transport_count = 2,
  peer.active_path = 0xffff88103feec800,
  peer.retran_path = 0xffff88103feec800,

crash> struct sctp_transport 0xffff88103feec800
struct sctp_transport {
  transports = {
    next = 0xffff880fe5781148, 
    prev = 0xffff88106511b800
  }, 

crash> struct sctp_transport.state 0xffff88103feec800
  state = 3

enum sctp_spinfo_state {
        SCTP_INACTIVE,     // 0  these C99 comments added for analysis
        SCTP_PF,           // 1
        SCTP_ACTIVE,       // 2
        SCTP_UNCONFIRMED,  // 3
        SCTP_UNKNOWN = 0xffff  /* Value used for transport state unknown */
};
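
When reading many transports out of a vmcore it can help to map the numeric state back to its name. A minimal sketch based on the enum above; the enum and function names (sketch_state, sctp_state_name) are hypothetical and chosen so they do not clash with kernel headers.

```c
/* Mirror of the enum shown above, renamed for user-space use. */
enum sketch_state {
    ST_INACTIVE    = 0,
    ST_PF          = 1,
    ST_ACTIVE      = 2,
    ST_UNCONFIRMED = 3,
    ST_UNKNOWN     = 0xffff
};

/* Returns the kernel name for a numeric transport state. */
const char *sctp_state_name(int state)
{
    switch (state) {
    case ST_INACTIVE:    return "SCTP_INACTIVE";
    case ST_PF:          return "SCTP_PF";
    case ST_ACTIVE:      return "SCTP_ACTIVE";
    case ST_UNCONFIRMED: return "SCTP_UNCONFIRMED";
    case ST_UNKNOWN:     return "SCTP_UNKNOWN";
    default:             return "(unrecognized)";
    }
}
```

For the transport above, sctp_state_name(3) gives SCTP_UNCONFIRMED.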

The retran_path transport is in SCTP_UNCONFIRMED, so the loop has to find a successor. Walking the list from its next pointer:

crash> struct sctp_transport.transports,state 0xffff880fe5781148
  transports = {
    next = 0xffff88106511b800, 
    prev = 0xffff88103feec800
  }
  state = 0

crash> struct sctp_transport.transports,state 0xffff88106511b800
  transports = {
    next = 0xffff88103feec800, 
    prev = 0xffff880fe5781148
  }
  state = 0

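0xffff880fe5781148 is the asoc address plus 0x148, which is the transport_addr_list head loaded into R08 at line 1375, not a transport, so the loop skips it and its state value is meaningless. The only other transport, 0xffff88106511b800, is SCTP_INACTIVE. Nothing is SCTP_ACTIVE, and because retran_path itself is SCTP_UNCONFIRMED, the check at line 1377 jumps back to the list advance (+68) before the comparison against retran_path at line 1384 can end the walk. There are no suitable successors, so we spin forever.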
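
The spin can be modelled in user space. The sketch below is a simplified model and not the kernel code. The two exit tests are an assumption inferred from the disassembly for source lines 1381 and 1384 (cmp $0x2,%edx and cmp %r11,%rax), SCTP_UNKNOWN is left out of the scoring, and struct node, score(), walk_retran_path() and simulate_vmcore() are hypothetical names. An iteration cap stands in for the soft lockup, because the kernel loop has no such cap.

```c
#include <stddef.h>

enum { INACTIVE = 0, PF = 1, ACTIVE = 2, UNCONFIRMED = 3 };

struct node {
    struct node *next;
    int is_head;   /* 1 for the list head embedded in the association */
    int state;
};

static int score(int state)
{
    switch (state) {
    case ACTIVE: return 3;
    case PF:     return 1;
    default:     return 0;   /* INACTIVE */
    }
}

/* Walks the ring from retran_path's successor. Returns the number of
 * steps taken before the loop exits, or -1 if max_steps is reached. */
int walk_retran_path(struct node *retran_path, int max_steps)
{
    struct node *trans = retran_path->next;
    struct node *best = NULL;
    int steps;

    for (steps = 1; steps <= max_steps; steps++, trans = trans->next) {
        if (trans->is_head)
            continue;
        if (trans->state == UNCONFIRMED)
            continue;               /* also skips the exit tests below */
        if (best == NULL || score(trans->state) > score(best->state))
            best = trans;
        if (best->state == ACTIVE)  /* assumed exit test, line 1381 */
            return steps;
        if (trans == retran_path)   /* assumed exit test, line 1384 */
            return steps;
    }
    return -1;
}

/* Builds the ring seen in the vmcore:
 * retran_path -> list head -> other transport -> retran_path. */
int simulate_vmcore(int retran_state, int other_state, int max_steps)
{
    struct node head  = { NULL, 1, 0 };
    struct node other = { NULL, 0, other_state };
    struct node rp    = { NULL, 0, retran_state };

    rp.next = &head;
    head.next = &other;
    other.next = &rp;
    return walk_retran_path(&rp, max_steps);
}
```

With the states from the vmcore, simulate_vmcore(UNCONFIRMED, INACTIVE, 1000) returns -1, meaning the walk never exits on its own. If retran_path were INACTIVE instead, the walk would end after 3 steps, and if the other transport were ACTIVE it would end after 2.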

Upstream has a tie-breaker here. This is from Linux v4.3:

static struct sctp_transport *sctp_trans_elect_tie(struct sctp_transport *trans1,
                                                   struct sctp_transport *trans2)
{
        if (trans1->error_count > trans2->error_count) {
                return trans2;
        } else if (trans1->error_count == trans2->error_count &&
                   ktime_after(trans2->last_time_heard,
                               trans1->last_time_heard)) {
                return trans2;
        } else {
                return trans1;
        }
}

static struct sctp_transport *sctp_trans_elect_best(struct sctp_transport *curr,
                                                    struct sctp_transport *best)
{
        u8 score_curr, score_best;

        if (best == NULL || curr == best)
                return curr;

        score_curr = sctp_trans_score(curr);
        score_best = sctp_trans_score(best);

        /* First, try a score-based selection if both transport states
         * differ. If we're in a tie, lets try to make a more clever
         * decision here based on error counts and last time heard.
         */
        if (score_curr > score_best)
                return curr;
        else if (score_curr == score_best)
                return sctp_trans_elect_tie(curr, best);
        else
                return best;
}
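
The order of the tie-break rules is easier to see with concrete values. The following user-space sketch follows the same decision order as sctp_trans_elect_tie() above. The struct path_stats type, its field last_time_heard_ns (a plain integer timestamp standing in for the kernel's ktime_t) and the function name elect_tie() are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields the tie-breaker reads. */
struct path_stats {
    int     error_count;
    int64_t last_time_heard_ns;   /* assumption: larger means more recent */
};

/* Prefer the path with fewer errors; on equal error counts prefer the
 * path heard from more recently; otherwise keep the first argument. */
const struct path_stats *elect_tie(const struct path_stats *t1,
                                   const struct path_stats *t2)
{
    if (t1->error_count > t2->error_count)
        return t2;
    if (t1->error_count == t2->error_count &&
        t2->last_time_heard_ns > t1->last_time_heard_ns)
        return t2;
    return t1;
}
```

For example, with error counts 2 and 1 the second path wins regardless of timestamps. With equal error counts the path with the later timestamp wins, and if both values are equal the first argument is kept.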

This was added with:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=a7288c4

This is now in RHEL 6.7 and newer:

* Thu Jan 29 2015 Rafael Aquini <aquini@redhat.com> [2.6.32-527.el6]
- [net] sctp: improve sctp_select_active_and_retran_path selection (Daniel Borkmann) [1090561]

This is also now in RHEL 6.6 as of kernel-2.6.32-504.46.1.el6:

* Thu Feb 18 2016 Radomir Vrbovsky <rvrbovsk@redhat.com> [2.6.32-504.44.1.el6]
...
- [net] sctp: improve sctp_select_active_and_retran_path selection (Daniel Borkmann) [1306565 1090561]

