MRG RT kernel 2.6.33.9-rt31.64.el5rt crashes in the find_busiest_group() routine

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise MRG Realtime 1.3
  • Kernel 2.6.33.9-rt31.64.el5rt or older

Issue

  • Red Hat Enterprise MRG RT kernel 2.6.33.9-rt31.64.el5rt crashes in the find_busiest_group() routine with a divide error:
divide error: 0000 [#1] PREEMPT SMP 
last sysfs file: /sys/devices/pci0000:fe/0000:fe:06.3/class
CPU 4 
Pid: 64, comm: sirq-sched/4 Not tainted 2.6.33.9-rt31.64.el5rt #1 0F0XJ6/PowerEdge R610
RIP: 0010:[<ffffffff8103a5fd>]  [<ffffffff8103a5fd>] find_busiest_group+0x375/0x73b
[...]
RIP  [<ffffffff8103a5fd>] find_busiest_group+0x375/0x73b RSP <ffff88062e65fc30>
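
When many console logs or vmcore dmesg extracts need triage, this signature can be matched mechanically. The following Python sketch assumes the log text is already available as a string. The function name matches_issue and the exact pattern are hypothetical, and a match only suggests this issue rather than proving it.

```python
import re

# Heuristic signature (an assumption, not an official Red Hat check):
# a "divide error" oops whose RIP line points into find_busiest_group().
RIP_IN_FBG = re.compile(r"RIP:.*\bfind_busiest_group\+0x[0-9a-f]+/0x[0-9a-f]+")


def matches_issue(console_log: str) -> bool:
    """Return True if the log looks like this find_busiest_group() crash."""
    has_divide_error = "divide error:" in console_log
    return has_divide_error and RIP_IN_FBG.search(console_log) is not None
```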

Resolution

  • This issue was fixed in erratum RHBA-2013-0927
  • Update to kernel-rt-2.6.33.9-rt31.86.el5rt or newer
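
To check whether a host already runs a fixed build, the running release (as reported by uname -r) can be compared against 2.6.33.9-rt31.86.el5rt. This is a minimal Python sketch, assuming release strings follow the 2.6.33.9-rtNN.NN.el5rt pattern seen in this article. release_key and has_fix are hypothetical helpers, and this is not a substitute for rpm's own version comparison.

```python
import re

# Assumption: kernel-rt releases in this series look like
# "<a>.<b>.<c>.<d>-rt<patch>.<build>.el5rt", e.g. "2.6.33.9-rt31.64.el5rt".
# This is a simplified numeric comparison, NOT a full rpmvercmp.
_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)-rt(\d+)\.(\d+)\.el5rt$")

FIXED = "2.6.33.9-rt31.86.el5rt"


def release_key(release: str) -> tuple:
    """Turn a release string into a tuple of ints for ordering."""
    m = _RELEASE.match(release.strip())
    if m is None:
        raise ValueError(f"unrecognised kernel-rt release: {release!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(running: str, fixed: str = FIXED) -> bool:
    """True if `running` is the fixed build or a newer one in this series."""
    return release_key(running) >= release_key(fixed)
```

On the host itself, platform.release() returns the same string as uname -r and can be passed to has_fix().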

Root Cause

Prior to this update, the find_busiest_group() function could divide by sched_group->cpu_power when that value was 0. The resulting divide error caused a kernel panic. Code changes in the hotfix build prevent the division by zero, and the panic no longer occurs.
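
The failure mode can be modeled in a few lines. This Python sketch is loosely based on the scheduler's per-group average-load computation, (group_load * SCHED_LOAD_SCALE) / cpu_power, assuming SCHED_LOAD_SCALE is 1024 as in kernels of this era. The function names are hypothetical, and the clamp in avg_load_guarded only illustrates the idea of a guard; it is not the actual change shipped in RHBA-2013-0927.

```python
# Assumed scale factor: SCHED_LOAD_SCALE is 1 << 10 in kernels of this era.
SCHED_LOAD_SCALE = 1024


def avg_load(group_load: int, cpu_power: int) -> int:
    """Unguarded model of the per-group average-load division.

    Raises ZeroDivisionError when cpu_power is 0, the Python counterpart
    of the x86 "divide error" that panicked the kernel.
    """
    return (group_load * SCHED_LOAD_SCALE) // cpu_power


def avg_load_guarded(group_load: int, cpu_power: int) -> int:
    """Hypothetical guard: clamp cpu_power to at least 1 before dividing."""
    return (group_load * SCHED_LOAD_SCALE) // max(cpu_power, 1)
```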

This issue is analogous to the divide-by-zero issue in the regular (non-realtime) kernel, discussed in the solution "Divide-by-zero in find_busiest_group()".

Diagnostic Steps

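From the kernel crash output, the faulting instruction (the <48> f7 f3 bytes in the Code: line) is div %rbx, and the register dump shows RBX: 0000000000000000, so the kernel divided by zero: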

divide error: 0000 [#1] PREEMPT SMP 
last sysfs file: /sys/devices/pci0000:fe/0000:fe:06.3/class
CPU 4 
Pid: 64, comm: sirq-sched/4 Not tainted 2.6.33.9-rt31.64.el5rt #1 0F0XJ6/PowerEdge R610
RIP: 0010:[<ffffffff8103a5fd>]  [<ffffffff8103a5fd>] find_busiest_group+0x375/0x73b
RSP: 0018:ffff88062e65fc30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000010301
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000040
RBP: ffff88062e65fd80 R08: 0000000000000000 R09: ffff88033ac4b0a8
R10: ffff88062e643d90 R11: ffff88062e643d50 R12: 0000000000000040
R13: ffff88033ac4b090 R14: ffff88033ac4af80 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffff88033ac40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001c16d70 CR3: 000000060d1f2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sirq-sched/4 (pid: 64, threadinfo ffff88062e65e000, task ffff88062e65c780)
Stack:
 ffff88033ac4af68 ffff88062e65fddc 00000000283703c0 ffff88062e65fdd0
<0> 000000002e65fdb0 0000000400000000 ffff88033ac4b0a0 000000002e65fe04
<0> ffff88033ac4b090 ffffffff2e65fdf8 0000000000000000 ffffffffffffffff
Call Trace:
 [<ffffffff8103f0e1>] rebalance_domains+0x16f/0x418
 [<ffffffff8103f3ca>] run_rebalance_domains+0x40/0xc6
 [<ffffffff81048bdc>] run_ksoftirqd+0x17e/0x29e
 [<ffffffff81048a5e>] ? run_ksoftirqd+0x0/0x29e
 [<ffffffff8105db79>] kthread+0x6e/0x76
 [<ffffffff81003a94>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105db0b>] ? kthread+0x0/0x76
 [<ffffffff81003a90>] ? kernel_thread_helper+0x0/0x10
Code: bd dc fe ff ff 74 13 48 83 7d 10 00 74 0c 48 8b 5d 10 c7 03 00 00 00 00 eb 6b 41 8b 55 08 48 8b 45 a8 48 c1 e0 0a 48 89 d3 31 d2 <48> f7 f3 48 8b 55 b0 48 89 45 a0 31 c0 48 85 d2 74 0c 48 8b 45 
RIP  [<ffffffff8103a5fd>] find_busiest_group+0x375/0x73b RSP <ffff88062e65fc30>
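
Checking the divisor can be scripted when many crash reports need triage. In the Code: line, the bytes in angle brackets start the faulting instruction: 48 f7 f3 is REX.W F7 /6 with ModRM 0xF3, an unsigned 64-bit div whose divisor is RBX. The preceding bytes load a 32-bit value from offset 8 of the structure at R13 into EDX, shift the dividend left by 10 (a multiply by 1024), and copy RDX into RBX. That is consistent with dividing by sched_group->cpu_power, although the mapping is an inference from the disassembly, not something the dump states. The Python sketch below decodes only that one instruction form; all function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Low eight 64-bit registers, indexed by the ModRM r/m field (REX.B clear).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line: str) -> List[int]:
    """Return the bytes from the <..>-marked faulting instruction onward."""
    tokens = code_line.split(":", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<"):
            return [int(t.strip("<>"), 16) for t in tokens[i:]]
    raise ValueError("no <..> marker in Code: line")


def decode_div(insn: List[int]) -> Optional[str]:
    """Decode only `REX.W F7 /6` with a register operand (64-bit unsigned div).

    Returns the divisor register name, or None for anything else.
    r8-r15 (REX.B set) and memory operands are deliberately not handled.
    """
    if len(insn) < 3 or insn[0] != 0x48 or insn[1] != 0xF7:
        return None
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg != 6:
        return None
    return REGS64[rm]


def parse_registers(dump: str) -> Dict[str, int]:
    """Parse 'RAX: 0000000000000000'-style pairs into {"rax": 0, ...}."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b", dump)
    return {name.lower(): int(value, 16) for name, value in pairs}


def divisor_is_zero(code_line: str, register_dump: str) -> bool:
    """True if the faulting instruction is a 64-bit div by a zero register."""
    reg = decode_div(faulting_bytes(code_line))
    if reg is None:
        raise ValueError("faulting instruction is not a decodable 64-bit div")
    return parse_registers(register_dump)[reg] == 0
```

Restricting the decoder to a single encoding keeps the sketch honest: anything else raises ValueError instead of producing a guess, and a real disassembler (for example, the crash utility's dis command) remains the authoritative check.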

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
