Process utimes (user space time) are incorrect or crash on divide_error in thread_group_times on RHEL6
Issue
- A single cause can have three quite different symptoms.
a) A process's "utime" (shown as the time field in ps) shows the process to be running for a very long time, when this is not the case.
- For example:
$ ps
UID PID PPID C STIME TTY TIME CMD
root 50925 50923 99 Aug06 ? 213555-20:29:38 test_application
- The TIME field corresponds to field 14 in the
/proc/PID/stat
field, which is the "utime" value as reported by the kernel.
$ cat /proc/50925/stat
40275 (test_application) S 40273 36692 35927 0 -1 4202496 399678594 489450 31722 0 974453227234
870691207771 39 6381 20 0 17 0 116189537 2022604800 14289 18446744073709551615 4194304 20071776
140734177595376 1407341775887 52 238159815987 0 0 6 8414376 18446744073709551615 0 0 17 0 0 0 3419 0 0
b) The system unexpectedly crashes in thread_group_times()
.
- For example:
divide error: 0000 [#1] SMP
...
CPU 1
Modules linked in: ipv6 power_meter hpilo hpwdt ipmi_si ipmi_msghandler sg microcode serio_raw iTCO_wdt iTCO_vendor_support bnx2 i7core_edac edac_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 2484, comm: XXXXX Not tainted 2.6.32-279.el6.x86_64 #1 HP ProLiant DL380 G6
RIP: 0010:[<ffffffff81058226>] [<ffffffff81058226>] thread_group_times+0x56/0xb0
RSP: 0018:ffff880414ec9eb8 EFLAGS: 00010046
RAX: 943882ab68e17428 RBX: ffff880415e5f000 RCX: 00000000ff6373b8
RDX: 0000000000000000 RSI: ffff880414ec9eb8 RDI: 000f38eb448809cb
RBP: ffff880414ec9ef8 R08: 0000000094935de3 R09: ffff880415e5f000
R10: 0000189800000000 R11: 0000000000000202 R12: ffff880414ec9f08
R13: ffff880414ec9f10 R14: 0000000000000000 R15: 0000000000000006
FS: 00007f911dbf0700(0000) GS:ffff88042e400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f691244e330 CR3: 000000081489d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process XXXXX (pid: 2484, threadinfo ffff880414ec8000, task ffff880414e04080)
Stack:
...
Call Trace:
[<ffffffff8108979d>] do_sys_times+0x4d/0xd0
[<ffffffff81089864>] sys_times+0x44/0x70
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: 49 89 d4 e8 7d b4 03 00 4c 8b 75 c8 4c 03 75 c0 48 8b 7d d0 e8 6c 9a 01 00 4d 85 f6 48 89 c1 74 0d 48 0f af 45 c0 45 89 f6 31 d2 <49> f7 f6 4c 8b b3 d8 03 00 00 48 8b 93 e0 03 00 00 4c 39 f0 49
RIP [<ffffffff81058226>] thread_group_times+0x56/0xb0
RSP <ffff880414ec9eb8>
c) The system unexpectedly crashes in task_times()
.
- For example:
crash> log
divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cpufreq/scaling_setspeed
CPU 2
Modules linked in: bluetooth rfkill bonding ipv6 bridge stp llc nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc pcc_cpufreq power_meter
sg microcode serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt bnx2 i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif
hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipv6]
Pid: 3177, comm: splunkd Tainted: G W ---------------- 2.6.32-220.13.1.el6.x86_64 #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff810569d8>] [<ffffffff810569d8>] task_times+0x58/0xb0
RSP: 0018:ffff880c0f65be38 EFLAGS: 00010246
RAX: e54ba6d1fdf60eda RBX: ffff8808a1745500 RCX: 00000001001ef539
RDX: 0000000000000000 RSI: ffff880c0f65bf18 RDI: 000f4418615082a5
RBP: ffff880c0f65be68 R08: 0000000000000001 R09: 00000000000007de
R10: 000000000000000d R11: 0000000000000202 R12: ffff880c0f65bf10
R13: ffff880c0f65bf18 R14: 0000000000000000 R15: 00000000e52fefaa
FS: 00007f334c7fd700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f331ce05000 CR3: 0000000c0dbef000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process splunkd (pid: 3177, threadinfo ffff880c0f65a000, task ffff8808a1745500)
Stack:
ffff880800000000 ffff8808a1745500 ffff880c0f65be78 0000000000000001
<0> 00007f334bc2e568 00007f334bc2e548 ffff880c0f65bf68 ffffffff81087573
<0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff81087573>] getrusage+0x83/0x340
[<ffffffff81087866>] sys_getrusage+0x36/0x40
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: f5 49 89 d4 4d 89 fe 4c 03 b7 88 05 00 00 48 8b bf 80 00 00 00 e8 b9 95 01 00 4d 85 f6 48 89 c1 74 0c 49 0f af c7 45 89 f6 31 d2
<49> f7 f6 4c 8b b3 a8 05 00 00 48 8b 93 b0 05 00 00 4c 8b 7d f8
RIP [<ffffffff810569d8>] task_times+0x58/0xb0
RSP <ffff880c0f65be38>
crash> bt
PID: 3177 TASK: ffff8808a1745500 CPU: 2 COMMAND: "splunkd"
#0 [ffff880c0f65baf0] machine_kexec at ffffffff8103214b
#1 [ffff880c0f65bb50] crash_kexec at ffffffff810b90c2
#2 [ffff880c0f65bc20] oops_end at ffffffff814f09b0
#3 [ffff880c0f65bc50] die at ffffffff8100f26b
#4 [ffff880c0f65bc80] do_trap at ffffffff814f02a4
#5 [ffff880c0f65bce0] do_divide_error at ffffffff8100cfff
#6 [ffff880c0f65bd80] divide_error at ffffffff8100be7b
[exception RIP: task_times+0x58]
RIP: ffffffff810569d8 RSP: ffff880c0f65be38 RFLAGS: 00010246
RAX: e54ba6d1fdf60eda RBX: ffff8808a1745500 RCX: 00000001001ef539
RDX: 0000000000000000 RSI: ffff880c0f65bf18 RDI: 000f4418615082a5
RBP: ffff880c0f65be68 R8: 0000000000000001 R9: 00000000000007de
R10: 000000000000000d R11: 0000000000000202 R12: ffff880c0f65bf10
R13: ffff880c0f65bf18 R14: 0000000000000000 R15: 00000000e52fefaa
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880c0f65be70] getrusage at ffffffff81087573
#8 [ffff880c0f65bf70] sys_getrusage at ffffffff81087866
#9 [ffff880c0f65bf80] system_call_fastpath at ffffffff8100b0f2
Environment
- Red Hat Enterprise Linux 6, 64 bit
- 2.6.32-220.11.1.el6 - 2.6.32-220.26.1.el6
- 2.6.32-279.el6 - 2.6.32-279.10.1.el6
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.