Process utimes (user space time) are incorrect or crash on divide_error in thread_group_times on RHEL6
Issue
- A single cause can have three quite different symptoms.
a) A process's "utime" (shown as the time field in ps) shows the process to be running for a very long time, when this is not the case.
- For example:
$ ps
UID PID PPID C STIME TTY TIME CMD
root 50925 50923 99 Aug06 ? 213555-20:29:38 test_application
- The TIME field corresponds to field 14 in the
/proc/PID/statfield, which is the "utime" value as reported by the kernel.
$ cat /proc/50925/stat
40275 (test_application) S 40273 36692 35927 0 -1 4202496 399678594 489450 31722 0 974453227234
870691207771 39 6381 20 0 17 0 116189537 2022604800 14289 18446744073709551615 4194304 20071776
140734177595376 1407341775887 52 238159815987 0 0 6 8414376 18446744073709551615 0 0 17 0 0 0 3419 0 0
b) The system unexpectedly crashes in thread_group_times().
- For example:
divide error: 0000 [#1] SMP
...
CPU 1
Modules linked in: ipv6 power_meter hpilo hpwdt ipmi_si ipmi_msghandler sg microcode serio_raw iTCO_wdt iTCO_vendor_support bnx2 i7core_edac edac_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 2484, comm: XXXXX Not tainted 2.6.32-279.el6.x86_64 #1 HP ProLiant DL380 G6
RIP: 0010:[<ffffffff81058226>] [<ffffffff81058226>] thread_group_times+0x56/0xb0
RSP: 0018:ffff880414ec9eb8 EFLAGS: 00010046
RAX: 943882ab68e17428 RBX: ffff880415e5f000 RCX: 00000000ff6373b8
RDX: 0000000000000000 RSI: ffff880414ec9eb8 RDI: 000f38eb448809cb
RBP: ffff880414ec9ef8 R08: 0000000094935de3 R09: ffff880415e5f000
R10: 0000189800000000 R11: 0000000000000202 R12: ffff880414ec9f08
R13: ffff880414ec9f10 R14: 0000000000000000 R15: 0000000000000006
FS: 00007f911dbf0700(0000) GS:ffff88042e400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f691244e330 CR3: 000000081489d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process XXXXX (pid: 2484, threadinfo ffff880414ec8000, task ffff880414e04080)
Stack:
...
Call Trace:
[<ffffffff8108979d>] do_sys_times+0x4d/0xd0
[<ffffffff81089864>] sys_times+0x44/0x70
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: 49 89 d4 e8 7d b4 03 00 4c 8b 75 c8 4c 03 75 c0 48 8b 7d d0 e8 6c 9a 01 00 4d 85 f6 48 89 c1 74 0d 48 0f af 45 c0 45 89 f6 31 d2 <49> f7 f6 4c 8b b3 d8 03 00 00 48 8b 93 e0 03 00 00 4c 39 f0 49
RIP [<ffffffff81058226>] thread_group_times+0x56/0xb0
RSP <ffff880414ec9eb8>
c) The system unexpectedly crashes in task_times().
- For example:
crash> log
divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cpufreq/scaling_setspeed
CPU 2
Modules linked in: bluetooth rfkill bonding ipv6 bridge stp llc nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc pcc_cpufreq power_meter
sg microcode serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt bnx2 i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif
hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipv6]
Pid: 3177, comm: splunkd Tainted: G W ---------------- 2.6.32-220.13.1.el6.x86_64 #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff810569d8>] [<ffffffff810569d8>] task_times+0x58/0xb0
RSP: 0018:ffff880c0f65be38 EFLAGS: 00010246
RAX: e54ba6d1fdf60eda RBX: ffff8808a1745500 RCX: 00000001001ef539
RDX: 0000000000000000 RSI: ffff880c0f65bf18 RDI: 000f4418615082a5
RBP: ffff880c0f65be68 R08: 0000000000000001 R09: 00000000000007de
R10: 000000000000000d R11: 0000000000000202 R12: ffff880c0f65bf10
R13: ffff880c0f65bf18 R14: 0000000000000000 R15: 00000000e52fefaa
FS: 00007f334c7fd700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f331ce05000 CR3: 0000000c0dbef000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process splunkd (pid: 3177, threadinfo ffff880c0f65a000, task ffff8808a1745500)
Stack:
ffff880800000000 ffff8808a1745500 ffff880c0f65be78 0000000000000001
<0> 00007f334bc2e568 00007f334bc2e548 ffff880c0f65bf68 ffffffff81087573
<0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffff81087573>] getrusage+0x83/0x340
[<ffffffff81087866>] sys_getrusage+0x36/0x40
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: f5 49 89 d4 4d 89 fe 4c 03 b7 88 05 00 00 48 8b bf 80 00 00 00 e8 b9 95 01 00 4d 85 f6 48 89 c1 74 0c 49 0f af c7 45 89 f6 31 d2
<49> f7 f6 4c 8b b3 a8 05 00 00 48 8b 93 b0 05 00 00 4c 8b 7d f8
RIP [<ffffffff810569d8>] task_times+0x58/0xb0
RSP <ffff880c0f65be38>
crash> bt
PID: 3177 TASK: ffff8808a1745500 CPU: 2 COMMAND: "splunkd"
#0 [ffff880c0f65baf0] machine_kexec at ffffffff8103214b
#1 [ffff880c0f65bb50] crash_kexec at ffffffff810b90c2
#2 [ffff880c0f65bc20] oops_end at ffffffff814f09b0
#3 [ffff880c0f65bc50] die at ffffffff8100f26b
#4 [ffff880c0f65bc80] do_trap at ffffffff814f02a4
#5 [ffff880c0f65bce0] do_divide_error at ffffffff8100cfff
#6 [ffff880c0f65bd80] divide_error at ffffffff8100be7b
[exception RIP: task_times+0x58]
RIP: ffffffff810569d8 RSP: ffff880c0f65be38 RFLAGS: 00010246
RAX: e54ba6d1fdf60eda RBX: ffff8808a1745500 RCX: 00000001001ef539
RDX: 0000000000000000 RSI: ffff880c0f65bf18 RDI: 000f4418615082a5
RBP: ffff880c0f65be68 R8: 0000000000000001 R9: 00000000000007de
R10: 000000000000000d R11: 0000000000000202 R12: ffff880c0f65bf10
R13: ffff880c0f65bf18 R14: 0000000000000000 R15: 00000000e52fefaa
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880c0f65be70] getrusage at ffffffff81087573
#8 [ffff880c0f65bf70] sys_getrusage at ffffffff81087866
#9 [ffff880c0f65bf80] system_call_fastpath at ffffffff8100b0f2
Environment
- Red Hat Enterprise Linux 6, 64 bit
- 2.6.32-220.11.1.el6 - 2.6.32-220.26.1.el6
- 2.6.32-279.el6 - 2.6.32-279.10.1.el6
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase of over 48,000 articles and solutions.
Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.
