Process utimes (user space time) are incorrect or crash on divide_error in thread_group_times on RHEL6

Solution Verified - Updated -

Issue

  • A single cause can have three quite different symptoms.

a) A process's "utime" (shown as the time field in ps) shows the process to be running for a very long time, when this is not the case.

  • For example:
$ ps
UID        PID  PPID  C STIME TTY          TIME CMD
root    50925 50923 99 Aug06 ?        213555-20:29:38 test_application
  • The TIME field corresponds to field 14 in the /proc/PID/stat field, which is the "utime" value as reported by the kernel.
$ cat /proc/50925/stat
40275 (test_application) S 40273 36692 35927 0 -1 4202496 399678594 489450 31722 0 974453227234
870691207771 39 6381 20 0 17 0 116189537 2022604800 14289 18446744073709551615 4194304 20071776
140734177595376 1407341775887 52 238159815987 0 0 6 8414376 18446744073709551615 0 0 17 0 0 0 3419 0 0

b) The system unexpectedly crashes in thread_group_times().

  • For example:
divide error: 0000 [#1] SMP 
...
CPU 1 
Modules linked in: ipv6 power_meter hpilo hpwdt ipmi_si ipmi_msghandler sg microcode serio_raw iTCO_wdt iTCO_vendor_support bnx2 i7core_edac edac_core shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 2484, comm: XXXXX Not tainted 2.6.32-279.el6.x86_64 #1 HP ProLiant DL380 G6
RIP: 0010:[<ffffffff81058226>]  [<ffffffff81058226>] thread_group_times+0x56/0xb0
RSP: 0018:ffff880414ec9eb8  EFLAGS: 00010046
RAX: 943882ab68e17428 RBX: ffff880415e5f000 RCX: 00000000ff6373b8
RDX: 0000000000000000 RSI: ffff880414ec9eb8 RDI: 000f38eb448809cb
RBP: ffff880414ec9ef8 R08: 0000000094935de3 R09: ffff880415e5f000
R10: 0000189800000000 R11: 0000000000000202 R12: ffff880414ec9f08
R13: ffff880414ec9f10 R14: 0000000000000000 R15: 0000000000000006
FS:  00007f911dbf0700(0000) GS:ffff88042e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f691244e330 CR3: 000000081489d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process XXXXX (pid: 2484, threadinfo ffff880414ec8000, task ffff880414e04080)
Stack:
 ...
Call Trace:
 [<ffffffff8108979d>] do_sys_times+0x4d/0xd0
 [<ffffffff81089864>] sys_times+0x44/0x70
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: 49 89 d4 e8 7d b4 03 00 4c 8b 75 c8 4c 03 75 c0 48 8b 7d d0 e8 6c 9a 01 00 4d 85 f6 48 89 c1 74 0d 48 0f af 45 c0 45 89 f6 31 d2 <49> f7 f6 4c 8b b3 d8 03 00 00 48 8b 93 e0 03 00 00 4c 39 f0 49 
RIP  [<ffffffff81058226>] thread_group_times+0x56/0xb0
 RSP <ffff880414ec9eb8>

c) The system unexpectedly crashes in task_times().

  • For example:
crash> log
divide error: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cpufreq/scaling_setspeed
CPU 2 
Modules linked in: bluetooth rfkill bonding ipv6 bridge stp llc nfs lockd fscache nfs_acl auth_rpcgss autofs4 sunrpc pcc_cpufreq power_meter
sg microcode serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt bnx2 i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif
hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipv6]
Pid: 3177, comm: splunkd Tainted: G        W  ----------------   2.6.32-220.13.1.el6.x86_64 #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff810569d8>]  [<ffffffff810569d8>] task_times+0x58/0xb0
RSP: 0018:ffff880c0f65be38  EFLAGS: 00010246
RAX: e54ba6d1fdf60eda RBX: ffff8808a1745500 RCX: 00000001001ef539
RDX: 0000000000000000 RSI: ffff880c0f65bf18 RDI: 000f4418615082a5
RBP: ffff880c0f65be68 R08: 0000000000000001 R09: 00000000000007de
R10: 000000000000000d R11: 0000000000000202 R12: ffff880c0f65bf10
R13: ffff880c0f65bf18 R14: 0000000000000000 R15: 00000000e52fefaa
FS:  00007f334c7fd700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f331ce05000 CR3: 0000000c0dbef000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process splunkd (pid: 3177, threadinfo ffff880c0f65a000, task ffff8808a1745500)
Stack:
 ffff880800000000 ffff8808a1745500 ffff880c0f65be78 0000000000000001
<0> 00007f334bc2e568 00007f334bc2e548 ffff880c0f65bf68 ffffffff81087573
<0> 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff81087573>] getrusage+0x83/0x340
 [<ffffffff81087866>] sys_getrusage+0x36/0x40
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: f5 49 89 d4 4d 89 fe 4c 03 b7 88 05 00 00 48 8b bf 80 00 00 00 e8 b9 95 01 00 4d 85 f6 48 89 c1 74 0c 49 0f af c7 45 89 f6 31 d2
<49> f7 f6 4c 8b b3 a8 05 00 00 48 8b 93 b0 05 00 00 4c 8b 7d f8 
RIP  [<ffffffff810569d8>] task_times+0x58/0xb0
 RSP <ffff880c0f65be38>

crash> bt
PID: 3177   TASK: ffff8808a1745500  CPU: 2   COMMAND: "splunkd"
 #0 [ffff880c0f65baf0] machine_kexec at ffffffff8103214b
 #1 [ffff880c0f65bb50] crash_kexec at ffffffff810b90c2
 #2 [ffff880c0f65bc20] oops_end at ffffffff814f09b0
 #3 [ffff880c0f65bc50] die at ffffffff8100f26b
 #4 [ffff880c0f65bc80] do_trap at ffffffff814f02a4
 #5 [ffff880c0f65bce0] do_divide_error at ffffffff8100cfff
 #6 [ffff880c0f65bd80] divide_error at ffffffff8100be7b
    [exception RIP: task_times+0x58]
    RIP: ffffffff810569d8  RSP: ffff880c0f65be38  RFLAGS: 00010246
    RAX: e54ba6d1fdf60eda  RBX: ffff8808a1745500  RCX: 00000001001ef539
    RDX: 0000000000000000  RSI: ffff880c0f65bf18  RDI: 000f4418615082a5
    RBP: ffff880c0f65be68   R8: 0000000000000001   R9: 00000000000007de
    R10: 000000000000000d  R11: 0000000000000202  R12: ffff880c0f65bf10
    R13: ffff880c0f65bf18  R14: 0000000000000000  R15: 00000000e52fefaa
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880c0f65be70] getrusage at ffffffff81087573
 #8 [ffff880c0f65bf70] sys_getrusage at ffffffff81087866
 #9 [ffff880c0f65bf80] system_call_fastpath at ffffffff8100b0f2

Environment

  • Red Hat Enterprise Linux 6, 64 bit
  • 2.6.32-220.11.1.el6 - 2.6.32-220.26.1.el6
  • 2.6.32-279.el6 - 2.6.32-279.10.1.el6

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content