Files in /proc get absurd values for CPU times utime and stime and they stop updating

Solution Verified

Issue

The sys CPU counters on the busy threads have gone nuts and have stopped updating.

top -H is showing individual threads taking over 100% of a CPU. Not possible in the real world.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   P COMMAND
12573 gfadm     20   0 6188m 590m 5492 R 287.2  1.2  5125314h  2 proxy
12572 gfadm     20   0 6188m 590m 5492 R 268.6  1.2  5125314h 10 proxy
12571 gfadm     20   0 6188m 590m 5492 R 251.7  1.2  5125314h  5 proxy
12574 gfadm     20   0 6188m 590m 5492 R 239.0  1.2  5125314h  7 proxy
5 million CPU hours each.

The sys counters are absurd, and not changing:

[root@lb0007 /tmp]# cpus t | grep proxy | tail -5; sleep 10; cpus t | grep proxy | tail -5
12571 proxy           R    5   5 usr 8533485.19 sys 18442598389.74
12572 proxy           R   10   10 usr 8517070.73 sys 18442614863.64
12573 proxy           R    2   2 usr 8498943.88 sys 18442633094.22
12574 proxy           R    7   7 usr 8545815.13 sys 18442586001.35

12571 proxy           R    5   5 usr 8533510.37 sys 18442598389.74
12572 proxy           R   10   10 usr 8517097.65 sys 18442614863.64
12573 proxy           R    2   2 usr 8498972.66 sys 18442633094.22
12574 proxy           R    7   7 usr 8545839.10 sys 18442586001.35
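
If the cpus script isn't to hand, the same before/after check can be done straight from /proc. A minimal sketch, assuming the process and thread IDs from the listing above (the fields here are raw clock ticks rather than seconds):

# print usr/sys ticks for the four hot threads, twice, 10 seconds apart
awk '{ print FILENAME, "usr", $14, "sys", $15 }' /proc/12496/task/1257[1-4]/stat
sleep 10
awk '{ print FILENAME, "usr", $14, "sys", $15 }' /proc/12496/task/1257[1-4]/stat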

It's not a top problem; it's the OS. When you look at any individual thread's /proc/<pid>/task/<tid>/stat file you get the correct CPU time in centiseconds (clock ticks).

user/sys for busy thread 12572:
awk '{ print $14,$15 }' /proc/12572/task/12572/stat
413172908 96095
user      sys

That's about 47 days of CPU, which fits a start around 26 Nov.
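
To reproduce that arithmetic, a rough sketch, assuming the usual CLK_TCK of 100 (which is what makes a tick a centisecond):

# convert the utime/stime tick counts to days
hz=$(getconf CLK_TCK)
awk -v hz="$hz" '{ printf "user %.1f days, sys %.1f days\n", $14/hz/86400, $15/hz/86400 }' /proc/12572/task/12572/stat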

But when you look at the "main" thread's /proc/<pid>/stat file, which should be the sum of all threads in the process, you get the ludicrous numbers.

[root@lb0007 /tmp]# pgrep proxy
12496
[root@lb0007 /tmp]# awk '{ print $14,$15 }' /proc/12496/stat
1622975991394 511506043663
user          sys

That's where the 5.9 million CPU hours come from.

[root@lb0007 /tmp]# echo 1622975991394  + 511506043663|bc
2134482035057
[root@lb0007 /tmp]# echo 2134482035057 / 100 / 3600 | bc
5929116
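
The same comparison can be made in one shot. A sketch, assuming the PIDs above: sum utime/stime over every task of the process and print it next to the process-level line; the two should agree but clearly do not:

# sum the per-task counters, then show the (bogus) process-level totals
awk '{ u += $14; s += $15 } END { print "sum of tasks:", u, s }' /proc/12496/task/*/stat
awk '{ print "process:     ", $14, $15 }' /proc/12496/stat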

lb0007 is RHEL 6.2 on a Dell R610. top at least shows the busy threads when run with the -H option, but is way off if you don't use it. Not a showstopper of course, but worth a look for the curious.

Just to be safe, I checked whether top was running with any custom settings.

[root@lb0007 ~]# strace -s128 top 2>&1 | grep top
execve("/usr/bin/top", ["top"], [/* 33 vars */]) = 0
open("/etc/toprc", O_RDONLY)            = -1 ENOENT (No such file or directory)
open("/root/.toprc", O_RDONLY)          = -1 ENOENT (No such file or directory)

Nope.

We have a box with 12 cores and 4 hot threads running (so roughly 33% busy overall); uptime and sar show this:

[root@lb0007 ~]# uptime
 10:54:52 up 104 days, 28 min,  1 user,  load average: 4.00, 4.00, 4.00   Load of 4 confirms.

[root@lb0007 ~]# sar | tail
10:45:01 AM     all     33.35      0.00      0.04      0.06      0.00     66.55   2/3 idle all the time
10:46:01 AM     all     33.28      0.00      0.10      0.05      0.00     66.57
10:47:01 AM     all     33.35      0.00      0.05      0.06      0.00     66.54
10:48:01 AM     all     33.35      0.00      0.06      0.04      0.00     66.55
10:49:01 AM     all     33.34      0.00      0.04      0.06      0.00     66.57
10:50:01 AM     all     33.17      0.00      0.14      0.06      0.00     66.63
10:51:01 AM     all     33.43      0.00      0.23      0.05      0.00     66.29
10:52:02 AM     all     33.38      0.00      0.12      0.06      0.00     66.44
10:53:01 AM     all     33.42      0.00      0.02      0.08      0.00     66.48
Average:        all     33.33      0.00      0.03      0.06      0.00     66.58

Run top

Tasks: 271 total,   1 running, 270 sleeping,   0 stopped,   0 zombie
Cpu(s): 31.2%us,  0.0%sy,  0.0%ni, 68.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st  Shows 32% in user, but no proxy appears.
Mem:  49411996k total,  9703516k used, 39708480k free,   222868k buffers
Swap:        0k total,        0k used,        0k free,  7251700k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0 23576 1644 1300 S  0.0  0.0   0:08.27 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:03.70 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0  23:05.70 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:03.24 watchdog/0
...

This process has been up since 26 November, but the numbers ps reports for it are outrageous.

gfadm    12496     1 99  2012 ?        247046-12:45:50 /cme/gfix/proxy/bin/proxy -c /cme/gfix/proxy/cfg/Proxy.properties

247046 days of accrued CPU? What?

top shows an absurd value as well.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12496 gfadm     20   0 6188m 589m 5492 S  0.0  1.2  5929116h proxy
20678 gfadm     20   0 4573m 458m  11m S  0.0  0.9  71:08.69 java

4 hot threads would have to be running 1029 days to reach these numbers.

cpus shows the amount of accrued CPU per thread, in seconds:

cpus t | grep prox
12571 proxy           R    5   5 usr 4129978.71 sys 606.83
12572 proxy           R   10   10 usr 4129605.38 sys 960.93
12573 proxy           R    2   2 usr 4129239.84 sys 1341.47
12574 proxy           R    7   7 usr 4130228.97 sys 358.01
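
cpus looks like a local script; a rough equivalent of its per-thread listing straight out of /proc would be something like this sketch (again assuming CLK_TCK=100):

# one line per thread: tid, comm, usr seconds, sys seconds
for t in /proc/12496/task/[0-9]*; do
    awk -v tid="${t##*/}" '{ printf "%s %s usr %.2f sys %.2f\n", tid, $2, $14/100, $15/100 }' "$t/stat"
done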

I think something went badly wrong on this box on Jan 9 or 10. Hundreds of files in /proc have not changed timestamps since:

...
 rw------. 1 root root 0 Jan 10 11:50 /proc/3642/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/3644/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/3663/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/36/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/37/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/38/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/39/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/3/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/409/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/40/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/41/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/42/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/43/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/44/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/45/limits
 rw------. 1 root root 0 Jan 10 11:50 /proc/46/limits
 ...
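
A quick way to pull out just the stuck entries (a sketch; the Jan 10 date comes from the listing above):

# list the /proc limits files whose timestamps are frozen at Jan 10
ls -l /proc/[0-9]*/limits | grep 'Jan 10'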

Environment

  • Red Hat Enterprise Linux 6.2
  • 2.6.32-220.23.1.el6.x86_64
