Top shows very high load average


Environment

  • Red Hat Enterprise Linux 6.4

Issue

  • Top shows very high load average:
top - 10:45:22 up 22 days, 18:45,  7 users,  load average: 124.03, 123.98, 122.22
Tasks: 619 total,   1 running, 616 sleeping,   1 stopped,   1 zombie
Cpu(s):  1.6%us,  1.3%sy,  0.0%ni, 96.9%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  32880552k total, 31314624k used,  1565928k free,    20792k buffers
Swap:  8388600k total,    29300k used,  8359300k free, 29549364k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND   
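Note that the CPUs are almost entirely idle (96.9% id) while the load average exceeds 120. On Linux, the load average counts tasks in uninterruptible sleep (D state) as well as runnable tasks, so processes blocked on I/O or kernel locks inflate it even when the CPUs are doing nothing. A quick cross-check on a live system (a sketch using standard procps tools):

```shell
# Count runnable (R) and uninterruptible-sleep (D) tasks; on Linux
# both states contribute to the load average.
ps -eo state= | awk '
    $1 ~ /^R/ { r++ }
    $1 ~ /^D/ { d++ }
    END { printf "running=%d uninterruptible=%d\n", r, d }'
```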

Resolution

All of the stack traces pass through the third-party mvfs module:

PID: 29424  TASK: ffff880455e64ae0  CPU: 7   COMMAND: "tcsh"
 #0 [ffff880432017968] schedule at ffffffff8150db42
 #1 [ffff880432017a30] schedule_timeout at ffffffff8150ea05
 #2 [ffff880432017ae0] __down at ffffffff8150f922
 #3 [ffff880432017b30] down at ffffffff8109ca61
 #4 [ffff880432017b60] mvfs_mythread at ffffffffa05a8bd6 [mvfs]
 #5 [ffff880432017c00] mvfs_linux_enter_fs at ffffffffa05d2b16 [mvfs]
 #6 [ffff880432017c10] mvfs_enter_fs at ffffffffa05a81f4 [mvfs]
 #7 [ffff880432017c30] mvfs_get_thread_ptr at ffffffffa05d0ebd [mvfs]
 #8 [ffff880432017c40] mdki_linux_init_call_data at ffffffffa05dcc54 [mvfs]
 #9 [ffff880432017c60] vnode_iop_permission at ffffffffa05d46ec [mvfs]
#10 [ffff880432017ca0] __link_path_walk at ffffffff8119061d
#11 [ffff880432017d60] path_walk at ffffffff8119182a
#12 [ffff880432017da0] do_path_lookup at ffffffff811919fb
#13 [ffff880432017dd0] do_filp_open at ffffffff8119293b
#14 [ffff880432017f20] do_sys_open at ffffffff8117df59
#15 [ffff880432017f70] sys_open at ffffffff8117e070
#16 [ffff880432017f80] system_call_fastpath at ffffffff8100b072
    RIP: 00000036faedb400  RSP: 00007fffaa7dd798  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffffffff8100b072  RCX: 000000000000001a
    RDX: 0000000000000001  RSI: 0000000000090800  RDI: 0000000001aa7100
    RBP: 0000000000000002   R8: 0000000000000000   R9: 00007fffaa7dd768
    R10: 0000000000008028  R11: 0000000000000246  R12: ffffffff8117e070
    R13: ffff880432017f78  R14: 0000000000000002  R15: 0000000000000001
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

Please involve IBM, the vendor of the mvfs module, for further investigation.

Diagnostic Steps

1) System status at the time of the panic:

     KERNEL: vmlinux                           
    DUMPFILE: 30-vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Mon Jan 27 05:03:26 2014
      UPTIME: 22 days, 19:03:47
LOAD AVERAGE: 125.07, 124.99, 124.06   <<---- Very high load average.
       TASKS: 672
    NODENAME: server1.example.com
     RELEASE: 2.6.32-358.6.2.el6.x86_64
     VERSION: #1 SMP Tue May 14 15:48:21 EDT 2013
     MACHINE: x86_64  (3158 Mhz)
      MEMORY: 32 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 9174
     COMMAND: "bash"
        TASK: ffff880795700040  [THREAD_INFO: ffff880473a60000]
         CPU: 4
       STATE: TASK_RUNNING (PANIC)
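The status summary above is the banner the crash utility prints when it opens a dump (the `sys` command reprints it on demand). A sketch of the session that produces the data shown in these diagnostic steps, using the KERNEL and DUMPFILE names listed above:

```
$ crash vmlinux 30-vmcore       # debug kernel + dump file, as named above
crash> sys                      # system summary (this step)
crash> kmem -i                  # memory usage overview
crash> ps | grep UN             # tasks in uninterruptible (D) state
crash> bt 29424                 # backtrace of one of those tasks
```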

2) But there is no memory pressure at all:

              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  8220138      31.4 GB         ----
      FREE   388858       1.5 GB    4% of TOTAL MEM
      USED  7831280      29.9 GB   95% of TOTAL MEM
    SHARED    54648     213.5 MB    0% of TOTAL MEM
   BUFFERS     5660      22.1 MB    0% of TOTAL MEM
    CACHED  7387532      28.2 GB   89% of TOTAL MEM
      SLAB   257780      1007 MB    3% of TOTAL MEM

TOTAL SWAP  2097150         8 GB         ----
 SWAP USED     7325      28.6 MB    0% of TOTAL SWAP
 SWAP FREE  2089825         8 GB   99% of TOTAL SWAP

3) The system was crashed manually via SysRq:

SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8133d956>] sysrq_handle_crash+0x16/0x20
PGD 426b17067 PUD 3a5e14067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/virtual/bdi/253:4/read_ahead_kb
CPU 4 
Modules linked in: bridge iptable_filter ip_tables mvfs(U) autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bnx2fc cnic uio fcoe libfcoe 8021q libfc garp stp llc scsi_transport_fc scsi_tgt ipv6 bnx2 ses enclosure ibmpex ibmaem ics932s401 microcode serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support ioatdma dca i5000_edac edac_core i5k_amb shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom aacraid pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mvfs]

Pid: 9174, comm: bash Not tainted 2.6.32-358.6.2.el6.x86_64 #1 IBM IBM System x3550 -[7978B7G]-/System Planar
RIP: 0010:[<ffffffff8133d956>]  [<ffffffff8133d956>] sysrq_handle_crash+0x16/0x20
RSP: 0018:ffff880473a61e18  EFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000007894
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff880473a61e18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000002000 R12: 0000000000000000
R13: ffffffff81affea0 R14: 0000000000000286 R15: 0000000000000004
FS:  00007fd476207700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000076af4d000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 9174, threadinfo ffff880473a60000, task ffff880795700040)
Stack:
 ffff880473a61e68 ffffffff8133dc12 ffff88082fa73e80 ffff880500000000
<d> 0000000d80df4018 0000000000000002 ffff880831e940c0 00007fd47620e000
<d> 0000000000000002 fffffffffffffffb ffff880473a61e98 ffffffff8133dcce
Call Trace:
 [<ffffffff8133dc12>] __handle_sysrq+0x132/0x1a0
 [<ffffffff8133dcce>] write_sysrq_trigger+0x4e/0x50
 [<ffffffff811e962e>] proc_reg_write+0x7e/0xc0
 [<ffffffff81181078>] vfs_write+0xb8/0x1a0
 [<ffffffff81181971>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: d0 88 81 a3 db fd 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 7d d1 75 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47 
RIP  [<ffffffff8133d956>] sysrq_handle_crash+0x16/0x20
 RSP <ffff880473a61e18>
CR2: 0000000000000000
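The manual crash above was triggered through the SysRq interface. A sketch for checking whether SysRq is available on a system; the trigger itself is left commented out because it deliberately panics the kernel:

```shell
# Print the current SysRq bitmask (1 means all functions are enabled).
cat /proc/sys/kernel/sysrq

# To capture a vmcore manually (DESTRUCTIVE: immediately panics the kernel):
#   echo 1 > /proc/sys/kernel/sysrq
#   echo c > /proc/sysrq-trigger
```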

4) The only cause of the very high load average is the number of D-state (uninterruptible sleep) processes:

crash> ps | grep UN | wc -l
125

Most of them were tcsh processes.
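On a live system the same breakdown can be obtained without a dump; a sketch that groups D-state tasks by command name, most frequent first:

```shell
# List commands currently in uninterruptible sleep (D state),
# with a count per command name.
ps -eo state=,comm= | awk '
    $1 ~ /^D/ { count[$2]++ }
    END { for (c in count) printf "%6d %s\n", count[c], c }' | sort -rn
```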

5) Almost all of the D-state processes show the same stack trace:

crash> bt 29424
PID: 29424  TASK: ffff880455e64ae0  CPU: 7   COMMAND: "tcsh"
 #0 [ffff880432017968] schedule at ffffffff8150db42
 #1 [ffff880432017a30] schedule_timeout at ffffffff8150ea05
 #2 [ffff880432017ae0] __down at ffffffff8150f922
 #3 [ffff880432017b30] down at ffffffff8109ca61
 #4 [ffff880432017b60] mvfs_mythread at ffffffffa05a8bd6 [mvfs]
 #5 [ffff880432017c00] mvfs_linux_enter_fs at ffffffffa05d2b16 [mvfs]
 #6 [ffff880432017c10] mvfs_enter_fs at ffffffffa05a81f4 [mvfs]
 #7 [ffff880432017c30] mvfs_get_thread_ptr at ffffffffa05d0ebd [mvfs]
 #8 [ffff880432017c40] mdki_linux_init_call_data at ffffffffa05dcc54 [mvfs]
 #9 [ffff880432017c60] vnode_iop_permission at ffffffffa05d46ec [mvfs]
#10 [ffff880432017ca0] __link_path_walk at ffffffff8119061d
#11 [ffff880432017d60] path_walk at ffffffff8119182a
#12 [ffff880432017da0] do_path_lookup at ffffffff811919fb
#13 [ffff880432017dd0] do_filp_open at ffffffff8119293b
#14 [ffff880432017f20] do_sys_open at ffffffff8117df59
#15 [ffff880432017f70] sys_open at ffffffff8117e070
#16 [ffff880432017f80] system_call_fastpath at ffffffff8100b072
    RIP: 00000036faedb400  RSP: 00007fffaa7dd798  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffffffff8100b072  RCX: 000000000000001a
    RDX: 0000000000000001  RSI: 0000000000090800  RDI: 0000000001aa7100
    RBP: 0000000000000002   R8: 0000000000000000   R9: 00007fffaa7dd768
    R10: 0000000000008028  R11: 0000000000000246  R12: ffffffff8117e070
    R13: ffff880432017f78  R14: 0000000000000002  R15: 0000000000000001
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
