CPU utilization of the rpm process stays at 100% on RHEL 6

Solution Unverified

Environment

  • Red Hat Enterprise Linux (RHEL) 6
  • db4-4.7.25-16.el6

Issue

  • Multiple rpm processes hang and do not complete.
  • Some of the stuck rpm processes show 100% CPU utilization.
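A quick way to see whether a system shows these symptoms is to list the running rpm processes with their CPU usage. The following is a minimal sketch, not part of the original solution. The helper name busy_only and the 90% threshold are assumptions made for illustration. Note that the pcpu value reported by ps is averaged over the lifetime of the process, so a process that has been spinning for most of its runtime shows a value close to 100.

```shell
#!/bin/bash
# busy_only is a hypothetical helper: it reads "PID %CPU ..." lines on stdin,
# skips the header line, and keeps rows whose second column (CPU usage)
# is at or above the given threshold (default: 90).
busy_only() {
    awk -v min="${1:-90}" 'NR > 1 && $2 + 0 >= min'
}

# List all processes named "rpm" with PID, CPU usage, elapsed time and
# command line, then keep only the busy ones.
ps -C rpm -o pid,pcpu,etime,args | busy_only 90
```

Stuck processes that are waiting on the database lock rather than spinning will not pass the filter. Running the ps command without the filter shows those as well.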

Resolution

To resolve the situation once it occurs, perform the following steps:

  • Kill all stuck rpm processes with the following command:
        # killall -9 rpm
  • Rebuild the rpm database with the following command. This step is recommended because the problem is likely to have corrupted the rpm database, and uncorrected corruption can cause further problems.
        # rpm --rebuilddb
  • Install fixed packages:
    • The bug is fixed in current upstream Berkeley DB and in Fedora.
    • For RHEL 6.2.z: upgrade the db4-* packages to 4.7.25-16.el6_2.1 (released with RHBA-2013-1443) or later to fix the issue.
    • For RHEL 6.4.z: upgrade the db4-* packages to 4.7.25-18.el6_4 (released with RHBA-2013-1258) or later to fix the issue.
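The steps above can be sketched as a single script. This is an illustration under stated assumptions, not an official procedure: the DRY_RUN switch and the run and recover_rpmdb helpers are hypothetical names introduced here, and the yum step assumes the system has access to a repository that carries the fixed db4 packages. By default the script only prints what it would do.

```shell
#!/bin/bash
# Sketch of the recovery sequence. Run as root.
# DRY_RUN is a hypothetical switch for this example: with the default of 1,
# commands are printed instead of executed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_rpmdb() {
    # 1. Kill all stuck rpm processes.
    run killall -9 rpm
    # 2. Rebuild the rpm database, which may have been corrupted.
    run rpm --rebuilddb
    # 3. Update the db4 packages (assumes a repository with the fix is enabled).
    run yum update 'db4*'
    # 4. Show the installed db4 version to compare against the fixed versions.
    run rpm -q db4
}

recover_rpmdb
```

The dry-run default is a deliberate choice, because killall -9 rpm also terminates rpm processes that are working normally.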

Root Cause

This is caused by a bug in the Berkeley DB library (db4 package).

Early during opening/initialization of the database, db4 accesses the database before the database locking system has been initialized. That access therefore runs, incorrectly, with locking disabled. Another process that correctly holds the database lock may be accessing the database at the same time, so both processes can update the shared data structures simultaneously. This can corrupt the database, and the corruption can lead to an infinite loop.

Because one process is now stuck in an infinite loop while holding the database lock, subsequent rpm instances will block indefinitely, waiting for the lock to be released.

Diagnostic Steps

  • Debug the stuck rpm processes by any of the following means:

    • Produce a vmcore, then extract corefiles for the stuck rpm processes using the gcore subcommand of crash.
    • Cause the individual rpm processes to dump core by killing them with SIGQUIT, capturing the generated core files using abrt or another tool.
    • Attach gdb to the running, stuck rpm processes using gdb /bin/rpm -p <pid>.
  • At least one of the stuck rpm processes will have a backtrace similar to the one below. The crucial part is frame 2, which shows the ENV_ENTER() call in __env_open().

(gdb) bt
#0  __env_alloc (infop=<value optimized out>, len=120, retp=0x7fff69314bb8) at ../../env/env_alloc.c:266
#1  0x0000003008cfd44e in __env_set_state (env=0x23002b0, ipp=0x7fff69314c20, state=THREAD_ACTIVE)
    at ../../env/env_failchk.c:368
#2  0x0000003008d00ecd in __env_open (dbenv=0x22ffc10, db_home=<value optimized out>, flags=352, 
    mode=<value optimized out>) at ../../env/env_open.c:346
#3  0x0000003007c16a0e in db_init (rpmdb=0x22ff340, rpmtag=<value optimized out>, dbip=0x7fff69314d18)
    at backend/db3.c:181
#4  db3open (rpmdb=0x22ff340, rpmtag=<value optimized out>, dbip=0x7fff69314d18) at backend/db3.c:620
#5  0x0000003007c1cfa3 in dbiOpen (db=0x22ff340, rpmtag=0, flags=<value optimized out>) at rpmdb.c:237
#6  0x0000003007c1d4a5 in openDatabase (prefix=<value optimized out>, dbpath=<value optimized out>, 
    _dbapi=<value optimized out>, dbp=0x22fe938, mode=<value optimized out>, perms=<value optimized out>, flags=0)
    at rpmdb.c:994
#7  0x0000003007c1d62c in rpmdbOpen (prefix=0x22fecb0 "/", dbp=0x22fe938, mode=0, perms=420) at rpmdb.c:1052
#8  0x0000003007c48fab in rpmtsOpenDB (ts=0x22fe8f0, dbmode=0) at rpmts.c:82
#9  0x0000003007c49273 in rpmtsInitIterator (ts=0x22fe8f0, rpmtag=RPMTAG_NAME, keyp=<value optimized out>, keylen=0)
    at rpmts.c:150
#10 0x0000003007c49431 in loadKeyringFromDB (ts=0x22fe8f0) at rpmts.c:290
#11 loadKeyring (ts=0x22fe8f0) at rpmts.c:325
#12 0x0000003007c49257 in rpmtsInitIterator (ts=0x22fe8f0, rpmtag=2, keyp=<value optimized out>, keylen=0)
    at rpmts.c:148
#13 0x0000003007c37275 in rpmQueryVerify (qva=0x3007e69bc0, ts=0x22fe8f0, arg=0x22eb590 "FJSVisas") at query.c:511
#14 0x0000003007c3799f in rpmcliArgIter (ts=0x22fe8f0, qva=0x3007e69bc0, argv=<value optimized out>) at query.c:577
#15 0x0000003007c37b26 in rpmcliQuery (ts=0x22fe8f0, qva=0x3007e69bc0, argv=0x22bf480) at query.c:614
#16 0x00000000004029e1 in main (argc=3, argv=<value optimized out>) at rpmqv.c:758
  • The st_search loop counter in __env_alloc() will contain a very large value:
(gdb) frame 0
#0  __env_alloc (infop=<value optimized out>, len=120, retp=0x7fff69314bb8) at ../../env/env_alloc.c:266
266             STAT((++st_search));
(gdb) p st_search
$2 = 2990050065
  • The env pointer in __env_open() and other functions will have a NULL mutex_handle field. This is the "smoking gun" indicating that db4 is incorrectly accessing the shared data structures with locking disabled.
(gdb) frame 2
#2  0x0000003008d00ecd in __env_open (dbenv=0x22ffc10, db_home=<value optimized out>, flags=352, 
    mode=<value optimized out>) at ../../env/env_open.c:346
346     ENV_ENTER(env, ip);
(gdb) p env->mutex_handle
$6 = (DB_MUTEXMGR *) 0x0
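The gdb checks above can also be collected non-interactively with gdb's batch mode. The following is a minimal sketch under assumptions: the function name dump_rpm_backtraces and the GDB override variable are hypothetical, the frame numbers 0 and 2 assume a stack that matches the example backtrace, and symbol names such as st_search are only available when the matching debuginfo packages are installed.

```shell
#!/bin/bash
# GDB is a hypothetical override for the debugger binary (defaults to gdb).
GDB="${GDB:-gdb}"

# For each PID given, attach gdb in batch mode and print the backtrace,
# the st_search counter in frame 0, and env->mutex_handle in frame 2.
# The frame numbers are an assumption taken from the example backtrace.
dump_rpm_backtraces() {
    for pid in "$@"; do
        echo "=== rpm pid $pid ==="
        "$GDB" /bin/rpm -p "$pid" -batch \
            -ex 'bt' \
            -ex 'frame 0' -ex 'p st_search' \
            -ex 'frame 2' -ex 'p env->mutex_handle'
    done
}

# Usage (as root), with pgrep -x matching processes named exactly "rpm":
#   dump_rpm_backtraces $(pgrep -x rpm)
```

If a process has a different stack, the bt output still shows which frame contains __env_open(), and the frame commands can be adjusted accordingly.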

