How to debug recurring rpm database corruption


Environment

  • Red Hat Enterprise Linux (RHEL) 6
  • Red Hat Enterprise Linux (RHEL) 7
  • Red Hat Enterprise Linux (RHEL) 8

Issue

  • Recurring rpm database corruption
  • Stale lock files are being left behind in /var/lib/rpm
  • rpm commands are failing with messages similar to:

    # rpm -qa
    error: rpmdb: BDB0113 Thread/process 5691/140201285396544 failed: BDB1507 Thread died in Berkeley DB library
    error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages index using db5 -  (-30973)
    error: cannot open Packages database in /var/lib/rpm
    error: rpmdb: BDB0113 Thread/process 5691/140201285396544 failed: BDB1507 Thread died in Berkeley DB library
    error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages database in /var/lib/rpm
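
  • The presence of stale lock files can be confirmed by listing the Berkeley DB environment files in /var/lib/rpm (a quick check; the exact set of __db.* files present can vary):

    # ls -l /var/lib/rpm/__db*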
    

Resolution

Disclaimer: This utility is meant to be used at the request of Red Hat Support and is not officially supported by Red Hat. Any issues seen with this utility will be handled, if possible, by one of the few engineers associated with its creation, and not through Bugzilla.

  • The issue can be temporarily resolved by stopping any processes using files under the /var/lib/rpm directory and removing the stale lock files, or by a reboot, which removes these files as well. A combined sketch is shown after the steps below.

    • Check processes that are using /var/lib/rpm:

      # fuser -v /var/lib/rpm
      
    • Stop any services or kill processes reported by fuser -v /var/lib/rpm.

    • Afterwards, clear the lock files:

      # rm /var/lib/rpm/__db*
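
    • Put together, a minimal sketch of the whole workaround (fuser -k sends SIGKILL to the processes it finds, so stopping the owning services cleanly is usually preferable; the final rpm -qa simply verifies the database opens again):

      # fuser -v /var/lib/rpm      # identify processes holding the rpm database
      # fuser -k /var/lib/rpm      # or stop the owning services instead of killing them
      # rm -f /var/lib/rpm/__db*   # remove the stale lock files
      # rpm -qa > /dev/null && echo "rpmdb opens cleanly"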
      

WARNING: This utility does not currently work when the system is booted in FIPS mode (fips=1 on the kernel command line), nor when the system is booted via UEFI with Secure Boot enabled.


Configure the repository

  • RHEL 6:

    # curl https://people.redhat.com/kwalker/repos/rpm-deathwatch/rhel6/rpm-deathwatch-rhel-6.repo -o /etc/yum.repos.d/rpm-deathwatch.repo
    
  • RHEL 7:

    # curl https://people.redhat.com/kwalker/repos/rpm-deathwatch/rhel7/rpm-deathwatch-rhel-7.repo -o /etc/yum.repos.d/rpm-deathwatch.repo
    
  • RHEL 8:

    # curl https://people.redhat.com/kwalker/repos/rpm-deathwatch/rhel8/rpm-deathwatch-rhel-8.repo -o /etc/yum.repos.d/rpm-deathwatch.repo
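
  • Optionally, confirm the repository definition is in place before installing (a quick sanity check; the repo id printed depends on the contents of the downloaded .repo file):

    # ls /etc/yum.repos.d/rpm-deathwatch.repo
    # yum repolist all | grep -i deathwatch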
    

Install required packages

  • You will need systemtap installed, as well as the yum-utils, kernel-devel, kernel-headers, and kernel-debuginfo packages for your running kernel.


    # yum install -y yum-utils kernel-{devel,headers,debuginfo}-$(uname -r) systemtap && debuginfo-install -y kernel
    # yum install rpm-deathwatch

  • Note: If the debug repository is not enabled, add the --enablerepo option for the system's respective debug repo, for example --enablerepo=rhel-8-for-x86_64-baseos-debug-rpms (see the example after these notes).

  • Note: If you are using local repositories, you will need to sync the rhel-6-server-debug-rpms, rhel-7-server-debug-rpms, or rhel-8-for-x86_64-baseos-debug-rpms repository, or the RPMs can be downloaded manually from the Package Search.
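
  • For example, on RHEL 8 with the debug repository not already enabled, the kernel debuginfo step above could be run as follows (a sketch; substitute the rhel-6-server-debug-rpms or rhel-7-server-debug-rpms repo id on older releases):

    # debuginfo-install -y --enablerepo=rhel-8-for-x86_64-baseos-debug-rpms kernel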

Starting the service

Notes

  • The first time rpm-deathwatch is run, the systemtap module can take a few minutes to build.

  • If the system might reboot before the issue reoccurs, you should also enable the service so it starts at boot.

  • The service must be started BEFORE the issue occurs in order to log the signals sent to rpm, yum or dnf.

RHEL 6

  • To start the service:

    # /etc/rc.d/init.d/rpm-deathwatch start
    Starting rpm-deathwatch
    
  • Afterwards, check the status of the service:

    # /etc/rc.d/init.d/rpm-deathwatch status 
    Service running as 15070
    
  • To enable it to start during the system startup:

    # chkconfig --add rpm-deathwatch
    # chkconfig rpm-deathwatch on
    

RHEL 7 and RHEL 8

  • To start the service:

    # systemctl start rpm-deathwatch
    
  • To start the service and enable it to start during the system startup:

    # systemctl enable --now rpm-deathwatch
    
  • Afterwards, check the status of the service:

    # systemctl status rpm-deathwatch
    ● rpm-deathwatch.service - Systemtap module to monitor for termination signals sent to processes accessing the RPM database
      Loaded: loaded (/usr/lib/systemd/system/rpm-deathwatch.service; enabled; vendor preset: disabled)
      Active: active (running) since Wed 2020-08-19 08:19:10 EDT; 4s ago
    Main PID: 17824 (rpm-deathwatch)
      CGroup: /system.slice/rpm-deathwatch.service
              ├─17824 /usr/bin/python2 /usr/bin/rpm-deathwatch --name /var/log/rpm-deathwatch
              └─17834 /usr/bin/stap --suppress-handler-errors -DMAXSTRINGLEN=4096 -p4 -mrpm_deathwatch rpm_deathwatch.stp
    
    Aug 19 08:19:10 r7 systemd[1]: Started Systemtap module to monitor for termination signals sent to processes accessing the RPM database.
    Aug 19 08:19:10 r7 rpm-deathwatch[17824]: INFO:rpm-deathwatch:Checking the environment
    Aug 19 08:19:11 r7 rpm-deathwatch[17824]: INFO:rpm-deathwatch:Building the systemtap module to report termination signals
    Aug 19 08:19:35 r7 rpm-deathwatch[17824]: INFO:rpm-deathwatch:Starting monitoring
    Aug 19 08:19:35 r7 rpm-deathwatch[17824]: INFO:rpm-deathwatch:        Opened: /var/log/rpm-deathwatch-08-19-2020-08:19:35
    Aug 19 08:19:35 r7 rpm-deathwatch[17824]: INFO:rpm-deathwatch:Writing output to <__main__.files instance at 0x7f70d4a5c560>
    Aug 19 08:19:35 r7 rpm-deathwatch[17824]: INFO:rpm-deathwatch:Monitoring 18147
    

Collecting the data

  • Once the issue occurs while the service is running, please attach rpm-deathwatch-$(date +%m-%d).tar.bz2, generated by the command below, to a new or existing Red Hat Support Case:

    # tar cjf rpm-deathwatch-$(date +%m-%d).tar.bz2 /var/log/rpm-deathwatch*
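
  • The service writes its logs under /var/log with a timestamped name (as shown in the service status output above), so you can confirm that data was captured before creating the archive:

    # ls -l /var/log/rpm-deathwatch*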
    

Root Cause

  • If rpm, yum, or dnf does not exit cleanly, the lock files are left behind. The files (__db.001 through __db.005) remain in /var/lib/rpm. We can see the pid that left the files with:

    # db_stat -C l -h /var/lib/rpm
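
  • Once db_stat reports a pid (for example 5691 in the error output above), a quick follow-up sketch is to check whether that process still exists and whether anything was logged for it (journalctl applies to RHEL 7 and 8). In most cases the process is already gone, which is why rpm-deathwatch is needed to catch the signal as it happens:

    # ps -p 5691 -o pid,ppid,etime,cmd
    # journalctl _PID=5691 --no-pager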
    
  • The problem tends to be that there is no logging or auditing configured to show what actually killed the process. The most common cause is an automation tool that times out and abruptly ends the process without letting rpm clear the lock files.

  • The issue can be temporarily resolved as described in the Resolution section above: stop any processes using files under /var/lib/rpm (reported by fuser -v /var/lib/rpm), remove the __db* lock files, or reboot, which removes these files as well.

  • Microsoft Defender for Endpoint on Linux (MDATP) is known to cause the issue:

    • Check if it is installed:

      # rpm -q mdatp
      mdatp-101.98.64-1.x86_64
      
  • If the issue is recurring, automation becomes the most likely cause.

Diagnostic Steps

Example Logs

  • We can see that "Code 0" means the commands are completing successfully. This means the rpm-deathwatch service was started while rpm was working, which is good. We need rpm-deathwatch to be running before the issue occurs so it can capture the SIGKILL.

    Thu Dec  3 23:56:57 2020 EST: kprocess.exit: rpm(pid:31155) - Code   0 - "rpm -qa"
    Thu Dec  3 23:56:58 2020 EST: kprocess.exit: yum(pid:31171) - Code   0 - "/usr/bin/python /usr/bin/yum -y update
    Thu Dec  3 23:56:59 2020 EST: kprocess.exit: yum(pid:31171) - Code   0 - "/usr/bin/python /usr/bin/yum -y update
    
  • Now we have a SIGKILL. The output identifies the process that was killed (yum), its pid (31171), the process tree of whoever sent the signal, and the command that was killed.

    Fri Dec  4 00:00:17 2020 EST: sig.send: SIGKILL was sent to yum (pid:31171) by uid:0 using signal_generate
    Fri Dec  4 00:00:17 2020 EST: sig.send: Process tree:
     ->systemd[1] - <None found>
      ->BESClient[3563] - <None found>
    Fri Dec  4 00:00:17 2020 EST: kprocess.exit: yum(pid:31171) - Code   9 - "/usr/bin/python /usr/bin/yum -y update
    
  • From this, we can determine that BESClient sent the SIGKILL that killed yum (pid 31171), which was running "yum update".

