QPIDD failed to read journal - JERR_RCVM_READ

Environment

  • Red Hat Satellite 6
  • qpid-cpp-server-linearstore 0.30 or older
    • for newer Satellite/qpidd versions, please follow this solution

Issue

  • various tasks time out
    • mostly Pulp-related ones (Capsule sync, Content View promotion, ...)
    • package/errata installation via Satellite fails
    • new Content Host registration fails
  • all of these symptoms point to qpidd refusing to start, with the following error in the log: JERR_RCVM_READ: Read error: no or insufficient data to read

Resolution

For newer Satellite/qpidd versions, please follow this solution.

Set the qpid_q and cor_journal variables first; all of the following steps use them. The directory and the name of the corrupted journal file are reported in /var/log/messages.

Example of qpidd error message:

Dec  9 10:12:17 mysat007 qpidd: 2015-12-09 10:12:17 [Broker] critical Unexpected error: Queue pulp.agent.dd4587dd-5a1f-43d5-9e5c-beb02b183363: recoverQueues() failed: jexception 0x0902 RecoveryManager::readJournalFileHeader() threw JERR_RCVM_READ: Read error: no or insufficient data to read (File=/var/lib/qpidd/.qpidd/qls/jrnl/pulp.agent.dd4587dd-5a1f-43d5-9e5c-beb02b183363/f984cf97-01c4-4ca2-a796-3e93063738a4.jrnl; attempted_read_size=4096; actual_read_size=0) (/builddir/build/BUILD/qpid-cpp-0.30/src/qpid/linearstore/MessageStoreImpl.cpp:771)

Set the variables based on the log message above:

qpid_q="/var/lib/qpidd/.qpidd/qls/jrnl/pulp.agent.dd4587dd-5a1f-43d5-9e5c-beb02b183363" # Directory that contains the corrupted journal file
cor_journal="f984cf97-01c4-4ca2-a796-3e93063738a4.jrnl" # Name of the corrupted journal file
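
The two variables can also be derived from the log instead of being copied by hand. The following is a minimal sketch, assuming the error line has the same "(File=...; attempted_read_size=...)" layout as the example above. The function names corrupted_journal_path and set_journal_vars are hypothetical helpers, not part of qpidd or Satellite.

```shell
# Print the journal path from the most recent JERR_RCVM_READ line.
# corrupted_journal_path is a hypothetical helper, not part of qpidd.
corrupted_journal_path() (
    log="${1:-/var/log/messages}"
    grep 'JERR_RCVM_READ' "$log" | tail -n 1 |
        sed -n 's/.*(File=\([^;]*\.jrnl\);.*/\1/p'
)

# Split that path into the two variables used by the steps in this article.
# Returns non-zero when no matching log line was found.
set_journal_vars() {
    _jrnl_path=$(corrupted_journal_path "$@")
    [ -n "$_jrnl_path" ] || return 1
    qpid_q=$(dirname "$_jrnl_path")
    cor_journal=$(basename "$_jrnl_path")
}
```

Running set_journal_vars without arguments reads /var/log/messages; a different log file can be passed as the first argument. Compare the resulting values with the log message before continuing.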

Steps to replace the corrupted journal file:

  • Get a new journal file and copy it to the /tmp directory on your server. Either use the attached one (make sure the other journal files *.jrnl under /var/lib/qpidd have the same length), or generate one manually on another machine (e.g. a Capsule) that runs the same version of qpidd:
qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add queue _TEST_QUEUE --durable      
cp ${qpid_q}/../_TEST_QUEUE/*.jrnl /tmp/fresh.jrnl
qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 del queue _TEST_QUEUE --durable  

If there is no way to generate a new journal file and you need to use the attached one, make sure it was generated for the same version of the qpid cpp library that you have on your system. Use the following command to get the version on your system:

    rpm -qi qpid-cpp-server | grep Version | awk '{print $3}'

If no attached journal file matches your version, contact Red Hat Support.
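
The length check mentioned above can be sketched as follows. This is only an illustration, assuming that a suitable replacement file has exactly the same size as the healthy journal files on the system. journals_with_other_size is a hypothetical helper name.

```shell
# List every *.jrnl file under a directory tree whose size differs from
# the candidate replacement file, as "<size in bytes> <path>".
# journals_with_other_size is a hypothetical helper, not a qpid tool.
journals_with_other_size() (
    fresh="$1"
    root="$2"
    want=$(( $(wc -c < "$fresh") ))
    find "$root" -type f -name '*.jrnl' | while IFS= read -r f; do
        have=$(( $(wc -c < "$f") ))
        [ "$have" -eq "$want" ] || printf '%s %s\n' "$have" "$f"
    done
)
```

For example, journals_with_other_size /tmp/fresh.jrnl /var/lib/qpidd is expected to list only the corrupted (truncated) journals. If healthy journals show up as well, the replacement file probably does not fit this system.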

  • Go to the affected journal directory and make sure you are pointing to the correct file:
cd ${qpid_q}
ls -l | grep ${cor_journal}

You should see something like this:

-rw-r-----. 1 qpidd qpidd 0 Oct  6 10:20 6180f0d3-b292-4c09-830d-b27883b1f775.jrnl
  • Stop the qpidd process before making any changes:

    service qpidd stop

  • Replace the corrupted journal file and fix its ownership and permissions:

rm -f ${cor_journal}
cp /tmp/fresh.jrnl ${cor_journal}
chown qpidd:qpidd ${cor_journal}
chmod 640 ${cor_journal}
  • If SELinux is enabled, also run:
restorecon -Rv ../
  • Now restart the Satellite services:
katello-service restart

(Starting the qpidd service alone should be sufficient, but restarting everything makes sure that all dependent services reconnect to the broker properly.)

  • Check that the Satellite services are up and running to make sure Satellite is healthy:
katello-service status

NOTE: When more than one journal file is corrupted, /var/log/messages reports only the first corrupted journal file that qpidd found.

Use this command to check if there are any other corrupted files:

find ${qpid_q}/.. -type f -iname "*.jrnl" -exec du -bc {} + | awk '{if($1 < 4096) print $0}'

Then apply the same steps to each of them, using the same (generic) /tmp/fresh.jrnl file.
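
When several journals are affected, the per-file commands can be generated rather than typed. The sketch below only prints the commands for review and does not execute anything. It assumes, like the check above, that a journal smaller than 4096 bytes is corrupted. print_repair_commands is a hypothetical helper name.

```shell
# Print (do not execute) the repair commands for every *.jrnl file
# smaller than 4096 bytes under the given directory tree.
# print_repair_commands is a hypothetical helper, not a qpid tool.
print_repair_commands() (
    root="$1"
    fresh="${2:-/tmp/fresh.jrnl}"
    find "$root" -type f -name '*.jrnl' -size -4096c |
        while IFS= read -r f; do
            printf "cp '%s' '%s' && chown qpidd:qpidd '%s' && chmod 640 '%s'\n" \
                "$fresh" "$f" "$f" "$f"
        done
)
```

With qpidd stopped, review the output of print_repair_commands "${qpid_q}/.." and run the printed commands only if every listed file is really a corrupted journal. Afterwards, run restorecon as described above if SELinux is enabled.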

Root Cause

The journal file was corrupted (possibly because there was no free disk space when it was created) and needs to be replaced. Generating an empty journal file (or using the attached one, which was generated the same way) yields a generic journal file that can be reused for every replacement.

Diagnostic Steps

To find out why qpidd does not start, check /var/log/messages, which is where qpidd logs all of its messages by default.

grep JERR_RCVM_READ /var/log/messages

Attachments
