Is it safe to delete the kahadb index file db.data in order to recover the kahadb index from the journal?

Solution Unverified - Updated -

Environment

  • Fuse Message Broker 5.8 and lower
  • durable topic subscribers
  • kahadb

Issue

  • Is it safe to delete the kahadb index file db.data in order to recover the kahadb index from the journal?
  • Can we recover messages from a corrupted KahaDB?
  • We are using the Red Hat supplied version of A-MQ but we do not use Karaf or Fuse; A-MQ only. We experienced corruption in our kahadb during startup and A-MQ would not start. We have both checksumJournalFiles="true" and checkForCorruptJournalFiles="true" set in activemq.xml. After doing some research, we attempted a restart after adding ignoreMissingJournalfiles="true", and the broker still would not start.
  • The only way to start A-MQ was to create a new kahadb. Is there a way to recover the messages from the former kahadb? We have quite a few messages that we do not want to lose.

Resolution

  • A-MQ 6.0 has additional corruption recovery features, which can also be used to recover corrupt data files written by an earlier release.

  • There is a configuration setting, "checkForCorruptJournalFiles" that should be enabled:

<kahaDB directory="activemq-data"
          journalMaxFileLength="32mb"
          checksumJournalFiles="true"
          checkForCorruptJournalFiles="true"
          />

Note: A-MQ 6.0 is hosted inside a Karaf container by default; however, an old-style distribution is available in the 'extras' folder.

  • In case of a corrupted kahadb:

1) Back up your store as it is.
2) In the directory below:

<broker>/data/kahadb

delete db.data and db.redo.

3) Restart the broker.
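
The backup and delete steps above can be sketched as a small shell function. This is only an illustration, not a supported tool: the function name reset_kahadb_index and the backup location are hypothetical, and it assumes the broker is stopped before the function is run.

```shell
# Sketch only: back up a KahaDB directory, then remove the index files so the
# broker rebuilds the index from the journal on its next start.
# Assumptions: the broker is stopped; both paths are supplied by the caller.

reset_kahadb_index() {
    kahadb_dir="$1"      # e.g. <broker>/data/kahadb
    backup_dir="$2"      # hypothetical backup destination; must not exist yet

    [ -d "$kahadb_dir" ] || { echo "no such directory: $kahadb_dir" >&2; return 1; }
    [ ! -e "$backup_dir" ] || { echo "backup target already exists: $backup_dir" >&2; return 1; }

    # 1) Back up the store exactly as it is (index and journal files).
    cp -Rp "$kahadb_dir" "$backup_dir" || return 1

    # 2) Delete only the index files; the db-*.log journal files stay in place.
    rm -f "$kahadb_dir/db.data" "$kahadb_dir/db.redo"
}

# Usage (paths are placeholders):
#   reset_kahadb_index /path/to/broker/data/kahadb /path/to/kahadb-backup
# 3) Then restart the broker with its usual start script.
```

The function refuses to run if the backup target already exists, so an earlier backup is never overwritten by a second attempt.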

  • The broker will then rebuild the index by replaying the kahadb journal files.

  • This is generally safe to do. However, users need to be careful when using durable topic subscribers; see note [1] below.

[1] The recovery process may not be able to recover inactive durable topic subscriptions.
This is because the kahadb cleanup task will not consider any active subscription entries in the journal files when marking journal files for deletion.

For example, if a durable subscription's info was written to the journal file db-1.log but kahadb has already rolled over to writing to db-2.log, the cleanup task may delete db-1.log once all messages in db-1.log have been acked, even though the durable subscription is still alive.

When the broker is stopped, this durable subscription info is still present in the index file and will be restored at broker restart.
If, however, the index file is deleted in order to force a recovery of the index from the journal, the broker loses the information about this durable subscription.

The broker is therefore not able to recover its state fully from the journal files.

If the durable subscriber remains inactive (i.e. does not reconnect to the broker immediately after the broker restarts), it may miss messages because the broker has no knowledge of this durable subscription.

AMQ-4212 was raised to track this problem.

Diagnostic Steps

  • A corrupted KahaDB can be identified in several ways, but typically your log file will contain errors such as "Failed to fill batch" or a "NullPointerException". Stack traces will usually contain classes like "org.apache.kahadb.index.BTreeIndex".

  • Typical scenarios that may lead to a corrupted store are a mount dropping suddenly or the disk where your store is located running out of space. The broker can easily be in mid-write when this happens.
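
As a rough aid for the log check described above, the following sketch counts log lines that contain the two KahaDB-specific signatures. The function name scan_for_kahadb_corruption is hypothetical and the log path is supplied by the caller. "NullPointerException" is left out on purpose, since on its own it is too generic to point at the store.

```shell
# Sketch only: count log lines that match the KahaDB corruption signatures
# quoted in the diagnostic steps. Assumption: the caller passes the log path.

scan_for_kahadb_corruption() {
    log_file="$1"
    # -F treats each pattern as a fixed string; -c prints the number of
    # matching lines (0 if none, in which case grep also exits non-zero).
    grep -c -F \
        -e 'Failed to fill batch' \
        -e 'org.apache.kahadb.index.BTreeIndex' \
        "$log_file"
}

# Usage (path is a placeholder):
#   scan_for_kahadb_corruption /path/to/broker/data/activemq.log
```

A non-zero count is only a hint that the index may be damaged; the stack traces themselves still need to be read.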


2 Comments

Bug AMQ-4212 is fixed in ActiveMQ 5.9.0 and the fix will be in JBoss A-MQ 6.1 onwards. The fix adds tracking of durable subscriptions so that the subscription info can be moved along into the latest journal file as the old journal files become eligible for gc.

Also see KCS 289733.