What is the consequence of a corrupted KahaDB index?

Solution Verified

Environment

  • Fuse Message Broker
    • 5.X
  • Fuse MQ Enterprise
    • 7.X

Issue

  • I understand that KahaDB corruption issues usually show up during broker startup or shutdown. What are the consequences of running an ActiveMQ instance whose KahaDB store is corrupted?

Resolution

  • The persistent store should never run in a corrupted state. Most of the corruption we know about is caused by bugs and should not happen. In some situations, such as an iterator in the index stopping short, not all messages may be delivered. We consider this low impact, because the messages are not lost: they will be re-dispatched once the corruption is resolved.

  • More general corruption would come from some other process, for example one that overwrites a journal file or the index. For some of the specific bugs, we know the impact. Often, though, we don't know the impact until a restart, and the best we can do is drop the affected journal file, which loses the messages stored in it.
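To make the difference between these two cases concrete, here is a deliberately simplified toy model in Python. It assumes, purely for illustration, a store made of numbered, append-only journal files plus a separate index that points into them. The class and method names (ToyStore, dispatchable, rebuild_index, drop_journal_file) are hypothetical and do not reflect KahaDB's real on-disk format or API. The sketch shows why an index iterator that stops short only delays delivery, while dropping a journal file loses the messages it holds.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical journal + index store; NOT KahaDB's real format."""

    # Journal: numbered, append-only files of (msg_id, body) records.
    journal_files: dict[int, list[tuple[str, str]]] = field(default_factory=dict)
    # Index: msg_id -> journal file number, kept in dispatch order.
    index: dict[str, int] = field(default_factory=dict)

    def append(self, file_no: int, msg_id: str, body: str) -> None:
        self.journal_files.setdefault(file_no, []).append((msg_id, body))
        self.index[msg_id] = file_no

    def dispatchable(self, stop_after: int | None = None) -> list[str]:
        # stop_after simulates an index iterator that stops short.
        ids = list(self.index)
        return ids if stop_after is None else ids[:stop_after]

    def rebuild_index(self) -> None:
        # Replay every journal file in order, as a broker does when it
        # finds no index at startup (for example, after db.data is deleted).
        self.index = {}
        for file_no in sorted(self.journal_files):
            for msg_id, _body in self.journal_files[file_no]:
                self.index[msg_id] = file_no

    def drop_journal_file(self, file_no: int) -> None:
        # Discard an unreadable journal file: its messages are gone for good.
        self.journal_files.pop(file_no, None)
        self.rebuild_index()


store = ToyStore()
store.append(1, "m1", "order created")
store.append(1, "m2", "order paid")
store.append(2, "m3", "order shipped")

# Index iterator stops short: m3 is not dispatched, but it is still journaled.
print(store.dispatchable(stop_after=2))  # ['m1', 'm2']

# Rebuilding the index from the journal makes m3 dispatchable again.
store.rebuild_index()
print(store.dispatchable())  # ['m1', 'm2', 'm3']

# Dropping journal file 2 loses m3 permanently.
store.drop_journal_file(2)
print(store.dispatchable())  # ['m1', 'm2']
```

The model keeps the journal as the source of truth and treats the index as derived data, which is why rebuilding the index is safe while dropping journal data is not.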

Diagnostic Steps

If you encounter a NullPointerException (NPE), a "page file not found" error, or any other type of KahaDB corruption, do the following:

  1. Back up the corrupted database.
  2. Later versions of Fuse Message Broker try to self-heal; however, you may have to force recovery by deleting the db.data file in the ./data/kahadb directory. See the Recovery Guide for more information. After the file is deleted, the broker will not find the index at startup and will rebuild it automatically. Note that startup time grows in direct proportion to the size of your data directory: the broker has to replay all the messages in your store, so the more messages there are, the longer startup takes.
  3. Try to determine the cause of the issue. Was the broker shut down ungracefully? If this occurred during a write operation, there is nothing more to do than rebuild the index.
  4. If there is no apparent explanation for the corruption, open a support case in the Red Hat Customer Portal. Attach the zipped KahaDB store if possible, together with the debug logs, the stack trace, and the versions of the software you are using. We will match it as best we can to a known issue or work to isolate the root cause.
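Steps 1 and 2 can be scripted. The following Python sketch is a hypothetical helper (backup_and_reset_index is not part of any Red Hat or ActiveMQ tooling). It assumes the broker is stopped, that the store is in the default ./data/kahadb location, that the backup directory lies outside the store, and that deleting db.data is the recovery the Recovery Guide calls for in your case. It zips the whole store, which also produces the archive to attach in step 4, and then removes only the index file.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_and_reset_index(kahadb_dir: str, backup_dir: str) -> Path:
    """Hypothetical helper: zip the KahaDB store, then delete db.data.

    Assumes the broker is STOPPED and that backup_dir is outside kahadb_dir.
    """
    store = Path(kahadb_dir)
    if not store.is_dir():
        raise FileNotFoundError(f"KahaDB directory not found: {store}")

    # Step 1: back up the whole store (journal files and index) as a zip.
    # The same archive can be attached to a support case (step 4).
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = shutil.make_archive(
        str(target / f"kahadb-backup-{stamp}"), "zip", root_dir=str(store)
    )

    # Step 2: delete only the index; the journal files are left untouched,
    # so the broker can rebuild the index from them on the next start.
    index_file = store / "db.data"
    if index_file.exists():
        index_file.unlink()
    return Path(archive)
```

For example, with the broker stopped, backup_and_reset_index("./data/kahadb", "./kahadb-backups") returns the path of the new zip file; the next broker start then rebuilds the index from the journal.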

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

3 Comments

Also see KCS articles 276323 and 284343.

Link to recovery guide is broken.

Thanks for reporting the broken link. I have updated it with the most recent link.