Ghost message causes KahaDB log cleanup problems
Environment
- Fuse MQ Enterprise
- 7.1
Issue
- After cleaning up all messages from all queues, and with no clients connected, the KahaDB archive logs are not cleaned up properly.
- After restarting the broker, the KahaDB log still shows:
2013-04-25 15:15:37,891 [xxxxx Worker] DEBUG MessageDatabase - Checkpoint started.
2013-04-25 15:15:37,972 [xxxxx Worker] TRACE MessageDatabase - Last update: 610:25312952, full gc candidates set: [419, 449, 450, 516, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610]
2013-04-25 15:15:37,972 [xxxxx Worker] TRACE MessageDatabase - Last update: 610:25312952, full gc candidates set: [419, 449, 450, 516, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610]
2013-04-25 15:15:37,973 [xxxxx Worker] TRACE MessageDatabase - gc candidates after first tx:419:33552813, []
2013-04-25 15:15:37,973 [xxxxx Worker] TRACE MessageDatabase - gc candidates after first tx:419:33552813, []
2013-04-25 15:15:37,973 [xxxxx Worker] TRACE MessageDatabase - gc candidates: []
2013-04-25 15:15:37,973 [xxxxx Worker] TRACE MessageDatabase - gc candidates: []
2013-04-25 15:15:37,973 [xxxxx Worker] DEBUG MessageDatabase - Checkpoint done.
- The oldest journal file, db-419.log, is preventing the KahaDB GC from cleaning up.
- The startup log seems to consider only later journal files during recovery:
2013-04-25 15:19:02,801 | INFO | KahaDB is version 4
2013-04-25 15:19:02,866 | INFO | Recovering from the journal ...
2013-04-25 15:19:11,075 | INFO | @516:5866659, 100000 entries recovered ..
2013-04-25 15:19:16,649 | INFO | @598:10815505, 200000 entries recovered ..
2013-04-25 15:19:23,432 | INFO | @600:8502887, 300000 entries recovered ..
2013-04-25 15:19:31,338 | INFO | @602:22616892, 400000 entries recovered ..
2013-04-25 15:19:41,105 | INFO | @604:24684253, 500000 entries recovered ..
2013-04-25 15:19:53,310 | INFO | @606:14016551, 600000 entries recovered ..
2013-04-25 15:20:00,858 | INFO | Recovery replayed 661045 operations from the journal in 58.038 seconds.
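The checkpoint trace shown earlier can be checked mechanically when the log is long. The following is a minimal sketch, not a supported tool: it assumes TRACE logging is enabled for MessageDatabase, that the lines have the exact shape shown in this article, and that the location printed after "first tx:" belongs to the oldest in-progress transaction. The names `summarize_checkpoint` and `parse_ids` are hypothetical helpers introduced for illustration.

```python
import re

# Patterns match the TRACE lines shown in the checkpoint output in this article.
FULL_SET = re.compile(r"full gc candidates set: \[([^\]]*)\]")
FIRST_TX = re.compile(r"gc candidates after first tx:(\d+):(\d+), \[([^\]]*)\]")


def parse_ids(text):
    """Turn '419, 449, 450' into [419, 449, 450]; an empty string gives []."""
    return [int(part) for part in text.split(",") if part.strip()]


def summarize_checkpoint(lines):
    """Collect the full candidate set, the first-tx file id, and what survived."""
    summary = {"full": None, "first_tx_file": None, "remaining": None}
    for line in lines:
        m = FULL_SET.search(line)
        if m:
            summary["full"] = parse_ids(m.group(1))
        m = FIRST_TX.search(line)
        if m:
            summary["first_tx_file"] = int(m.group(1))
            summary["remaining"] = parse_ids(m.group(3))
    return summary


# Two lines copied from the checkpoint output in this article.
sample = [
    "2013-04-25 15:15:37,972 [xxxxx Worker] TRACE MessageDatabase - Last update: 610:25312952, "
    "full gc candidates set: [419, 449, 450, 516, 597, 598, 599, 600, 601, 602, 603, 604, "
    "605, 606, 607, 608, 609, 610]",
    "2013-04-25 15:15:37,973 [xxxxx Worker] TRACE MessageDatabase - "
    "gc candidates after first tx:419:33552813, []",
]

result = summarize_checkpoint(sample)
if result["full"] and result["first_tx_file"] == min(result["full"]):
    # The first tx sits in the oldest journal file, so nothing can be collected.
    print(f"db-{result['first_tx_file']}.log holds the first tx and blocks GC")
```

If the first-tx file id equals the smallest id in the full candidate set and the remaining set is empty, the log matches the symptom described here.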
Resolution
- This is ActiveMQ bug AMQ-4548.
- The fix will be provided as part of the Fuse MQ Enterprise 7.1.0 mega patch.
Root Cause
- During recovery, it is possible for KahaDB to recover XA transactions that were previously prepared and committed and to leave them in a transient in-flight state. This causes the GC cycle to hold on to old log files that could otherwise be deleted. These in-flight transactions can be discarded once recovery completes.
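The effect of a stale in-flight transaction on GC can be illustrated with a toy model. This is a simplified sketch inferred from the trace output in this article, not the KahaDB implementation: it assumes that any journal file at or after the first in-flight transaction (or, when there is none, at or after the last update) is excluded from the candidate set, and it ignores the later per-destination checks that can exclude further files. The function name `gc_candidates` and its parameters are hypothetical.

```python
def gc_candidates(journal_file_ids, last_update_file_id, in_flight_tx_file_id=None):
    """Toy model: keep only files strictly older than the GC cut-off.

    Assumption: the cut-off is the file of the oldest in-flight transaction,
    falling back to the file of the last update when none is in flight.
    """
    cutoff = in_flight_tx_file_id if in_flight_tx_file_id is not None else last_update_file_id
    return [f for f in sorted(journal_file_ids) if f < cutoff]


# File ids and locations taken from the checkpoint trace in this article.
files = [419, 449, 450, 516, 597, 598, 599, 600, 601, 602,
         603, 604, 605, 606, 607, 608, 609, 610]

# A recovered XA transaction left in flight in db-419.log pins every file.
stuck = gc_candidates(files, last_update_file_id=610, in_flight_tx_file_id=419)
print(stuck)  # -> []

# Once the stale transaction is discarded, older files become candidates again.
freed = gc_candidates(files, last_update_file_id=610)
print(len(freed))  # -> 17
```

In this model the empty candidate set seen in the trace follows directly from the in-flight transaction living in the oldest file, which is why discarding it after recovery lets cleanup proceed.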