ActiveMQ never removes kahadb .log files; no pending messages visible in the JSP interface; how do I find the culprit?
We have ActiveMQ 5.7.0 running on CentOS. About fifty Java programs write to and consume from queues. About half run on the local host and the rest are scattered across remote clients. Most have one consumer per process, but four have 32.
A few days ago ActiveMQ stopped deleting .log files from data/kahadb. On restart, ActiveMQ deletes every log file in kahadb, but then deletes nothing while it runs.
No pending (that is, enqueued but not dequeued) messages are visible in the web interface at [host]:8161/admin/queues.jsp. The DLQ is empty, and deleting it has no effect on the issue. (Also gleaned from the interface: all connections are active and none is slow; there are no subscribers, no bridges, and no scheduler.)
Following http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html I got this:
| TRACE | Last update: 236:28401525, full gc candidates set: [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 <[snip]>, 236] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2014-09-11 08:50:03,384 | TRACE | gc candidates after first tx:89:10178611, [] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
2014-09-11 08:50:03,384 | TRACE | gc candidates: [] | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
where db-89.log is the first log file generated after restarting ActiveMQ and db-236.log is the newest existing file.
There are no other errors or warnings in the ActiveMQ log. The programs that use the queues log no recurring exceptions. My company's programs on localhost commit their transactions, according to their logs. If a third-party program is failing to close a transaction, I don't know how to find it.
With all this in mind, how can I identify or narrow down the possible cause of the problem? What additional information would be helpful?
As an additional constraint, access to the client machines and their programs is a business matter. I have no accounts on them, and their administrators are located in different countries, which slows down communication. If I have to contact them, I would like to give them as much information as possible.
We solved the problem by reading the ActiveMQ source code to understand this log fragment:
gc candidates after first tx: 89: 10178611
It turns out that 89 is the log file number (db-89.log) and 10178611 is the byte offset within that file. So we dumped the log file:
xxd -g1 db-89.log | less
Then we searched the dump for our offset (converted to hex). At that position, the dump showed in human-readable form the name of the queue with the hanging transaction and the server it came from.
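For anyone repeating this, here is a minimal Python sketch of the same lookup. It assumes the trace line has the "after first tx:<file>:<offset>" shape shown above; the function names (parse_first_tx, xxd_row_address, printable_text_at) are made up for illustration and are not part of ActiveMQ. The key point is that the offset is a position in the file, not a value stored in it, so in an xxd dump you look for it in the address column, in hex.

```python
import re
import string

# Matches the "<file id>:<offset>" pair in a KahaDB trace line such as
# "gc candidates after first tx:89:10178611" (shape taken from the log above).
TX_LOCATION = re.compile(r"after first tx:\s*(\d+):\s*(\d+)")


def parse_first_tx(trace_line):
    """Return (journal file name, byte offset) from a trace line, or None."""
    match = TX_LOCATION.search(trace_line)
    if match is None:
        return None
    file_id, offset = int(match.group(1)), int(match.group(2))
    return f"db-{file_id}.log", offset


def xxd_row_address(offset):
    """Address xxd prints at the start of the 16-byte row containing `offset`."""
    return format(offset - offset % 16, "08x")


def printable_text_at(path, offset, length=512):
    """Read `length` bytes at `offset`, replacing non-printable bytes with '.'."""
    # Keep visible ASCII and the space character; drop other whitespace.
    printable = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")
    with open(path, "rb") as journal:
        journal.seek(offset)
        chunk = journal.read(length)
    return "".join(chr(b) if b in printable else "." for b in chunk)


if __name__ == "__main__":
    name, offset = parse_first_tx("gc candidates after first tx:89:10178611")
    # Prints: db-89.log 10178611 0x9b5033 009b5030
    print(name, offset, hex(offset), xxd_row_address(offset))
```

With the values from our log, the row to look for in the xxd output starts with 009b5030. Calling printable_text_at("db-89.log", 10178611) on a copy of the journal shows the readable strings stored from that position onward. Alternatively, xxd can jump there directly with xxd -s 10178611 -l 512 db-89.log.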
I don't have access to the problematic server or its code, but the admin told me unofficially that their developers "fixed" the closing of the transaction, whatever that fix was. That resolved the problem.
I had a similar scenario, and it turned out that messages nobody cared about were stuck in the DLQ... :S
Just a hint: check the DLQ!
I tried to follow your solution. My log line looks like this:
2014-11-19 12:01:33,964 [eckpoint Worker] TRACE MessageDatabase - gc candidates after tx range: [496:28242122, 496:28242122], [45, 52, 53, 54, ...]
So, following your instructions, I opened db-496.log in a hex editor (Notepad++ with a hex viewer plugin) and searched for 28242122 (the offset), but it is not in the file. What am I doing wrong?