WildFly - connection timeout and HornetQ

In a high volume installation of WildFly 8.2.1-Final, we experience degraded performance for about a week until a reboot is required, as well as more frequent failures in sending JMS messages due to transactions in ABORT_ONLY state:

Caused by: javax.ejb.EJBException: javax.jms.JMSException: Could not create a session: IJ000457: Unchecked throwable in managedConnectionReconnected()
Caused by: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3915e409[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@7d6bcd06 connection handles=0 lastUse=1494579251303 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.PoolByCri@691085ec mcp=SemaphoreArrayListManagedConnectionPool@66a127f5[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@56a9bb4[xaResource=org.hornetq.ra.HornetQRAXAResource@4ca920eb pad=false overrideRmValue=null productName=HornetQ productVersion=2.0 jndiName=java:/JmsXA] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:780)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:518)
        at org.hornetq.ra.HornetQRASessionFactoryImpl.allocateConnection(HornetQRASessionFactoryImpl.java:948)
        ... 261 more
Caused by: javax.resource.ResourceException: IJ000461: Could not enlist in transaction on entering meta-aware object
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:551)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:775)
        ... 263 more
Caused by: javax.transaction.SystemException: IJ000356: Failed to enlist: java.lang.Throwable: Unabled to enlist resource, see the previous warnings. tx=TransactionImple < ac, BasicAction: 0:ffff0a0001c3:682851c6:5900737b:dd09939 status: ActionStatus.ABORT_ONLY >
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener$TransactionSynchronization.checkEnlisted(TxConnectionListener.java:848)
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener.enlist(TxConnectionListener.java:383)
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:544)
        ... 264 more

      

The reason the transaction is in ABORT_ONLY state is a warning previously logged:

2017-05-30 11:33:15,977 WARN  [com.arjuna.ats.jta] (Thread-123) ARJUNA016086: TransactionImple.enlistResource setTransactionTimeout on XAResource < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff0a0001c3:-7a698606:591fe1ff:82cea51, node_name=1, branch_uid=0:ffff0a0001c3:-7a698606:591fe1ff:830e516, subordinatenodename=null, eis_name=java:/JmsXA > threw: XAException.XAER_RMERR: javax.transaction.xa.XAException
Caused by: HornetQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 51]
       at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:390)
       at org.hornetq.core.client.impl.ClientSessionImpl.start(ClientSessionImpl.java:1901)

      

Some information:

  • The two types of packets receiving timeout are 51 ( SESS_XA_START

    ) and 63 ( SESS_XA_SET_TIMEOUT

    )
  • The only other participant in the transaction is the Oracle XA connection
  • There are 0 queues and 1 (long-lived) topic from which messages are consumed (the number of messages is equal to or slightly above 0)

Messages are sent from a singleton EJB and the corresponding business method is annotated with @Lock(LockType.READ)

so it can be called at the same time. The EJB injects java:/JmsXA

TopicConnectionFactory

and Topic

and creates new ones TopicConnection

, Session

and MessageProducer

on every method call (every message sent) and takes care of closing all of them in the block finally {}

.

Since timeouts happen with the sending of packets 51 ( SESS_XA_START

) and 63 ( SESS_XA_SET_TIMEOUT

), it seems like starting up the session is slow or hangs.

Any idea what could be causing it?

+3


source to share


2 answers


Since we saw that there were several separate connections involved in thousands of these errors, we got the suspicion that the connections somehow break and remain in the pool, so these broken connections are reused, causing sequential errors.

Post # 5 on this similar issue says:

The connections from the JmsXA factory connection do indeed come from the pool. However, each of these connections has a listener hooked up to it so that when a connection level error occurs, it should be kicked from the pool automatically by the JCA implementation. The strangest thing about your problem is that everything is local, so there shouldn't be any broken connections or any connection problems either.

To ensure that these broken connections are removed from the pool, we entered a call cl.getManagedConnection().destroy()

on line 799 in AbstractConnectionManager#reconnectManagedConnection(ConnectionListener)

from ironjacamar-core-impl 1.1.9, replaced the JAR, and let the system run for a few weeks now.



It works stably so far, doesn't slow down or even hang.

Some entries added to the same method log the pool statistics and we can see that 100 connections have been destroyed during this time, but more importantly, each of the original errors uses a separate connection, so these broken connections are not reused anymore.

The initial error that caused the connection to drop in the first place is unclear. From the logging it appears that the same JMS connection is being used by two threads at the same time, messing up the connection causing timeouts and ultimately leaving the connection in a bad state.

Another hunch is that this is a problem in Ironjacamar 1.1.9 and has been fixed for a long time and does not occur in later versions of WildFly.

+1


source


Experimenting with the same issues with Ironjacamar 1.0.9 on jboss as 7.1.1 Final with XA transaction on Mysql and JMS resource using JCA PooledConnectionFactory (and fixed it the same way as below)

Below is the error stream:

  • it starts with Mysql timeout (JTA coordinator timeout is 100s)

    • ARJUNA012095: Interrupt action id 0: ffff0af02d68: -63fd0719: 5ab027d7: 5afb5 is called when multiple threads are involved in it.
    • TransactionReaper :: check successfuly marked TX 0: ffff0af02d68: -63fd0719: 5ab027d7: 5afb5 rollback only
    • JDBCExceptionReporter - lock timed out; try restarting the transaction.
  • Then the first XA error appeared (on a different thread)

    • XA operation failed Failed to start, session is already running in transaction XidImpl xxx
  • And finally, the same error is repeated over and over again each time in a different thread more and more often, referring to the SAME transaction ID

    • XA operation failed Failed to start, session is already running in transaction XidImpl xxx

At the end, 100% of the JMS post operations fail with the same error until we reboot the server or the connection pool is dropped and restarted with scatch



Below pach code (as per Torsten Römer answer)

protected void reconnectManagedConnection(ConnectionListener cl) throws ResourceException
   {
      try
      {
         managedConnectionReconnected(cl);
      }
      catch (Throwable t)
      {
         disconnectManagedConnection(cl);
         // begin patch
         try
         {
            log.warn("JCA patch : forcing destroy on disconnected managed connection ");
            cl.getManagedConnection().destroy();
         }
         catch (ResourceException e)
         {
            log.warn("JCA patch : ResourceException thrown while forcing connection destroy : " + e.getMessage());
         }
         // end patch
         throw new ResourceException(bundle.uncheckedThrowableInManagedConnectionReconnected(cl), t);
      }
   }

      

And below the raw log (truncated) with a stream of errors

2018-03-20 05:08:04,408 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 in state  RUN
2018-03-20 05:08:04,411 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012095: Abort of action id 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 invoked while multiple threads active within it.
2018-03-20 05:08:04,411 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012108: CheckedAction::check - atomic action 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 aborting with 1 threads active!
2018-03-20 05:08:04,911 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 in state  CANCEL
2018-03-20 05:08:05,413 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 in state  CANCEL_INTERRUPTED
2018-03-20 05:08:05,414 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012120: TransactionReaper::check worker Thread[Transaction Reaper Worker 0,5,main] not responding to interrupt when cancelling TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 -- worker marked as zombie and TX scheduled for mark-as-rollback
2018-03-20 05:08:05,414 [] [Transaction Reaper] WARN com.arjuna.ats.arjuna - ARJUNA012110: TransactionReaper::check successfuly marked TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 as rollback only

[...]

2018-03-20 05:08:26,882 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-23] ERROR o.h.util.JDBCExceptionReporter - Lock wait timeout exceeded; try restarting transaction
2018-03-20 05:08:26,886 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-23] ERROR o.h.e.d.AbstractFlushingEventListener - Could not synchronize database state with session
org.hibernate.exception.GenericJDBCException: Could not execute JDBC batch update
        at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:126) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
        [...]
        Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1084) ~[mysql-connector-java-5.1.30.jar!/:na]
        [...]

[...]   

2018-03-20 05:08:32,886 [] [Transaction Reaper Worker 0] ERROR o.h.c.client.impl.ClientSessionImpl - Caught Exception
org.hornetq.api.core.HornetQException: Timed out waiting for response when sending packet 52
        at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:302) ~[hornetq-core-2.2.13.Final.jar!/:na]
        [...]
2018-03-20 05:08:32,887 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012091: Top-level abort of action 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 received TwoPhaseOutcome.FINISH_ERROR from com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord
2018-03-20 05:08:32,887 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-23] WARN com.arjuna.ats.arjuna - ARJUNA012077: Abort called on already aborted atomic action 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5
2018-03-20 05:08:32,889 [] [Transaction Reaper Worker 0] WARN com.arjuna.ats.arjuna - ARJUNA012113: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] missed interrupt when cancelling TX 0:ffff0af02d68:-63fd0719:5ab027d7:5afb5 -- exiting as zombie (zombie count decremented to 0)

[...]

2018-03-20 05:08:36,407 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-1] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:08:36,460 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-1] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518916451 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        [...]/
        Caused by: javax.resource.ResourceException: IJ000461: Could not enlist in transaction on entering meta-aware object
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:474) ~[na:na]
        [...]
        Caused by: javax.transaction.SystemException: IJ000356: Failed to enlist: java.lang.Throwable: Unabled to enlist resource, see the previous warnings. tx=TransactionImple < ac, BasicAction: 0:ffff0af02d68:-63fd0719:5ab027d7:5b978 status: ActionStatus.ABORT_ONLY >
        [...]
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener$TransactionSynchronization.checkEnlisted(TxConnectionListener.java:552) ~[na:na]
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener.enlist(TxConnectionListener.java:282) ~[na:na]
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:467) ~[na:na]
        ... 164 common frames omitted
2018-03-20 05:08:44,602 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-25] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:08:44,663 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-25] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518924643 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]
[...]
2018-03-20 05:09:08,906 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-6] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:09:08,957 [req.remoteHost=10.240.45.100, tenant=sngxxx] [ajp--0_0_0_0_0_0_0_0-8009-6] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518948948 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]
[...]
2018-03-20 05:09:34,353 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-26] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:09:34,474 [req.remoteHost=10.240.45.100, tenant=graxxx] [ajp--0_0_0_0_0_0_0_0-8009-26] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521518974452 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]
[...]
2018-03-20 05:14:49,036 [tenant=bloxxx] [ajp--0_0_0_0_0_0_0_0-8009-14] ERROR o.h.c.client.impl.ClientSessionImpl - XA operation failed Cannot start, session is already doing work in a transaction XidImpl (1278202944 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-77.75.0.0.0.7.0.0.0.0 formatID:131077 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.10.-16.45.104.-100.2.-8.-25.90.-80.39.-41.0.5.-81.-75.49 code:-6
2018-03-20 05:14:49,070 [tenant=bloxxx] [ajp--0_0_0_0_0_0_0_0-8009-14] ERROR o.h.ra.HornetQRASessionFactoryImpl - Could not create session
javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3c2cebaa[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@903abac connection handles=0 lastUse=1521519289064 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4ffbadd3 pool internal context=SemaphoreArrayListManagedConnectionPool@5ec1a591[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@14680555[xaResource=org.hornetq.ra.HornetQRAXAResource@2f78b5a pad=false overrideRmValue=false productName=HornetQ productVersion=2.0 jndiName=java:/Ms1Jms] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:604) ~[na:na]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467) ~[na:na]

      

+1


source







All Articles