Flyway hangs regularly (MariaDB connector, RDS)
I've seen frequent deployment hangs during the migration phase. Java / Scala application packaged in WAR for Tomcat. The database is RDS Aurora using MariaDB connector ( https://downloads.mariadb.org/connector-java/ ).
Probably has nothing to do with Flyway, but is a common problem getting a connection.
The migration is done from a shell to a container:
java -cp `echo WEB-INF/lib/*|tr ' ' :` foo.Migrate
The migration code looks like this:
def main(args: Array[String]): Unit = {
Environment.dbFlywayPassword.foreach { pass =>
val flyway = new Flyway
flyway.setDataSource(Environment.jdbcUrl, "flyway", pass)
flyway.migrate
}
}
Connection string:
jdbc:mysql:aurora://%RDS_HOST%/xxx?serverSslCert=/rds-ca-2015-root.pem&useSSL=true&connectTimeout=10000
I've tried increasing the logging level in Flyway, but nothing is written after this line:
15:57:35.115 [main] INFO o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
So, I have a stream dump that looks like this:
15:57:35.115 [main] INFO o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
2017-06-08 15:57:56
Full thread dump OpenJDK 64-Bit Server VM (25.121-b13 mixed mode):
"MariaDb-failover-1" #8 daemon prio=5 os_prio=0 tid=0x00005555f80ae000 nid=0x14 waiting on condition [0x00007fc330b8f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f5c59b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00005555f70bf000 nid=0x12 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00005555f7063000 nid=0x11 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00005555f7060800 nid=0x10 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00005555f705e800 nid=0xf waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00005555f702f000 nid=0xe in Object.wait() [0x00007fc331616000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00005555f702c000 nid=0xd in Object.wait() [0x00007fc331717000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00005555f6f85000 nid=0xb runnable [0x00007fc34341d000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
- locked <0x00000000f0a66090> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x00000000f0a81eb0> (a sun.security.ssl.AppInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x00000000f06b5118> (a java.io.BufferedInputStream)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:125)
at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:95)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1002)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:982)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readRequestSessionVariables(AbstractConnectProtocol.java:498)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readPipelineAdditionalData(AbstractConnectProtocol.java:544)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:410)
at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:357)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:149)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:179)
at org.mariadb.jdbc.internal.failover.impl.MastersSlavesListener.initializeConnection(MastersSlavesListener.java:154)
at org.mariadb.jdbc.internal.failover.FailoverProxy.<init>(FailoverProxy.java:94)
at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:464)
at org.mariadb.jdbc.Driver.connect(Driver.java:103)
at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnectionFromDriver(DriverDataSource.java:416)
at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnection(DriverDataSource.java:381)
at org.flywaydb.core.internal.util.jdbc.JdbcUtils.openConnection(JdbcUtils.java:51)
at org.flywaydb.core.Flyway.execute(Flyway.java:1418)
at org.flywaydb.core.Flyway.migrate(Flyway.java:971)
at tgam.service.data.Migrate$.$anonfun$main$1(Migrate.scala:11)
at foo.Migrate$.$anonfun$main$1$adapted(Migrate.scala:8)
at foo.Migrate$$$Lambda$4/458209687.apply(Unknown Source)
at scala.Option.foreach(Option.scala:257)
at foo.Migrate$.main(Migrate.scala:8)
at foo.Migrate.main(Migrate.scala)
"VM Thread" os_prio=0 tid=0x00005555f7022000 nid=0xc runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00005555f70da800 nid=0x13 waiting on condition
JNI global references: 232
Heap
def new generation total 4928K, used 1611K [0x00000000f0600000, 0x00000000f0b50000, 0x00000000f5950000)
eden space 4416K, 24% used [0x00000000f0600000, 0x00000000f0712c10, 0x00000000f0a50000)
from space 512K, 100% used [0x00000000f0a50000, 0x00000000f0ad0000, 0x00000000f0ad0000)
to space 512K, 0% used [0x00000000f0ad0000, 0x00000000f0ad0000, 0x00000000f0b50000)
tenured generation total 10944K, used 4187K [0x00000000f5950000, 0x00000000f6400000, 0x0000000100000000)
the space 10944K, 38% used [0x00000000f5950000, 0x00000000f5d66e78, 0x00000000f5d67000, 0x00000000f6400000)
Metaspace used 13061K, capacity 13306K, committed 13568K, reserved 1060864K
class space used 1381K, capacity 1449K, committed 1536K, reserved 1048576K
It looks like I / O weighs in org.mariadb.jdbc.Driver.connect
, but I have a set connectTimeout
(10 seconds). This timeout doesn't seem to be effective (will I need it socketTimeout
for https://github.com/brettwooldridge/HikariCP/issues/754 ?)
This has been happening for a while. The same thing happened when I used Tomcat contextInitialized
to migrate. I decided to refactor a separate call before starting Tomcat, which looks like a better idea overall, but this did not affect this behavior.
What usually happens is that the code will hang, after 2-3 minutes ECS will delay the timeout and cause redeployment. After a certain number of these attempts (for example, up to 10), Flyway will start successfully and the service will start working.
source to share
OK, it looks like this is a known bug in RDS Aurora - and it is mentioned in the MariaDB Connector doc (of course it should be an execution warning though!)
https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/#infrequently-used
usePipelineAuth Not compatible with aurora. During connection, different requests are made. When the option is active, these requests are sent using a pipeline (all requests are sent, then only all results are read), which allows for faster connection creation. Default: true. Since 1.6.0
Also gladly wlad_
at Freenode #maria which pointed me in the right direction.
source to share