Flyway hangs regularly (MariaDB connector, RDS)

I've seen frequent deployment hangs during the migration phase. Java / Scala application packaged in WAR for Tomcat. The database is RDS Aurora using MariaDB connector ( https://downloads.mariadb.org/connector-java/ ).

Probably has nothing to do with Flyway, but is a common problem getting a connection.

The migration is done from a shell to a container:

java -cp `echo WEB-INF/lib/*|tr ' ' :` foo.Migrate

      

The migration code looks like this:

def main(args: Array[String]): Unit = {
  Environment.dbFlywayPassword.foreach { pass =>
    val flyway = new Flyway
    flyway.setDataSource(Environment.jdbcUrl, "flyway", pass)
    flyway.migrate
  }
}

      

Connection string:

jdbc:mysql:aurora://%RDS_HOST%/xxx?serverSslCert=/rds-ca-2015-root.pem&useSSL=true&connectTimeout=10000

      

I've tried increasing the logging level in Flyway, but nothing is written after this line:

15:57:35.115 [main] INFO  o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse

      

So, I have a stream dump that looks like this:

15:57:35.115 [main] INFO  o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
2017-06-08 15:57:56
Full thread dump OpenJDK 64-Bit Server VM (25.121-b13 mixed mode):

"MariaDb-failover-1" #8 daemon prio=5 os_prio=0 tid=0x00005555f80ae000 nid=0x14 waiting on condition [0x00007fc330b8f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f5c59b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00005555f70bf000 nid=0x12 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00005555f7063000 nid=0x11 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00005555f7060800 nid=0x10 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00005555f705e800 nid=0xf waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00005555f702f000 nid=0xe in Object.wait() [0x00007fc331616000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00005555f702c000 nid=0xd in Object.wait() [0x00007fc331717000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00005555f6f85000 nid=0xb runnable [0x00007fc34341d000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    - locked <0x00000000f0a66090> (a java.lang.Object)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    - locked <0x00000000f0a81eb0> (a sun.security.ssl.AppInputStream)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x00000000f06b5118> (a java.io.BufferedInputStream)
    at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:125)
    at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:95)
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1002)
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:982)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readRequestSessionVariables(AbstractConnectProtocol.java:498)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readPipelineAdditionalData(AbstractConnectProtocol.java:544)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:410)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:357)
    at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:149)
    at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:179)
    at org.mariadb.jdbc.internal.failover.impl.MastersSlavesListener.initializeConnection(MastersSlavesListener.java:154)
    at org.mariadb.jdbc.internal.failover.FailoverProxy.<init>(FailoverProxy.java:94)
    at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:464)
    at org.mariadb.jdbc.Driver.connect(Driver.java:103)
    at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnectionFromDriver(DriverDataSource.java:416)
    at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnection(DriverDataSource.java:381)
    at org.flywaydb.core.internal.util.jdbc.JdbcUtils.openConnection(JdbcUtils.java:51)
    at org.flywaydb.core.Flyway.execute(Flyway.java:1418)
    at org.flywaydb.core.Flyway.migrate(Flyway.java:971)
    at tgam.service.data.Migrate$.$anonfun$main$1(Migrate.scala:11)
    at foo.Migrate$.$anonfun$main$1$adapted(Migrate.scala:8)
    at foo.Migrate$$$Lambda$4/458209687.apply(Unknown Source)
    at scala.Option.foreach(Option.scala:257)
    at foo.Migrate$.main(Migrate.scala:8)
    at foo.Migrate.main(Migrate.scala)

"VM Thread" os_prio=0 tid=0x00005555f7022000 nid=0xc runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00005555f70da800 nid=0x13 waiting on condition

JNI global references: 232

Heap
 def new generation   total 4928K, used 1611K [0x00000000f0600000, 0x00000000f0b50000, 0x00000000f5950000)
  eden space 4416K,  24% used [0x00000000f0600000, 0x00000000f0712c10, 0x00000000f0a50000)
  from space 512K, 100% used [0x00000000f0a50000, 0x00000000f0ad0000, 0x00000000f0ad0000)
  to   space 512K,   0% used [0x00000000f0ad0000, 0x00000000f0ad0000, 0x00000000f0b50000)
 tenured generation   total 10944K, used 4187K [0x00000000f5950000, 0x00000000f6400000, 0x0000000100000000)
   the space 10944K,  38% used [0x00000000f5950000, 0x00000000f5d66e78, 0x00000000f5d67000, 0x00000000f6400000)
 Metaspace       used 13061K, capacity 13306K, committed 13568K, reserved 1060864K
  class space    used 1381K, capacity 1449K, committed 1536K, reserved 1048576K

      

It looks like I / O weighs in org.mariadb.jdbc.Driver.connect

, but I have a set connectTimeout

(10 seconds). This timeout doesn't seem to be effective (will I need it socketTimeout

for https://github.com/brettwooldridge/HikariCP/issues/754 ?)

This has been happening for a while. The same thing happened when I used Tomcat contextInitialized

to migrate. I decided to refactor a separate call before starting Tomcat, which looks like a better idea overall, but this did not affect this behavior.

What usually happens is that the code will hang, after 2-3 minutes ECS will delay the timeout and cause redeployment. After a certain number of these attempts (for example, up to 10), Flyway will start successfully and the service will start working.

+3


source to share


1 answer


OK, it looks like this is a known bug in RDS Aurora - and it is mentioned in the MariaDB Connector doc (of course it should be an execution warning though!)

https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/#infrequently-used



usePipelineAuth Not compatible with aurora. During connection, different requests are made. When the option is active, these requests are sent using a pipeline (all requests are sent, then only all results are read), which allows for faster connection creation. Default: true. Since 1.6.0

Also gladly wlad_

at Freenode #maria which pointed me in the right direction.

+2


source







All Articles