Long-term connections inside a Docker container get interrupted

I have a Docker image running inside Kubernetes with a Python application that uses a persistent MySQL connection. The connection dies after seemingly random periods, typically 10 to 30 minutes, because the underlying socket loses its connection to the external host. I've tested this Docker container both locally and elsewhere in my production environment (outside of Kubernetes) without running into connection errors.
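As a stopgap while I diagnose this, I wrap queries so a dead connection is re-established and the query retried. A minimal sketch, assuming a pymysql-style connection object whose ping(reconnect=True) re-opens a dropped link (the helper name and retry count are my own):

```python
def fetch_all(conn, query, retries=2):
    """Run a query, reconnecting and retrying if the link has died.

    Assumes a pymysql-style connection: cursor()/execute() raise when
    the socket is gone, and ping(reconnect=True) re-opens the link.
    """
    for attempt in range(retries + 1):
        try:
            with conn.cursor() as cur:
                cur.execute(query)
                return cur.fetchall()
        except Exception:
            if attempt == retries:
                raise
            conn.ping(reconnect=True)  # re-establish the dropped connection
```

This masks the symptom but obviously doesn't help for a connection that dies mid-stream.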

Here is the Docker version and the uname output from inside the running image:

$ docker --version
Docker version 1.12.6, build 78d1802

$ uname -a
Linux c1b1f31a4048 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


Below is the uname output for the host:

Linux ip-10-2-110-119 4.4.41-k8s #1 SMP Mon Jan 9 15:34:39 UTC 2017 x86_64 GNU/Linux


I've seen a discussion about long-lived connections dying after other containers are started and stopped on the same host, eventually resulting in network loss across all containers on that host. I tried to reproduce this scenario outside of Kubernetes by manually starting and stopping other containers, but was unable to reproduce the connection failure.

I had a theory that our NAT was dropping the connection's binding because of the long tcp_keepalive_time on the host (7200 seconds by default). I drastically reduced it to ensure that TCP keepalive packets are sent while the connection is idle, but this had no effect. I have even witnessed a connection drop in the middle of streaming many rows from MySQL, so the connection was not idle at the time.

Is there a specific network configuration that should be used to ensure long-term connections don't die in this environment?
