Memory leak with TensorFlow for Java
The following memory leak test code:
private static final float[] X = new float[]{1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0};

public void testTensorFlowMemory() {
    // create a graph and session
    try (Graph g = new Graph(); Session s = new Session(g)) {
        // create a placeholder x and a const for the dimension to do a cumulative sum along
        Output x = g.opBuilder("Placeholder", "x").setAttr("dtype", DataType.FLOAT).build().output(0);
        Output dims = g.opBuilder("Const", "dims").setAttr("dtype", DataType.INT32).setAttr("value", Tensor.create(0)).build().output(0);
        Output y = g.opBuilder("Cumsum", "y").addInput(x).addInput(dims).build().output(0);
        // loop a bunch to test memory usage
        for (int i = 0; i < 10000000; i++) {
            // create a tensor from X
            Tensor tx = Tensor.create(X);
            // run the graph and fetch the resulting y tensor
            Tensor ty = s.runner().feed("x", tx).fetch("y").run().get(0);
            // close the tensors to release their resources
            tx.close();
            ty.close();
        }
        System.out.println("non-threaded test finished");
    }
}
Is there something obvious I am doing wrong? The basic flow is: create a graph and a session on that graph, create a placeholder and a constant for the dimension to do the cumulative sum along, feed a tensor built from X as x, run the resulting y operation, then close both the x and y tensors to free their native memory.
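As an aside, Tensor implements AutoCloseable in the TensorFlow Java API, so the two explicit close() calls can be replaced with try-with-resources, which also releases the tensors if run() throws mid-iteration. Here is a minimal sketch of that pattern; since the TF runtime isn't available here, a hypothetical FakeTensor class (not part of any real API) stands in for the native-backed tensor and counts live allocations:

```java
// Sketch of per-iteration cleanup via try-with-resources.
// FakeTensor is a hypothetical stand-in for org.tensorflow.Tensor.
public class TensorCloseSketch {
    static class FakeTensor implements AutoCloseable {
        static int live = 0;                       // simulated native allocations
        FakeTensor() { live++; }                   // "allocate" on creation
        @Override public void close() { live--; }  // "release" on close
    }

    // Runs the loop body so both the input tensor and the fetched result
    // are closed every iteration, even on exceptions.
    public static int runLoop(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try (FakeTensor tx = new FakeTensor();    // input tensor (from X)
                 FakeTensor ty = new FakeTensor()) {  // fetched result tensor
                // ... s.runner().feed("x", tx).fetch("y").run() would go here ...
            }
        }
        return FakeTensor.live; // 0 if every tensor was released
    }

    public static void main(String[] args) {
        System.out.println("live tensors after loop: " + runLoop(1000));
    }
}
```

This doesn't change the leak behavior described below (the leak is in native JNI code, not in how the tensors are closed), but it makes the cleanup robust.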
Things I find so far to help:
- This is not a Java object memory issue. The heap doesn't grow, and other memory in the JVM doesn't grow, according to jvisualvm. It doesn't appear to be a JVM memory leak according to Java's Native Memory Tracking either.
- The close() calls do help: without them, memory grows by leaps and bounds. With them, it still grows fairly quickly, but nowhere near as fast as without them.
- The cumsum operator is not the culprit; the leak also occurs with sum and other operators.
- This happens on macOS with TF 1.1, and on CentOS 7 with TF 1.1 and 1.2-rc0.
- Commenting out the Tensor ty lines removes the leak, so it appears to be in there.
Any ideas? Thank you! Also, here's a GitHub project that demonstrates this issue, both with a threaded test (to speed up the memory growth) and a non-threaded test (to show it is not due to threading). It uses Maven and can be run with a simple:
mvn test
I believe there is indeed a leak (in particular, a missing TF_DeleteStatus call corresponding to the allocation in the JNI code). (Thanks for the detailed instructions to reproduce.)
I would encourage you to file an issue at http://github.com/tensorflow/tensorflow/issues, and hopefully it will be fixed before the final 1.2 release.
(Also, you have a minor leak outside the loop, since the Tensor object created by Tensor.create(0) is never closed.)
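To illustrate that last point: the tensor passed to setAttr("value", ...) is only needed while the Const op is being built (its value is copied into the graph), so it can be scoped with try-with-resources during graph construction. A sketch of the pattern, using a hypothetical NativeValue stand-in for the tensor since the TF runtime isn't available here; real code would wrap Tensor.create(0) the same way:

```java
// Sketch of closing a construction-time native value, mirroring:
//   try (Tensor t = Tensor.create(0)) {
//       g.opBuilder("Const", "dims").setAttr("dtype", DataType.INT32)
//        .setAttr("value", t).build().output(0);
//   }
// NativeValue is a hypothetical stand-in, not part of any real API.
public class ConstLeakFixSketch {
    static class NativeValue implements AutoCloseable {
        static int live = 0;                       // simulated native allocations
        NativeValue() { live++; }
        @Override public void close() { live--; }
    }

    // Builds the "graph", releasing the temporary value once the op is built.
    public static int buildGraph() {
        try (NativeValue dimsValue = new NativeValue()) {
            // ... setAttr("value", dimsValue) and build the Const op here ...
        }
        return NativeValue.live; // 0: the temporary tensor was released
    }

    public static void main(String[] args) {
        System.out.println("live after build: " + buildGraph());
    }
}
```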
UPDATE: This has been fixed, and 1.2.0-rc1 should no longer have this problem.