Apparent native memory leak with G1 and a huge heap
We are currently fighting a memory leak in a Java application. The server is quite large (40 CPUs, 128 GB of RAM). The Java heap size is 64 GB, and we run a memory-intensive application that reads a lot of data into strings with about 400 threads and discards it from memory again after a few minutes.
So the heap fills up very quickly, but the objects on it become garbage just as quickly and can be collected promptly. We therefore use G1 to avoid stop-the-world (STW) pauses of several minutes.
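For illustration, the allocation pattern looks roughly like the following minimal sketch (class name, batch size, and hold time are made up for this example, not taken from our code):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for our workload: ~400 threads repeatedly read data
// into short-lived strings and drop them a few minutes later, so almost all
// objects die young and the live set stays far below the 64 GB heap.
public class AllocationPattern {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(400);
        for (int i = 0; i < 400; i++) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // Read a batch of data into strings (synthetic data here).
                    List<String> batch = new ArrayList<>();
                    for (int j = 0; j < 100_000; j++) {
                        batch.add("record-" + ThreadLocalRandom.current().nextLong());
                    }
                    try {
                        Thread.sleep(120_000); // hold the batch for ~2 minutes
                    } catch (InterruptedException e) {
                        return;
                    }
                    // batch becomes unreachable at the end of this iteration
                    // and is eligible for collection.
                }
            });
        }
    }
}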
Now, this part works fine: the heap is big enough to run the application for days, and nothing unusual happens there. Nevertheless, the Java process grows and grows over time until all 128 GB are used up and the application crashes.
I have read a lot about Java native memory leaks, including the glibc issue with the maximum number of malloc arenas (we are stuck on glibc 2.13, where this is a known problem, so we can't fix it with MALLOC_ARENA_MAX = 1 or 4 without updating the distribution).
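For reference, on a glibc that honors it, the workaround is nothing more than an environment setting before starting the JVM. A minimal sketch (bin/catalina.sh is the stock Tomcat start script; adjust paths and values to your installation):

export MALLOC_ARENA_MAX=4   # or 1; caps the number of glibc malloc arenas
bin/catalina.sh start       # Tomcat inherits the setting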
So we tried jemalloc instead, which gave us profiling graphs of the native allocations. (The graphs are not reproduced here.)
I don't understand what the problem is. Does anyone have an idea?
If I set MALLOC_CONF="narenas:1" for jemalloc in the environment of the Tomcat process running our application, could the process somehow still be using the glibc malloc?
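One way to rule that out is to check whether jemalloc is actually mapped into the running process. A sketch, assuming jemalloc is loaded via LD_PRELOAD (library path and PID are placeholders):

export LD_PRELOAD=/usr/lib/libjemalloc.so   # adjust to your jemalloc location
export MALLOC_CONF=narenas:1
bin/catalina.sh start
grep jemalloc /proc/<pid>/maps   # no match means glibc malloc is still in use

If the preload did not take effect (wrong path, setuid binary, etc.), MALLOC_CONF is silently ignored and glibc malloc serves all allocations.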
This is our G1 setup. Maybe the problem lies somewhere in here?
-XX:+UseCompressedOops
-XX:+UseNUMA
-XX:NewSize=6000m
-XX:MaxNewSize=6000m
-XX:NewRatio=3
-XX:SurvivorRatio=1
-XX:InitiatingHeapOccupancyPercent=55
-XX:MaxGCPauseMillis=1000
-XX:PermSize=64m
-XX:MaxPermSize=128m
-XX:+PrintCommandLineFlags
-XX:+PrintFlagsFinal
-XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:-UseAdaptiveSizePolicy
-XX:+UseG1GC
-XX:MaxDirectMemorySize=2g
-Xms65536m
-Xmx65536m
Thank you for your help!
We never called System.gc() explicitly, and in the meantime we have stopped using G1, specifying nothing other than -Xms and -Xmx.
We therefore use almost all 128 GB for the heap now. The memory usage of the Java process is high, but has stayed constant over several weeks. I'm pretty sure this is a G1 problem, or at least a general GC problem. The only drawback of this "solution" is the high GC pauses, but they have decreased from around 90 seconds to 1-5 seconds as the heap grows, which is quite acceptable for the benchmark we run on our servers.
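Concretely, the stable configuration boils down to nothing but the heap bounds, with everything else left at the defaults. A sketch (the sizes shown are illustrative, not our exact values):

java -Xms120g -Xmx120g <application>   # no -XX:+UseG1GC, no other GC tuning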
Before that, I played around with the -XX:ParallelGCThreads option, which had a significant impact on the leak rate when going down from 28 (the default for 40 CPUs) to 1. The memory graphs looked like a hand fan when using different values in different instances...
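That experiment consisted of varying this single flag across otherwise identical instances, e.g.:

-XX:ParallelGCThreads=1   # versus the default of 28 on this 40-CPU machine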