Apache Giraph 1.0.0 - How is vertex memory allocated?

Recently I created my own vertex class in which each vertex has a LongWritable ID, and that ID is also its value. My Giraph program runs successfully on a small graph (100,000 vertices): it exits and outputs the expected values. However, when I increase the input to 30 million vertices, the program hangs once the heap reaches its maximum size (1.5 GB per mapper). Since my vertex class only contains an id and a value (8 + 8 = 16 bytes) plus outgoing edges (on average 8 * 8 * 2 = 128 bytes), I don't understand why the memory consumption is so high. According to the log message below, memory is maxed out at 1363 MB after loading 4.5 million vertices, so each vertex takes roughly 317 bytes while Giraph is running. What additional data structures in Giraph make the bytes per vertex so high?

readVertexInputSplit: Loaded 4500000 vertices at 90245.3768041096 vertices/sec 0 edges at 0.0 edges/sec Memory (free/total/max) = 187.52M / 1363.00M / 1365.50M

waitFor: Future result not ready yet java.util.concurrent.FutureTask@5f7bd943
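To make the arithmetic explicit, here is a minimal sketch that compares the payload I expected per vertex with the figure implied by the log line above. The class and method names are hypothetical and not part of Giraph. It assumes the 8 * 8 * 2 term means 8 edges per vertex with two 8-byte fields per edge, and that "M" in the log means MiB.

```java
// Hypothetical helper, not Giraph code: back-of-the-envelope memory estimate.
public class VertexMemoryEstimate {

    // Expected raw payload per vertex: 8-byte id + 8-byte value,
    // plus avgEdges edges with two 8-byte fields each (assumption).
    static long payloadBytes(int avgEdges) {
        long idAndValue = 8 + 8;
        long edges = (long) avgEdges * 8 * 2;
        return idAndValue + edges;
    }

    // Bytes per vertex implied by a heap size in MiB and a loaded vertex count.
    static long observedBytesPerVertex(double heapMiB, long vertices) {
        return (long) (heapMiB * 1024 * 1024 / vertices);
    }

    public static void main(String[] args) {
        long expected = payloadBytes(8);                               // 16 + 128
        long observed = observedBytesPerVertex(1363.00, 4_500_000L);   // figures from the log
        System.out.println("expected payload bytes/vertex: " + expected);
        System.out.println("observed heap bytes/vertex:    " + observed);
        System.out.printf("observed / expected: %.2f%n", (double) observed / expected);
    }
}
```

With the numbers from the log this gives 144 bytes of expected payload against about 317 bytes observed, so a bit more than twice the raw data size is unaccounted for.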
      
