Why does String[] take up so much more space than char[]?

Purpose:

I was writing a Java application to read large text files where data is represented in a character-column format. For example:

A B R S Y E ...
R E W I W I ...
E Q B U O Y ...
W Q V G O R ...

      

i.e. single letters, each separated by a space. Each line has millions of such characters, and each file has several such lines.

Setting:

My task is to manipulate the file column by column. So I read the file line by line, split each line on ' ', and created an array from the result. From these arrays I built a 2D array. Everything was fine when I tested it on a small file with 10 lines, but it started to fail when I read files with 500 lines. My machine has a lot of memory available to the JVM, so I didn't expect that. I did some profiling and saw that reading the lines into String[] used far more memory than expected. So I changed String[] to char[]. Memory usage dropped dramatically and everything was fine.

Question:

My question is: why does String[] take up so much more space than char[]? Is it because it is an array of objects (since String is also an object)? If someone can explain the low-level details, that would be really great.

EDIT 1:

Here's what I did before:

String[] parts = line.split(" ");                // Creating a String[]

      

Here's what I changed:

String rowNoSpaces = line.replaceAll(" ", "");   // Removing all the spaces
char[] columns= rowNoSpaces.toCharArray();       // Creating a char[], instead of String[]

      

Let me know if more information is needed.



1 answer


Since char is a primitive type, a char[] stores the characters themselves directly in the array (2 bytes each), with no per-element overhead.

In contrast, String is an object, so a String[] stores references to String instances elsewhere on the heap. Each instance carries its own object header (which includes the class pointer) and other fields, including a separate reference to the array holding the actual text, which in turn has its own header and length. Having a large number of small objects also puts more load on the garbage collector and increases the risk of heap fragmentation.
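To get a feel for the difference, here is a rough back-of-the-envelope sketch. The class name MemoryEstimate and every constant in it are assumptions, not measurements: they are typical figures for a 64-bit HotSpot JVM with compressed references and 8-byte alignment, and they assume each array element is its own one-character String object. Actual numbers depend on the JVM version and flags, so treat the result as an order of magnitude only.

```java
// Rough estimate only. All sizes below are ASSUMED values for a 64-bit
// HotSpot JVM with compressed references; other JVMs will differ.
class MemoryEstimate {
    static final long ARRAY_HEADER = 16;  // assumed array header size in bytes
    static final long OBJECT_HEADER = 12; // assumed plain object header size
    static final long REFERENCE = 4;      // assumed compressed reference size

    // Round a size up to the next multiple of 8 (assumed object alignment).
    static long align(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // char[n]: one array header plus 2 bytes per character.
    static long charArrayBytes(long n) {
        return align(ARRAY_HEADER + 2 * n);
    }

    // String[n] where every element is a separate one-character String.
    static long stringArrayBytes(long n) {
        // The array itself only holds references.
        long refArray = align(ARRAY_HEADER + REFERENCE * n);
        // Each String: header + reference to its backing array + cached hash (int).
        long stringObject = align(OBJECT_HEADER + REFERENCE + 4);
        // Each backing array: array header + one 2-byte character.
        long backingArray = align(ARRAY_HEADER + 2);
        return refArray + n * (stringObject + backingArray);
    }

    public static void main(String[] args) {
        long n = 1_000_000; // characters in one line of the file
        System.out.println("char[]   ~ " + charArrayBytes(n) + " bytes");
        System.out.println("String[] ~ " + stringArrayBytes(n) + " bytes");
    }
}
```

Under these assumptions, a line of one million characters costs about 2 MB as a char[] and about 52 MB as a String[], which is consistent with the drop in memory usage described in the question.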



Also, if you build strings by repeated concatenation instead of using a StringBuilder, you end up with a lot of extra intermediate copies that take up much more memory.
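As a minimal sketch of the concatenation point, the two methods below build the same string from a char[]. The class and method names (JoinColumns, joinWithConcat, joinWithBuilder) are made up for illustration and do not come from the question.

```java
class JoinColumns {
    // Repeated += creates a new String on every iteration and copies all
    // previously accumulated characters into it.
    static String joinWithConcat(char[] columns) {
        String result = "";
        for (char c : columns) {
            result += c;
        }
        return result;
    }

    // StringBuilder appends into one growable buffer; presizing it to the
    // known length avoids intermediate resizing as well.
    static String joinWithBuilder(char[] columns) {
        StringBuilder sb = new StringBuilder(columns.length);
        for (char c : columns) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] columns = "ABRSYE".toCharArray();
        System.out.println(joinWithConcat(columns));
        System.out.println(joinWithBuilder(columns));
    }
}
```

Both methods return the same text. The difference is only in how many temporary objects are created along the way, which matters when a line has millions of characters.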
