Java: large number of char[] objects, how to reduce them?

I believe this garbage is generated when I call new String in different places throughout my application. How can I "create" a string without creating a new object every time?

The reason this is so garbage-sensitive is that my application cannot generate garbage, as we need to run in real time with the default Java GC.

// you can see I use the same chars array
public String getB37String() {
    long l = getLong();
    int i = 0;
    while (l != 0L) {
        long l1 = l;
        l /= 37L;
        chars[11 - i++] = validChars[(int) (l1 - l * 37L)];
    }
    return new String(chars, 12 - i, i);
}

      

Another example is StringBuilder.toString(), which uses new String under the hood, as in the code below.

// and you can see that I use the same builder
public String getString() {
    builder.delete(0, builder.length());
    char ascii;
    while (0 != (ascii = (char) getUByte()) && backing.hasRemaining())
        builder.append(ascii);
    return builder.toString();
}

      



3 answers


First observation:

The reason this is so garbage-sensitive is that my application cannot generate garbage, as we need to run in real time with the default Java GC.

If this ("cannot generate garbage") is really a true statement of your requirements [1], then you may have started in the wrong place by choosing Java as your implementation language.

Java is designed with the assumption that garbage generation is fine. It is the "cost" of avoiding the inherent complexity (and consequent pitfalls) of explicit memory management. This assumption permeates the design of the language and of the standard library.

Another thing about Java that is not "in your favor" is that it strongly supports good OO design principles. In particular, with a few exceptions, APIs provide strong abstraction and are designed to prevent pitfalls where applications can accidentally break things.

For example, when you do this:

  char[] c = new char[]{'a', 'b', 'c'};
  ...
  String s = new String(c);

      

the String constructor allocates a new char[] and copies the contents of c into it. Why? Because if it didn't, you would have a "leaky abstraction". Someone could do this:



  char[] c = new char[]{'a', 'b', 'c'};
  ...
  String s = new String(c);
  ...
  c[0] = 'd';

      

and the leaky abstraction would result in a change to a (supposedly) immutable object.
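A minimal sketch to check the copying behaviour described above. The class and method names are made up for illustration; only the String(char[]) constructor comes from the JDK.

```java
final class StringCopyDemo {
    // Builds a String from an array, then mutates the array afterwards.
    static String buildThenMutate() {
        char[] c = {'a', 'b', 'c'};
        String s = new String(c); // the constructor copies the contents of c
        c[0] = 'd';               // changes only the caller's array
        return s;                 // the String is unaffected
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "abc"
    }
}
```

That copy is where the extra char[] comes from, and it is also what keeps the String immutable.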


So what is a "solution"?

  • You can rewrite your application in C or C++ or some other programming language where you have complete control over memory allocation. (Of course, this is a lot of work ... and there may be other reasons why you cannot do this.)

  • You can redesign the relevant parts of your application so that they don't use String or StringBuilder or any of the standard Java classes that involve explicit or implicit (under the hood) heap allocation. It's not impossible, but it's a lot of work. For example, many standard and third-party APIs expect you to pass String objects to them as parameters.

  • You can analyze the parts of your code that do string operations and make them smarter, so that they allocate less garbage.

Unfortunately, all of these things are likely to make your codebase larger, harder to read, harder to debug, and harder to maintain.
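As a rough illustration of the second and third options, here is a sketch that writes base-37 digits into a caller-supplied StringBuilder instead of returning a new String. The alphabet below is an assumption (the question does not show validChars), and Base37 and appendBase37 are hypothetical names.

```java
final class Base37 {
    // Assumed 37-symbol alphabet; the question's validChars is not shown.
    private static final char[] VALID_CHARS =
            "_abcdefghijklmnopqrstuvwxyz0123456789".toCharArray();

    // Appends the base-37 digits of a non-negative value to out and returns out.
    // No String or char[] is allocated as long as out has enough capacity.
    static StringBuilder appendBase37(long value, StringBuilder out) {
        int start = out.length();
        while (value != 0L) {
            out.append(VALID_CHARS[(int) (value % 37L)]);
            value /= 37L;
        }
        // Digits were appended least significant first, so reverse them in place.
        for (int lo = start, hi = out.length() - 1; lo < hi; lo++, hi--) {
            char tmp = out.charAt(lo);
            out.setCharAt(lo, out.charAt(hi));
            out.setCharAt(hi, tmp);
        }
        return out;
    }
}
```

This only helps if the consumers accept a CharSequence (or read the builder directly). As soon as something calls toString() on the builder, the allocation is back.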


1 - One case where this might not be true is if the problem you are really trying to solve is GC pauses. There are ways to address GC pauses that don't go as far as generating no garbage at all. For example, choosing a low-pause concurrent collector and reducing the size of the young generation space may give you pauses that are short enough not to be noticeable. Another trick is to force a GC at points where you know the user won't notice, for example when loading a new level in a game.
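The "force a GC at a quiet point" trick could look roughly like this. QuietPointGc and loadLevel are hypothetical names, and note that System.gc() is only a request: the JVM may ignore it, for example when started with -XX:+DisableExplicitGC.

```java
final class QuietPointGc {
    // Runs a loading step and then asks for a collection while the pause
    // is unlikely to be noticed (e.g. behind a loading screen).
    static void loadLevel(Runnable loader) {
        loader.run();
        System.gc(); // a hint only; the JVM is free to ignore it
    }
}
```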



Difference between the two

Both are String objects, just like any other object, but:

Since String is one of the most used types in any application, the Java designers took an extra step to optimize the use of this class. They came up with the idea of caching all String instances created with double quotes, for example "Java". These double-quoted values are known as string literals, and the cache that stores these String instances is known as the string pool.

At a high level, both are String objects, but the main difference is that the new operator always creates a new String object, whereas Strings created from literals are interned.

String a = "Java";
String b = "Java";
System.out.println(a == b);  // true

      

Two different objects are created here and they have different references:

String c = new String("Java");
String d = new String("Java");
System.out.println(c == d);  // false

      

Likewise, when you use == to compare a string literal with a String object created with the new operator, the result is false, as shown below:

String e = "JDK";
String f =  new String("JDK");
System.out.println(e == f);  // false

      

Garbage collection

In fact, String objects corresponding to string literals are generally not candidates for garbage collection. This is because there is an implicit reference to the String object in the code of every method that uses the literal. This means that the String is reachable for as long as the method can be executed.

However, this is not always the case. If the literal was defined in a class that was dynamically loaded (for example, using Class.forName(...)), then it is possible for the class to be unloaded. If that happens, the String object for the literal becomes unreachable, and it will be reclaimed when the heap containing the interned string gets GC'ed.

String pool

java.lang.String.intern() returns an interned string, that is, one that has an entry in the global string pool. If the string is not already in the global string pool, it will be added.

You can rely on the following property programmatically:

  • For any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

So if you call intern() on a string, the returned String is guaranteed to come from the pool of unique strings.
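A small sketch of the intern() property, using a hypothetical InternDemo class. Keep in mind that intern() does not prevent the original new String(...) objects from being allocated; it only hands back the pooled instance.

```java
final class InternDemo {
    // Two distinct String objects with equal contents intern to the same instance.
    static boolean sameAfterIntern() {
        String s = new String("Java");
        String t = new String("Java");
        return s != t && s.intern() == t.intern();
    }

    // An interned String is the same instance as the equal string literal.
    static boolean internMatchesLiteral() {
        String f = new String("JDK");
        return f.intern() == "JDK";
    }

    public static void main(String[] args) {
        System.out.println(sameAfterIntern());      // prints "true"
        System.out.println(internMatchesLiteral()); // prints "true"
    }
}
```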


If you are using Java 8u20 or newer, you can try -XX:+UseG1GC -XX:+UseStringDeduplication to enable string deduplication.

While this will not avoid garbage generation, it can reduce memory pressure.


If you really want to instantiate a String without the cost of copying the char[] array, you would need to access the package-private constructor java.lang.String.String(char[], boolean) or the private field char[] value via reflection, with appropriate runtime checks and error handling in case this doesn't actually work.

I wouldn't recommend it, but it's an option.


Another option is to stop using strings and work with ByteBuffer. You can slice them as needed, return views, return read-only views, and recycle them.

And they are also more compact if you are working with UTF-8 data. The downside is that you cannot use APIs that require Strings.

Or simply use CharSequence / StringBuilder / CharBuffer objects in as many places as possible.
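For instance, a CharBuffer can expose a slice of an existing char[] as a CharSequence without copying the characters. This is only a sketch: CharViewDemo and view are hypothetical names, and the small CharBuffer wrapper object itself is still allocated.

```java
import java.nio.CharBuffer;

final class CharViewDemo {
    // Wraps chars[offset .. offset + length) as a CharSequence.
    // The returned view shares the backing array; no characters are copied.
    static CharSequence view(char[] chars, int offset, int length) {
        return CharBuffer.wrap(chars, offset, length);
    }
}
```

Because the view shares the array, later writes to the array show through, so it has to be consumed before the buffer is reused.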


Depending on your use cases, you can also create a String cache for your computations, for example a Map<T, String> where T is the input parameter of your computation. This way you only need one String for each possible value of T.
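A minimal sketch of such a cache, assuming the key type has proper equals/hashCode. StringCache is a hypothetical name; the cache is unbounded and not thread-safe, and boxing primitive keys may itself allocate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class StringCache<T> {
    private final Map<T, String> cache = new HashMap<>();
    private final Function<T, String> compute;

    StringCache(Function<T, String> compute) {
        this.compute = compute;
    }

    // Computes the String once per distinct input and reuses it afterwards.
    String get(T input) {
        return cache.computeIfAbsent(input, compute);
    }
}
```

After the first call for a given input, get returns the same String instance every time, so no new String or char[] is created for repeated values.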


return new String(chars, 12 - i, i);

      

Please note that in Java 8, Strings do not store an internal offset, i.e. String objects are not a "view" onto some potentially larger char array.

This was different in the past, but since it was an implementation detail, it was changed.

It would be possible to reverse this change with a custom String class added through the bootstrap classloader, but that is more likely than not to break things or cause severe performance degradation.


as we need to run in real time with the default Java GC.

This could be your real problem.

None of the default collectors gives you anything that comes close to real-time behavior. CMS or G1 can provide significantly shorter pause times than the Serial or ParallelOld collectors, especially on large heaps.
