Trying UTF-8 to encode Files.write (..) but getting OutOfMemoryError

Question


I am trying to write my text file encoded as UTF-8. With the platform default encoding, as in the following code, it works:

protected void writeFile(Path dir, StringBuilder sb) {
    try {
        String fileName = dir.toFile().getAbsolutePath() + File.separator + getClass().getSimpleName().toLowerCase() + ".impex";
        Path path = Paths.get(fileName);
        Files.write(path, sb.toString().getBytes(), StandardOpenOption.CREATE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

But when I use the UTF-8 (or UTF8) encoding, I get java.lang.OutOfMemoryError: Java heap space. Why is this, and how can I solve this problem? (My memory settings are already 2 GB.)

protected void writeFile(Path dir, StringBuilder sb) {
    try {
        String fileName = dir.toFile().getAbsolutePath() + File.separator + getClass().getSimpleName().toLowerCase() + ".impex";
        Path path = Paths.get(fileName);
        Files.write(path, sb.toString().getBytes("UTF8"), StandardOpenOption.CREATE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

+3

java out-of-memory utf-8

Gynnad 24 Sep 14 at 14:32


3 answers

UTF-8 uses multiple bytes for many Unicode characters. The previous code uses the platform default encoding, which on Windows is usually a limited single-byte encoding.
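The size difference is easy to see by encoding the same text with both kinds of charset. This is only a minimal sketch: the class name EncodedSize and the sample string are made up for illustration, and ISO-8859-1 stands in for a typical single-byte default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedSize {
    // Number of bytes needed to encode the text in the given charset.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "Größe", written with escapes so the source file encoding does not matter.
        String text = "Gr\u00f6\u00dfe";
        System.out.println(byteCount(text, StandardCharsets.ISO_8859_1)); // prints 5
        System.out.println(byteCount(text, StandardCharsets.UTF_8));      // prints 7
    }
}
```

Pure ASCII text has the same size in both encodings; only characters outside ASCII grow in UTF-8.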

You may try:

sb.trimToSize();

Because a StringBuilder always reserves some extra capacity when it grows on append, this might help in your case.
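To illustrate what trimToSize() does, the sketch below compares capacity() before and after the call. The class name TrimDemo and the oversized initial capacity of 1024 are assumptions chosen for the demonstration, not part of the question's code.

```java
class TrimDemo {
    // Returns {capacity before trimToSize(), capacity after trimToSize()}.
    static int[] capacities(String content) {
        StringBuilder sb = new StringBuilder(1024); // deliberately oversized
        sb.append(content);
        int before = sb.capacity();
        sb.trimToSize(); // asks the builder to drop unused capacity
        return new int[] { before, sb.capacity() };
    }

    public static void main(String[] args) {
        int[] c = capacities("hello");
        System.out.println("before=" + c[0] + ", after=" + c[1]);
    }
}
```

Keep in mind that shrinking needs a copy of the buffer, so for a very large builder the call itself briefly requires additional memory.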

The next variant avoids the toString() call, and with it an extra copy of the whole text, so you could try it first.

        Files.write(path, Collections.singletonList(sb), StandardCharsets.UTF_8);
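One detail of this Files.write overload is worth knowing: it treats every element as a line and appends the platform line separator after it. The sketch below shows this with a temporary file; the class name WriteCharsDemo and the helper roundTrip are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

class WriteCharsDemo {
    // Writes sb as a single "line" to a temp file and returns what was stored.
    static String roundTrip(StringBuilder sb) {
        try {
            Path tmp = Files.createTempFile("demo", ".impex");
            try {
                // The StringBuilder is passed as a CharSequence; no toString() needed.
                Files.write(tmp, Collections.singleton(sb), StandardCharsets.UTF_8);
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String stored = roundTrip(new StringBuilder("abc"));
        // The stored text is "abc" followed by the line separator.
        System.out.println(stored.equals("abc" + System.lineSeparator()));
    }
}
```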

One last try is to split sb:

        int length = sb.length();
        final int CHUNK_SIZE = 1000;
        int size = (length + CHUNK_SIZE - 1) / CHUNK_SIZE; // number of chunks, rounded up
        List<CharSequence> chseqs = new ArrayList<>(size);
        int n = 1;
        for (int i = 0; i < length; i += n) {
            n = Math.min(CHUNK_SIZE, length - i);
            if (n == CHUNK_SIZE) {
                // Check that the last char is not the first of a surrogate pair.
                char ch = sb.charAt(i + n - 1);
                if (Character.isHighSurrogate(ch)) { // Leading of pair
                    --n;
                }
            }
            CharSequence chseq = sb.subSequence(i, i + n);
            chseqs.add(chseq);
        }
        // Note: Files.write appends a line separator after every chunk.
        Files.write(path, chseqs, StandardCharsets.UTF_8);

One final note, as most people will probably think of it: try not to use a StringBuilder for such large texts. Use a Writer instead, or something that connects producer and consumer asynchronously, such as a pipe.
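As a rough sketch of the Writer approach, the content can be written piece by piece as it is produced, so that only the writer's internal buffer is held in memory. The class name StreamingWriteDemo, the row format and the helper demo are invented for this example; the real generator of the .impex content would take their place.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class StreamingWriteDemo {
    // Writes the given number of rows directly to the file, one at a time.
    static void writeRows(Path path, int rows) {
        try (BufferedWriter w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (int i = 0; i < rows; i++) {
                w.write("row;");
                w.write(Integer.toString(i));
                w.newLine(); // platform line separator
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes to a temp file and reads the lines back, for demonstration only.
    static List<String> demo(int rows) {
        try {
            Path tmp = Files.createTempFile("rows", ".impex");
            try {
                writeRows(tmp, rows);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```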

0

Joop Eggen 24 Sep 14 at 16:10


Use the right tool for the job. If you want to write characters, don't use the method to write bytes.

To write the content of a StringBuilder sb to a Path path, use

Files.write(path, Collections.singleton(sb), StandardCharsets.UTF_8);

The underlying implementation should take care of chunking the character-to-byte conversion.

If it doesn't, or if you can't live with the fact that the method adds a newline to the end of the file, you may need the following piece of code:

final int chunkSize=8000;
try(Writer w=Files.newBufferedWriter(path)) {
    for(int s=0, e; s<sb.length(); s=e) {
        e=Math.min(s+chunkSize, sb.length());
        w.append(sb.subSequence(s, e));
    }
}

Please note that Files.newBufferedWriter defaults to UTF-8 and that this alternative does not insert line breaks between the chunks.

0

Holger Sep 25 at 10:26


Paul Verest · Accepted Answer · 2014-09-25T11:02:21+0000

Looking at the getBytes implementation I find

    byte[] encode(char[] ca, int off, int len) {
        int en = scale(len, ce.maxBytesPerChar());
        byte[] ba = new byte[en];

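that is, the line int en = scale(len, ce.maxBytesPerChar()); sizes the byte array for the worst case up front: maxBytesPerChar bytes for every char, which for UTF-8 is several times the length of the String. A single-byte encoding needs only one byte per char, which explains why the default encoding works.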

Debug your code to find out exactly where the OutOfMemoryError is thrown.
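The worst-case factor can be inspected through the public CharsetEncoder.maxBytesPerChar() method. The sketch below is not the JDK's internal code; it only mirrors the sizing idea from the snippet quoted above, and the class name WorstCaseSize and the helper worstCaseBytes are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WorstCaseSize {
    // Upper bound on the bytes an encoder may need for the given number of chars.
    static long worstCaseBytes(int chars, Charset charset) {
        float max = charset.newEncoder().maxBytesPerChar();
        return (long) Math.ceil(chars * (double) max);
    }

    public static void main(String[] args) {
        int chars = 100_000_000; // an assumed text size, for illustration
        System.out.println("ISO-8859-1: " + worstCaseBytes(chars, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + worstCaseBytes(chars, StandardCharsets.UTF_8));
    }
}
```

Under this sizing scheme, the temporary array comes on top of the StringBuilder's buffer and the String copy made by toString(), which can be enough to exhaust the heap.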
