Trying to use UTF-8 with Files.write(..) but getting OutOfMemoryError

I am trying to encode my text file using UTF-8. Without an explicit encoding, this works:

protected void writeFile(Path dir, StringBuilder sb) {
    try {
        String fileName = dir.toFile().getAbsolutePath() + File.separator + getClass().getSimpleName().toLowerCase() + ".impex";
        Path path = Paths.get(fileName);
        Files.write(path, sb.toString().getBytes(), StandardOpenOption.CREATE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

But when I use the "UTF-8" or "UTF8" encoding, I get java.lang.OutOfMemoryError: Java heap space. Why is this, and how can I solve it? (My memory setting is already 2 GB.)

protected void writeFile(Path dir, StringBuilder sb) {
    try {
        String fileName = dir.toFile().getAbsolutePath() + File.separator + getClass().getSimpleName().toLowerCase() + ".impex";
        Path path = Paths.get(fileName);
        Files.write(path, sb.toString().getBytes("UTF8"), StandardOpenOption.CREATE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}


3 answers


Looking at the getBytes implementation (java.lang.StringCoding in JDK 8), I find:

    byte[] encode(char[] ca, int off, int len) {
        int en = scale(len, ce.maxBytesPerChar());
        byte[] ba = new byte[en];
        // ... the chars are then encoded into ba ...
    }

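That is, int en = scale(len, ce.maxBytesPerChar()); allocates maxBytesPerChar() bytes for every char of the String before anything is encoded. For UTF-8 that factor is 3, so the byte array alone is about three times the length of the String, on top of the char data already held by the StringBuilder and its toString() copy.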

Debug your code to find out exactly where the OutOfMemoryError is thrown.
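A minimal sketch to check the factor on your own JDK (the class and method names are made up for illustration). It asks each charset's encoder for maxBytesPerChar() and multiplies, which is the same up-front estimate the quoted code makes; the exact allocation strategy inside getBytes differs between JDK versions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeBufferEstimate {

    // Worst-case size of a byte[] sized as chars * maxBytesPerChar(),
    // i.e. the scale(len, maxBytesPerChar) estimate from the quoted JDK code.
    static long worstCaseBytes(Charset cs, int chars) {
        float max = cs.newEncoder().maxBytesPerChar();
        return (long) (chars * (double) max);
    }

    public static void main(String[] args) {
        int chars = 300_000_000; // hypothetical length of the StringBuilder contents
        for (Charset cs : new Charset[] {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8}) {
            System.out.printf("%-10s maxBytesPerChar=%.1f  worst case=%,d bytes%n",
                    cs.name(), cs.newEncoder().maxBytesPerChar(), worstCaseBytes(cs, chars));
        }
    }
}
```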



UTF-8 uses multiple bytes for many Unicode characters. Your first version uses the platform default encoding, which on Windows is usually a limited single-byte encoding.
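To see how many bytes individual characters take in UTF-8, here is a quick sketch (class name hypothetical) that also prints the platform default charset. Since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms, so on newer JDKs both versions of the question's code use the same encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8ByteCounts {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        // a, e-acute, euro sign, grinning face (the last one is a surrogate pair)
        String[] samples = {"a", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: %d char(s), %d UTF-8 byte(s)%n",
                    s.codePointAt(0), s.length(), utf8Length(s));
        }
    }
}
```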

You may try:

sb.trimToSize();

As a StringBuilder always reserves a little extra room as it grows, this might help in your case.
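A small sketch showing what trimToSize() does to the builder's spare capacity (names hypothetical). The Javadoc only says it "attempts" to reduce storage, so the exact numbers depend on the JDK.

```java
public class TrimToSizeDemo {

    // Returns {length, capacity before trimToSize(), capacity after trimToSize()}.
    static int[] trimStats(StringBuilder sb) {
        int before = sb.capacity();
        sb.trimToSize(); // may shrink the internal buffer to the current length
        return new int[] {sb.length(), before, sb.capacity()};
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        int[] s = trimStats(sb);
        System.out.println("length=" + s[0] + ", capacity before=" + s[1] + ", after=" + s[2]);
    }
}
```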

The next suggestion may already be enough on its own. It avoids the full-size byte[] that getBytes creates, so you can try it first:

        Files.write(path, Collections.singletonList(sb), StandardCharsets.UTF_8);

As a last resort, split sb into chunks:

        int length = sb.length();
        final int CHUNK_SIZE = 1000;
        // Number of chunks, rounded up; only used to presize the list.
        int size = (length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        List<CharSequence> chseqs = new ArrayList<>(size);
        int n = 1;
        for (int i = 0; i < length; i += n) {
            n = Math.min(CHUNK_SIZE, length - i);
            if (n == CHUNK_SIZE) {
                // Don't split a surrogate pair: if the chunk would end on a
                // high (leading) surrogate, leave it for the next chunk.
                char ch = sb.charAt(i + n - 1);
                if (Character.isHighSurrogate(ch)) {
                    --n;
                }
            }
            CharSequence chseq = sb.subSequence(i, i + n);
            chseqs.add(chseq);
        }
        // Caution: Files.write(Path, Iterable, Charset) writes a line separator
        // after every element, so a line break is added after each chunk.
        Files.write(path, chseqs, StandardCharsets.UTF_8);

One final note, as most people will probably point out: try not to use a StringBuilder for such large texts at all. Write to a Writer as you go instead, or connect the producing and consuming parts asynchronously, for example through a Pipe.
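A minimal sketch of writing straight to a Writer instead of building the whole text in a StringBuilder first. The row format and the writeRows method are hypothetical placeholders for whatever generates the .impex content.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingWrite {

    // Hypothetical generator: each row goes straight to the file instead of into a
    // StringBuilder, so only the writer's internal buffer is held in memory.
    static void writeRows(Path path, int rows) {
        try (Writer w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (int i = 0; i < rows; i++) {
                w.write("row;" + i); // placeholder content
                w.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("example", ".impex");
        writeRows(path, 1_000_000);
        System.out.println("Wrote " + Files.size(path) + " bytes to " + path);
        Files.delete(path);
    }
}
```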



Use the right tool for the job. If you want to write characters, don't use the method to write bytes.

To write the contents of the StringBuilder sb to the Path path, use:

Files.write(path, Collections.singleton(sb), StandardCharsets.UTF_8);

The underlying implementation should then take care of converting the characters to bytes piece by piece, without creating one huge byte array.
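To illustrate what converting characters to bytes piece by piece means, here is a sketch using the public CharsetEncoder API with a fixed-size ByteBuffer. It is not the JDK's own code (Files.write goes through a BufferedWriter and an OutputStreamWriter), but the idea is similar: the encoder only ever holds a buffer of bounded size, not a byte array as large as the whole text. The class name and the 16-byte minimum buffer size are assumptions for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChunkedEncoding {

    // Encodes `text` into `out` through a reusable ByteBuffer of bufSize bytes.
    static void encode(CharSequence text, OutputStream out, Charset cs, int bufSize)
            throws IOException {
        if (bufSize < 16) {
            // Assumption: 16 bytes comfortably holds one encoded code point
            // (at most 4 bytes in UTF-8); a too-small buffer could loop forever.
            throw new IllegalArgumentException("bufSize too small: " + bufSize);
        }
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(text); // read-only view of the chars, no copy
        ByteBuffer buf = ByteBuffer.allocate(bufSize);
        while (true) {
            CoderResult r = enc.encode(in, buf, true);
            drain(buf, out);
            if (r.isUnderflow()) {
                break; // all input has been consumed
            }
            if (r.isError()) {
                r.throwException(); // not expected with REPLACE, kept for safety
            }
            // Otherwise the buffer overflowed; it has been drained, so continue.
        }
        while (enc.flush(buf).isOverflow()) {
            drain(buf, out);
        }
        drain(buf, out);
    }

    // Writes the buffer's contents to `out` and empties it for reuse.
    private static void drain(ByteBuffer buf, OutputStream out) throws IOException {
        buf.flip();
        out.write(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
        buf.clear();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("x\u00e9\u20ac\uD83D\uDE00"); // 1 + 2 + 3 + 4 bytes in UTF-8
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        encode(sb, bos, StandardCharsets.UTF_8, 8192);
        byte[] expected = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(bos.size() + " bytes, matches getBytes: "
                + Arrays.equals(bos.toByteArray(), expected));
    }
}
```

In real code you would pass the stream from Files.newOutputStream(path) instead of a ByteArrayOutputStream.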

If it doesn't, or if you can't live with the fact that this method appends a line separator at the end of the file, you can use the following code instead:

final int chunkSize=8000;
try(Writer w=Files.newBufferedWriter(path)) {
    for(int s=0, e; s<sb.length(); s=e) {
        e=Math.min(s+chunkSize, sb.length());
        w.append(sb.subSequence(s, e));
    }
}

Note that Files.newBufferedWriter(path) uses UTF-8 by default, and that this alternative inserts no line breaks, neither between the chunks nor at the end of the file.
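To check the difference in line endings yourself, here is a sketch (class and method names hypothetical) that writes the same builder with both approaches from this answer and compares the file sizes.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class CompareWriteMethods {

    // Approach 1: a one-element Iterable; Files.write appends a line separator.
    static void writeAsLines(Path path, StringBuilder sb) throws IOException {
        Files.write(path, Collections.singleton(sb), StandardCharsets.UTF_8);
    }

    // Approach 2: chunked appends through a BufferedWriter (UTF-8 by default).
    static void writeInChunks(Path path, StringBuilder sb, int chunkSize) throws IOException {
        try (Writer w = Files.newBufferedWriter(path)) {
            for (int s = 0, e; s < sb.length(); s = e) {
                e = Math.min(s + chunkSize, sb.length());
                w.append(sb.subSequence(s, e));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder("h\u00e9llo w\u00f6rld");
        Path a = Files.createTempFile("lines", ".txt");
        Path b = Files.createTempFile("chunks", ".txt");
        writeAsLines(a, sb);
        writeInChunks(b, sb, 4);
        System.out.println("Files.write:    " + Files.size(a) + " bytes");
        System.out.println("chunked writer: " + Files.size(b) + " bytes");
        Files.delete(a);
        Files.delete(b);
    }
}
```

The first file is longer by exactly System.lineSeparator() (one or two bytes depending on the platform).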
