Trying to encode with UTF-8 in Files.write(..) but getting OutOfMemoryError
I am trying to write my text file with UTF-8 encoding. The following version, which uses the default charset, works:
protected void writeFile(Path dir, StringBuilder sb) {
    try {
        String fileName = dir.toFile().getAbsolutePath() + File.separator + getClass().getSimpleName().toLowerCase() + ".impex";
        Path path = Paths.get(fileName);
        Files.write(path, sb.toString().getBytes(), StandardOpenOption.CREATE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But when I use UTF-8 or UTF8 encoding, I get java.lang.OutOfMemoryError: Java heap space. Why does this happen, and how can I solve it? (My memory setting is already 2 GB.)
protected void writeFile(Path dir, StringBuilder sb) {
    try {
        String fileName = dir.toFile().getAbsolutePath() + File.separator + getClass().getSimpleName().toLowerCase() + ".impex";
        Path path = Paths.get(fileName);
        Files.write(path, sb.toString().getBytes("UTF8"), StandardOpenOption.CREATE);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Looking at the getBytes implementation, I find:
byte[] encode(char[] ca, int off, int len) {
int en = scale(len, ce.maxBytesPerChar());
byte[] ba = new byte[en];
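That is, the line int en = scale(len, ce.maxBytesPerChar()); requests a byte array sized for the worst case: maxBytesPerChar() bytes for every char, which is several times the length of the String.
Debug your code and find out exactly where the OutOfMemoryError is thrown.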
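To get a feeling for the size of that temporary array, the worst-case factor can be queried from the encoder itself. The following is only a rough sketch: the class name EncoderScale, the helper worstCaseBytes and the example length are made up for illustration, and the exact allocation strategy of getBytes differs between JDK versions.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncoderScale {

    // Worst-case number of bytes an encoder may need for charCount chars.
    // This mirrors the idea of scale(len, maxBytesPerChar()), not the JDK code itself.
    static long worstCaseBytes(int charCount, CharsetEncoder encoder) {
        return (long) Math.ceil(charCount * (double) encoder.maxBytesPerChar());
    }

    public static void main(String[] args) {
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        int chars = 500_000_000; // hypothetical StringBuilder length

        System.out.println("maxBytesPerChar  = " + utf8.maxBytesPerChar());
        System.out.println("worst-case bytes = " + worstCaseBytes(chars, utf8));
    }
}
```

With a 2 GB heap, the StringBuilder, the String produced by toString() and a byte array of that size all have to fit at the same time, which would explain the error.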
UTF-8 uses multiple bytes for many Unicode characters. The previous code uses the platform default encoding, which on Windows is usually a limited single-byte encoding.
You may try:
sb.trimToSize();
A StringBuilder always reserves some extra capacity when it grows on append, so trimming it might help in your case.
The next variant will probably avoid the problem. It bypasses toString(), so you can try it first.
Files.write(path, Collections.singletonList(sb), StandardCharsets.UTF_8);
One last try is to split sb:
int length = sb.length();
final int CHUNK_SIZE = 1000;
int size = (length + CHUNK_SIZE - 1) / CHUNK_SIZE; // Number of chunks, rounded up
List<CharSequence> chseqs = new ArrayList<>(size);
int n = 1;
for (int i = 0; i < length; i += n) {
    n = Math.min(CHUNK_SIZE, length - i);
    if (n == CHUNK_SIZE) {
        // Check that the last char is not the first of a surrogate pair.
        char ch = sb.charAt(i + n - 1);
        if (Character.isHighSurrogate(ch)) { // Leading of pair
            --n;
        }
    }
    CharSequence chseq = sb.subSequence(i, i + n);
    chseqs.add(chseq);
}
// Note: Files.write appends a line separator after every element,
// so each chunk ends with a line break in the output file.
Files.write(path, chseqs, StandardCharsets.UTF_8);
One final note, which most readers will probably have thought of already: try not to use a StringBuilder for such large texts. Use some Writer instead, or something that connects producer and consumer asynchronously, such as a Pipe.
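As a minimal sketch of the Writer idea, the text can be written piece by piece as it is produced, so that only the writer's buffer is held in memory. The class name StreamingExport, the method writeRecords and the sample records are hypothetical; how the records are generated in the real code is unknown.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class StreamingExport {

    // Writes each record directly to the file instead of collecting
    // everything in one StringBuilder first.
    static void writeRecords(Path path, Iterable<String> records) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (String record : records) {
                writer.write(record);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("export", ".impex");
        // Made-up sample data; the second record contains a non-ASCII character.
        writeRecords(path, Arrays.asList("first record", "second record \u00e4"));
        System.out.println(Files.readAllLines(path, StandardCharsets.UTF_8));
    }
}
```

The records parameter is an Iterable, so it could equally be backed by a lazy source that produces the lines on demand.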
Use the right tool for the job. If you want to write characters, don't use the method to write bytes.
To write the content of a StringBuilder sb to a Path path, use
Files.write(path, Collections.singleton(sb), StandardCharsets.UTF_8);
The underlying implementation should take care of splitting up the character-to-byte conversion.
If it doesn't, or if you can't live with the fact that the method adds a newline to the end of the file, you may need the following piece of code:
final int chunkSize=8000;
try(Writer w=Files.newBufferedWriter(path)) {
for(int s=0, e; s<sb.length(); s=e) {
e=Math.min(s+chunkSize, sb.length());
w.append(sb.subSequence(s, e));
}
}
Please note that Files.newBufferedWriter defaults to UTF-8 and that this alternative does not insert newlines between the chunks.
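To see the difference between the two variants, here is a small self-contained comparison. It is only a sketch: the class name ChunkedWrite, the method writeChunked and the tiny chunk size are chosen for illustration, and the charset is passed explicitly rather than relying on the default.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class ChunkedWrite {

    // Appends the text to the writer in slices of at most chunkSize chars.
    static void writeChunked(Path path, CharSequence text, int chunkSize) throws IOException {
        try (Writer w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (int s = 0, e; s < text.length(); s = e) {
                e = Math.min(s + chunkSize, text.length());
                w.append(text, s, e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder("some text without a trailing newline");

        Path viaFilesWrite = Files.createTempFile("files-write", ".txt");
        Files.write(viaFilesWrite, Collections.singleton(sb), StandardCharsets.UTF_8);

        Path viaWriter = Files.createTempFile("chunked", ".txt");
        writeChunked(viaWriter, sb, 8);

        // The Files.write variant is longer by the trailing line separator.
        System.out.println("Files.write:  " + Files.size(viaFilesWrite) + " bytes");
        System.out.println("writeChunked: " + Files.size(viaWriter) + " bytes");
    }
}
```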