Java String UTF-8 encodes 0xFF as 0xC3BF
I have a weird problem writing certain bytes to a file with an OutputStream.
The problem appears to be caused by "encoding" the data.
If I write the byte to the output stream directly:
saveFile.write(new byte[]{(byte)0xFF});
It works correctly and I can see 0xFF in my hex editor.
But when I try to do it with strings, it doesn't work. Example:
scriptData = "some script data thats all text and stuff" + ((char)0xFF) + ((char)0x3B);
saveFile.write(scriptData.getBytes(Charset.forName("UTF-8")));
In my hex editor, I see the text, then 0xC3BF, and then 0x3B. Why is 0x3B written to the file correctly, while 0xFF is changed to 0xC3BF?
There was another thread I saw about this, but it involved PrintStream, which I am not using AFAIK.
Thanks.
You are asking for the UTF-8 encoding of the character 0xFF (pretty explicitly). The character U+00FF, in UTF-8, is expressed as two bytes: 0xC3 and 0xBF. If you don't want the data to be UTF-8 encoded, don't call getBytes with the UTF-8 charset.
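To see the difference, here is a small sketch that encodes the same two characters with two charsets. The class name EncodingDemo and the helper hex are made up for illustration. Using ISO-8859-1 is my own suggestion rather than something from the question: it maps the characters U+0000 to U+00FF one-to-one onto the bytes 0x00 to 0xFF, so it may be what you want if every char in the string is really meant to be a raw byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: encodes text with the given charset
    // and returns the resulting bytes as uppercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "" + (char) 0xFF + (char) 0x3B;
        // UTF-8 needs two bytes for U+00FF, one byte for U+003B.
        System.out.println(hex(s, StandardCharsets.UTF_8));      // C3BF3B
        // ISO-8859-1 writes each char below U+0100 as a single byte.
        System.out.println(hex(s, StandardCharsets.ISO_8859_1)); // FF3B
    }
}
```

If the data is binary rather than text, building a byte[] (as in your first example) and skipping String altogether is probably the safer route.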
Remember that UTF-8 is not a single-byte encoding. UTF-8 (like all Unicode encodings) must be able to represent every Unicode character. This means that some characters in UTF-8 take one byte, others two bytes, others three bytes, and the rest four bytes.
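A quick way to check the variable length for yourself is the sketch below. The class name Utf8Lengths and the helper utf8Length are made up for illustration, and the sample code points are just ones I picked to hit each length.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Hypothetical helper: number of bytes UTF-8 uses for one code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x3B));    // ';'  -> 1
        System.out.println(utf8Length(0xFF));    // U+00FF -> 2
        System.out.println(utf8Length(0x20AC));  // euro sign -> 3
        System.out.println(utf8Length(0x1F600)); // emoji outside the BMP -> 4
    }
}
```

Only code points up to U+007F (plain ASCII) come out as a single byte, which is why 0x3B survives unchanged and 0xFF does not.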