Java String UTF-8 encodes 0xFF as 0xC3BF

I have a weird problem writing certain bytes to a file with an OutputStream.

The problem appears to be caused by "encoding" the data.

If I write directly to the output stream:

saveFile.write(new byte[]{(byte)0xFF});

      

It works correctly and I can see 0xFF in my hex editor.

But when I try to do it with strings, it doesn't work. Example:

scriptData = "some script data thats all text and stuff" + ((char)0xFF) + ((char)0x3B);
saveFile.write(scriptData.getBytes(Charset.forName("UTF-8")));

      

In my hex editor, I see the text, then 0xC3BF, and then 0x3B. Why is 0x3B written to the file correctly, while 0xFF is changed to 0xC3BF?

There was another thread I saw about this, but it involved PrintStream, which I am not using AFAIK:

Problem writing 0xFF to file

Thanks.



1 answer


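You are explicitly asking for the UTF-8 encoding of the character 0xFF (U+00FF). In UTF-8, that character is encoded as two bytes: 0xC3 and 0xBF. The character 0x3B is in the ASCII range (below 0x80), which UTF-8 encodes as a single byte of the same value, so it is written unchanged. If you don't want the data to be UTF-8 encoded, don't call getBytes with the UTF-8 charset.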
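As a rough sketch of the difference, the snippet below encodes the same two characters with UTF-8 and with ISO-8859-1. The class name EncodingDemo and the helper methods are made up for illustration, and ISO-8859-1 is only one possible choice, assuming every character in the string is in the range 0x00 to 0xFF:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question's code.
class EncodingDemo {
    // UTF-8: chars below 0x80 become one byte, U+00FF becomes 0xC3 0xBF.
    static byte[] asUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // ISO-8859-1: each char in 0x00..0xFF becomes the single byte of the same value.
    static byte[] asLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String tail = "" + (char) 0xFF + (char) 0x3B;
        // Java bytes are signed, so 0xC3 prints as -61, 0xBF as -65, 0xFF as -1.
        System.out.println(Arrays.toString(asUtf8(tail)));   // [-61, -65, 59]
        System.out.println(Arrays.toString(asLatin1(tail))); // [-1, 59]
    }
}
```

Note that ISO-8859-1 cannot represent characters above 0xFF; getBytes replaces those with '?'. If the data is really binary rather than text, keeping it in a byte[] and writing that directly, as in the first example of the question, avoids the encoding step entirely.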



Remember that UTF-8 is not a single-byte encoding. UTF-8 (like all Unicode encoding forms) has to be able to represent every Unicode character. This means that some characters take one byte in UTF-8, others take two bytes, others three, and others four.
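The varying lengths can be checked with a small sketch like the one below. The class Utf8Lengths and the method utf8Length are hypothetical names chosen for this example, and the code points are arbitrary samples from each length class:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class Utf8Lengths {
    // Returns how many bytes UTF-8 needs for one Unicode code point.
    // Assumes a valid code point that is not a lone surrogate.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x3B));    // 1 (ASCII semicolon)
        System.out.println(utf8Length(0xFF));    // 2 (the character from the question)
        System.out.println(utf8Length(0x20AC));  // 3 (euro sign)
        System.out.println(utf8Length(0x1F600)); // 4 (outside the Basic Multilingual Plane)
    }
}
```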
