Java string to byteArray conversion error
I am trying to convert a byte array to a String and back, but the output does not match the input. Am I doing something wrong?
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(by));
String s = new String(by, Charsets.UTF_8);
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(s.getBytes(Charsets.UTF_8)));
Output:
130021000061f8f0001a
130021000061efbfbd
Complete code:
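import java.nio.charset.StandardCharsets; // stands in for the Charsets class used above
import org.apache.commons.codec.binary.Hex;

public class RoundTrip {
    public static void main(String[] args) {
        // Build the input bytes from their hex representations
        String[] arr = {"13", "00", "21", "00", "00", "61", "F8", "F0", "00", "1A"};
        byte[] by = new byte[arr.length];
        for (int i = 0; i < arr.length; i++) {
            by[i] = (byte) (Integer.parseInt(arr[i], 16) & 0xff);
        }
        System.out.println(Hex.encodeHexString(by));
        // Decode as UTF-8, then encode the String back to UTF-8 bytes
        String s = new String(by, StandardCharsets.UTF_8);
        System.out.println(Hex.encodeHexString(s.getBytes(StandardCharsets.UTF_8)));
    }
}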
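The problem is that f8f0001a is not a valid UTF-8 byte sequence.
First of all, the lead byte f8 would introduce a 5-byte sequence, and you only have four bytes. (That 5-byte form existed only in the original UTF-8 design; RFC 3629 restricts UTF-8 to at most four bytes per character, so f8 never appears in valid UTF-8 today.) Secondly, a lead byte can only be followed by continuation bytes of the form 8x, 9x, ax or bx, and the byte after f8 here is f0.
Therefore, the invalid bytes are replaced with the Unicode replacement character (U+FFFD), whose byte sequence in UTF-8 is efbfbd.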
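To see the substitution in isolation, here is a minimal sketch that performs the same decode/encode round trip as the question and compares the result with the input. The class name LossyRoundTrip and its hex helper are illustrative, and java.nio.charset.StandardCharsets is used in place of the question's Charsets class. The sketch only asserts the lone-byte case, whose result is unambiguous: a single f8 byte decodes to one U+FFFD, which re-encodes as ef bf bd.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: decode bytes with String(byte[], Charset), re-encode them, and compare.
// Malformed UTF-8 input is silently replaced with U+FFFD during decoding.
class LossyRoundTrip {

    // Decodes as UTF-8 and encodes back again (the round trip from the question).
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] original = {0x13, 0x00, 0x21, 0x00, 0x00, 0x61,
                           (byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};
        byte[] back = roundTrip(original);
        System.out.println(hex(original));
        System.out.println(hex(back));
        System.out.println(Arrays.equals(original, back)); // false

        // A lone 0xF8 byte decodes to a single U+FFFD, whose UTF-8 form is ef bf bd.
        String lone = new String(new byte[] {(byte) 0xF8}, StandardCharsets.UTF_8);
        System.out.println(lone.equals("\uFFFD"));                      // true
        System.out.println(hex(lone.getBytes(StandardCharsets.UTF_8))); // efbfbd
    }
}
```

Valid input such as plain ASCII survives the same round trip unchanged; only the malformed part is altered.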
And there is (correctly) no guarantee that converting an invalid byte sequence to a string and back will produce the same byte sequence. (Note that even two seemingly identical strings can be represented by different bytes, see Unicode equivalence.)
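To illustrate the parenthetical about Unicode equivalence, the sketch below compares two strings that both render as "é": one uses the precomposed code point U+00E9, the other uses "e" followed by the combining acute accent U+0301. They are different code point sequences and therefore different UTF-8 bytes, until they are normalized with java.text.Normalizer. The class name EquivalenceDemo and its hex helper are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Sketch: two visually identical strings with different code points,
// and therefore different UTF-8 bytes.
class EquivalenceDemo {

    // Lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // precomposed e-acute, one code point (U+00E9)
        String decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT (U+0301)

        System.out.println(composed.equals(decomposed));                         // false
        System.out.println(hex(composed.getBytes(StandardCharsets.UTF_8)));   // c3a9
        System.out.println(hex(decomposed.getBytes(StandardCharsets.UTF_8))); // 65cc81

        // After NFC normalization both strings have the same code points and bytes.
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(nfc.equals(composed)); // true
    }
}
```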
Moral of the story: If you want to represent bytes, don't convert them to characters, and if you want to represent text, don't use byte arrays.
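If arbitrary bytes must travel through a String anyway (for example in a text-based format; this is an assumed use case, not something the question states), a binary-to-text encoding keeps them intact where decoding them as UTF-8 does not. This sketch swaps in java.util.Base64 (Java 8+); the question's own Hex.encodeHexString from Apache Commons Codec would serve the same purpose. The class name BytesAsText and its methods are illustrative.

```java
import java.util.Arrays;
import java.util.Base64;

// Sketch: carry arbitrary bytes in a String with Base64, which is lossless,
// instead of decoding them as if they were UTF-8 text.
class BytesAsText {

    static String toText(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {0x13, 0x00, 0x21, 0x00, 0x00, 0x61,
                           (byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};
        String text = toText(original);
        System.out.println(text);
        System.out.println(Arrays.equals(original, fromText(text))); // true
    }
}
```

Unlike the UTF-8 round trip, this works for every possible byte value, because Base64 never interprets the bytes as characters.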
My UTF-8 is a bit rusty :-) but the sequence F8 F0 is, IMHO, not a valid UTF-8 encoding.
Take a look at http://en.wikipedia.org/wiki/Utf-8#Description .
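The byte-range table behind that description can be sketched as a per-byte classifier. This follows the ranges in RFC 3629 and only looks at one byte at a time; it deliberately ignores the extra second-byte restrictions (for example after e0, ed, f0 and f4), so it is a teaching aid rather than a full validator. The class name Utf8ByteRole and its enum are hypothetical.

```java
// Sketch: classify a byte by the role it may play in well-formed UTF-8,
// following the byte ranges of RFC 3629. Single-byte view only.
class Utf8ByteRole {

    enum Role { SINGLE, CONTINUATION, LEAD_2, LEAD_3, LEAD_4, INVALID }

    static Role classify(byte b) {
        int v = b & 0xff;                          // treat the signed Java byte as 0..255
        if (v <= 0x7F) return Role.SINGLE;         // 00..7F: ASCII
        if (v <= 0xBF) return Role.CONTINUATION;   // 80..BF: only after a lead byte
        if (v <= 0xC1) return Role.INVALID;        // C0, C1: would only encode overlong forms
        if (v <= 0xDF) return Role.LEAD_2;         // C2..DF: start of a 2-byte sequence
        if (v <= 0xEF) return Role.LEAD_3;         // E0..EF: start of a 3-byte sequence
        if (v <= 0xF4) return Role.LEAD_4;         // F0..F4: start of a 4-byte sequence
        return Role.INVALID;                       // F5..FF: never appear in UTF-8
    }

    public static void main(String[] args) {
        byte[] suspect = {(byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};
        for (byte b : suspect) {
            System.out.printf("%02x -> %s%n", b & 0xff, classify(b));
        }
    }
}
```

Run on the bytes from the question, f8 comes out as INVALID, and f0 is a 4-byte lead that is followed by 00 rather than a continuation byte.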
When you create a String from a byte array, the bytes are decoded. Since the bytes in your code do not represent valid characters, the bytes that finally make up the String are not the same as the ones you passed in.
From the Javadoc of the String(byte[]) constructor:
Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and therefore need not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
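The question actually calls the String(byte[], Charset) overload, which always replaces malformed input with the charset's replacement string instead of reporting it. A CharsetDecoder configured with CodingErrorAction.REPORT makes the failure visible instead. REPORT is already the default for a decoder obtained from newDecoder(); setting it explicitly just documents the intent. The class name StrictUtf8 and the method decodeStrict are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: decode UTF-8 strictly so invalid input raises an exception
// instead of being silently replaced with U+FFFD.
class StrictUtf8 {

    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] by = {0x13, 0x00, 0x21, 0x00, 0x00, 0x61,
                     (byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};
        try {
            decodeStrict(by);
            System.out.println("valid UTF-8");
        } catch (CharacterCodingException e) {
            // For malformed input the concrete type is MalformedInputException.
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```

With the question's bytes this takes the catch branch, which is usually what you want when the input is supposed to be text.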