Java string to byteArray conversion error

I am trying to convert a byte array to a String and back again, but the output does not match the input. Am I doing something wrong?

System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(by));
String s = new String(by, Charsets.UTF_8);
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(s.getBytes(Charsets.UTF_8)));

      

Output:

130021000061f8f0001a
130021000061efbfbd

      

Complete code:

String[] arr = {"13", "00", "21", "00", "00", "61", "F8", "F0", "00", "1A"};        
byte[] by = new byte[arr.length];

for (int i = 0; i < arr.length; i++) {
    by[i] = (byte)(Integer.parseInt(arr[i],16) & 0xff); 
}

System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(by));

String s = new String(by, Charsets.UTF_8);
System.out.println(org.apache.commons.codec.binary.Hex.encodeHexString(s.getBytes(Charsets.UTF_8)));

      

+3




3 answers


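The problem is that f8f0001a is not a valid UTF-8 byte sequence.

First of all, the lead byte f8 would denote a 5-byte sequence, and you only have four bytes. (That is under the original UTF-8 scheme; current UTF-8, as restricted by RFC 3629, does not allow f8 as a lead byte at all.) Secondly, f8 can only be followed by continuation bytes of the form 8x, 9x, ax or bx, and f0 is not one of them.

Therefore, the invalid input is replaced with the Unicode replacement character (U+FFFD), whose byte sequence in UTF-8 is efbfbd.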
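As a minimal sketch of the replacement described above, the following uses only the JDK (`StandardCharsets` instead of the `Charsets` class from the question). The class and method names are made up for illustration. It checks that decoding the question's bytes produces U+FFFD, and that U+FFFD encodes to ef bf bd. How many replacement characters are produced for a given malformed run can differ between JDK versions, so the sketch only tests for the presence of U+FFFD.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Hex-encode a byte array with plain JDK calls (lowercase, two digits per byte).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // True if decoding the bytes as UTF-8 yields at least one U+FFFD.
    // For already-valid input this also fires if the text itself contains U+FFFD.
    static boolean containsReplacementChar(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        byte[] by = {0x13, 0x00, 0x21, 0x00, 0x00, 0x61,
                     (byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};

        // The invalid bytes were replaced during decoding.
        System.out.println(containsReplacementChar(by)); // true

        // U+FFFD itself encodes to three bytes in UTF-8.
        System.out.println(toHex("\uFFFD".getBytes(StandardCharsets.UTF_8))); // efbfbd
    }
}
```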



And Java (correctly) does not guarantee that converting an invalid byte sequence to a string and back will result in the same byte sequence. (Note that even two seemingly identical strings can end up being represented by different bytes in Unicode; see Unicode equivalence.)

Moral of the story: If you want to represent bytes, don't convert them to characters, and if you want to represent text, don't use byte arrays.
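If the bytes really do have to travel as text, one option is a binary-to-text encoding rather than a charset decode. The sketch below assumes Java 8 or later and uses `java.util.Base64`; the class and method names are hypothetical. The hex string the question already prints is another such representation.

```java
import java.util.Arrays;
import java.util.Base64;

public class BytesAsText {
    // Turn arbitrary bytes into a String that can be safely stored or transmitted.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recover the original bytes from the Base64 text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] by = {0x13, 0x00, 0x21, 0x00, 0x00, 0x61,
                     (byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};

        String text = encode(by);
        byte[] back = decode(text);

        // Unlike the UTF-8 round trip, this one preserves every byte.
        System.out.println(Arrays.equals(by, back)); // true
    }
}
```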

+4




My UTF-8 is a bit rusty :-) but the sequence F8 F0 is, IMHO, an invalid UTF-8 encoding.



Take a look at http://en.wikipedia.org/wiki/Utf-8#Description .

+3




When you create a String from a byte array, the bytes are decoded.

Since the bytes in your code do not represent valid characters, the bytes that finally make up the String are not the same as the ones you passed in.

From the Javadoc:

public String(byte[] bytes)

Constructs a new String by decoding the specified byte array using the platform's default charset. The length of the new String is a function of the charset and therefore need not be equal to the length of the byte array.

The behavior of this constructor when the specified bytes are invalid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
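A rough sketch of what using CharsetDecoder could look like here: the decoder is configured to report malformed input instead of silently replacing it, so invalid bytes surface as a `CharacterCodingException`. The class name `StrictUtf8` and its methods are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Decode as UTF-8, throwing instead of substituting U+FFFD on bad input.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Convenience wrapper: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] by = {0x13, 0x00, 0x21, 0x00, 0x00, 0x61,
                     (byte) 0xF8, (byte) 0xF0, 0x00, 0x1A};

        System.out.println(isValidUtf8(by)); // false
        System.out.println(isValidUtf8("hello".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

Checking validity up front like this makes it explicit when a byte array is not text, rather than discovering it later from a mismatched round trip.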

+2

