Converting int to char and then back to int does not always give the same result
I am trying to get a char from an int value > 0xFFFF. But instead, I always get back the same char that, when converted back to int, gives the value 65535 (0xFFFF).
I couldn't figure out how to produce characters for Unicode code points > 0xFFFF.
int hex = 0x10FFFF;
char c = (char)hex;
System.out.println((int)c);
I expected the output to be 0x10FFFF (1114111 in decimal). Instead, the output is 65535.
source to share
This is because int is 4 bytes while char is only 2 bytes. Thus, you cannot represent all the values in char that you can in int. Using the standard unsigned integer representation, a 2-byte value can only represent the range from 0 to 2^16 - 1 == 65535, so if you convert any number outside that range to a 2-byte value and back, you lose data.
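A minimal sketch of the data loss described above: the narrowing cast to char keeps only the low 16 bits, so the round trip only survives for values that fit in 16 bits (for 0x10FFFF, the low 16 bits happen to be 0xFFFF, which is why the question's output is 65535).

```java
public class CharNarrowingDemo {
    public static void main(String[] args) {
        int inRange = 0x0041;      // 'A', fits in 16 bits
        int outOfRange = 0x10FFFF; // needs 21 bits

        // Round trip through char is lossless for a 16-bit value...
        System.out.println((int) (char) inRange);    // prints 65

        // ...but the narrowing cast discards all but the low 16 bits.
        // 0x10FFFF & 0xFFFF == 0xFFFF == 65535
        System.out.println((int) (char) outOfRange); // prints 65535
    }
}
```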
source to share
Your number was too big for a char, which is 2 bytes, but small enough to fit into an int, which is 4 bytes. 65535 is the largest value a char can hold, so that is the value you got. If char were big enough to hold your number, casting it back to int would give the decimal value of 0x10FFFF, which is 1114111.
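The sizes and limits mentioned above can be checked directly; this small sketch only uses the standard constants Character.SIZE, Character.MAX_VALUE and Integer.SIZE:

```java
public class CharLimits {
    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // 16 bits per char
        System.out.println((int) Character.MAX_VALUE); // 65535, the largest char value
        System.out.println(Integer.SIZE);              // 32 bits per int
        System.out.println(0x10FFFF);                  // 1114111, too big for a char
    }
}
```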
source to share
Unfortunately, I think you expected a Java char to be the same thing as a Unicode code point. They are not.
A Java char, as other answers have already said, can only hold code points that fit in 16 bits, whereas Unicode needs 21 bits to cover all code points.
In other words, a Java char by itself only supports characters from the Basic Multilingual Plane (code points <= 0xFFFF). In Java, if you want to represent a Unicode code point in one of the supplementary planes (code points > 0xFFFF), you need a surrogate pair, i.e. a pair of char values. This is how UTF-16 works, and internally this is how Java strings work. Just for fun, run the following snippet to see how a single Unicode code point is actually represented by two chars when the code point is > 0xFFFF:
// Printing the string length for a string with
// a single Unicode code point: 0x22BED.
System.out.println("\uD84A\uDFED".length()); // prints 2, because it uses a surrogate pair.
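To make the surrogate pair visible, you can also print the two char values individually; a sketch, where the escapes "\uD84A\uDFED" encode the same code point 0x22BED as above:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        String s = "\uD84A\uDFED"; // one code point, 0x22BED, as a surrogate pair

        System.out.println((int) s.charAt(0)); // prints 55370 (0xD84A, high surrogate)
        System.out.println((int) s.charAt(1)); // prints 57325 (0xDFED, low surrogate)

        // Two chars, but only one code point.
        System.out.println(s.length());                      // prints 2
        System.out.println(s.codePointCount(0, s.length())); // prints 1
    }
}
```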
If you want to safely convert an int value representing a Unicode code point to char (or, more precisely, to a char[]) and then convert it back to an int code point, you can use code like this:
public static void main(String[] args) {
int hex = 0x10FFFF;
System.out.println(Character.isSupplementaryCodePoint(hex)); // prints true because hex > 0xFFFF
char[] surrogateChars = Character.toChars(hex);
int codePointConvertedBack = Character.codePointAt(surrogateChars, 0);
System.out.println(codePointConvertedBack); // prints 1114111
}
Alternatively, instead of manipulating char arrays, you can use String, for example:
public static void main(String[] args) {
int hex = 0x10FFFF;
System.out.println(Character.isSupplementaryCodePoint(hex)); // prints true because hex > 0xFFFF
String s = new String(new int[] {hex}, 0, 1);
int codePointConvertedBack = s.codePointAt(0);
System.out.println(codePointConvertedBack); // prints 1114111
}
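Another option along the same lines, offered as a sketch: StringBuilder.appendCodePoint handles the surrogate encoding for you and avoids the int-array constructor:

```java
public class AppendCodePointDemo {
    public static void main(String[] args) {
        int hex = 0x10FFFF;

        // appendCodePoint writes a surrogate pair for supplementary code points.
        String s = new StringBuilder().appendCodePoint(hex).toString();

        System.out.println(s.length());       // prints 2 (stored as a surrogate pair)
        System.out.println(s.codePointAt(0)); // prints 1114111
    }
}
```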
For further reading: the Java Character class documentation.
source to share