Invalid Unicode code point 0xd83f
I am trying to port some Java code to Go. The Java code has a char variable with the value '\ud83f'. When I try to use this value in Go, it doesn't compile:
package main
func main() {
c := '\ud83f'
println(c)
}
$ go run a.go
# command-line-arguments
./a.go:3: invalid Unicode code point in escape sequence: 0xd83f
Why? I also tried making a string with this value in Python, and it worked there as well. For some reason, it just doesn't work in Go.
The rune literal you tried to use is invalid because it denotes a surrogate code point. The spec says that rune literals cannot denote surrogate code points (among other invalid values):
[...]
The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
Further on, in the spec's examples, you can see another case that is considered illegal:
'\U00110000'    // illegal: invalid Unicode code point
This implies that invalid code points (e.g. those above 0x10FFFF) are also disallowed in rune literals.
Note that since rune is just an alias for int32, you can simply do:
var r rune = 0xd8f3
instead
var r rune = '\ud8f3'
And if you want a number above 0x10FFFF, you can do:
var r rune = 0x11ffff
instead
var r rune = '\U0011ffff'
As already mentioned, \ud83f is a surrogate half, used in UTF-16 encoding. It is not a valid code point on its own, and the Go specification explicitly states:
The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
If you really need a rune holding this invalid code point, you can do the following:
c := rune(0xd83f)
But the correct way to handle such a value is to first decode the two surrogate halves and then use the resulting valid code point.