Invalid Unicode code point 0xd83f
I am trying to port some Java code to Go. The Java code has a char variable with the value '\ud83f'. When I try to use this value in Go, it doesn't compile:
package main
func main() {
c := '\ud83f'
println(c)
}
$ go run a.go
# command-line-arguments
./a.go:3: invalid Unicode code point in escape sequence: 0xd83f
Why? I also tried making a string with this value in Python, and it worked there as well. For some reason, it just doesn't work in Go.
The rune literal you tried to use is invalid because it denotes a surrogate code point. The spec says that rune literals cannot denote surrogate code points (among other invalid values):
[...]
The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
Further on, in the spec's examples, you can see another case that is considered illegal:
'\U00110000'    // illegal: invalid Unicode code point
This implies that invalid code points (e.g. those above 0x10FFFF) are also disallowed in rune literals.
Note that since rune is just an alias for int32, you can simply do:
var r rune = 0xd8f3
instead
var r rune = '\ud8f3'
And if you want a number above 0x10FFFF, you can do:
var r rune = 0x11ffff
instead
var r rune = '\U0011ffff'
As already mentioned, \ud83f is a surrogate half, used in UTF-16 encoding. It is not a valid code point on its own, and the Go specification explicitly states:
The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
If you really need a rune holding this invalid code point, you can do the following:
c := rune(0xd83f)
But the correct way to handle such a value is to first decode the two surrogate halves and then use the resulting valid code point.