How do I get IDLE to accept Unicode character insertion?

Often when I work interactively in IDLE, I would like to insert a Unicode string into the IDLE window. The character appears to insert correctly, but IDLE immediately reports an error. It has no problem displaying the same character in the output.

>>> c = u'ĉ'
Unsupported characters in input

>>> print u'\u0109'
ĉ

I suspect that the input box, like most Windows programs, uses UTF-16 internally and has no problem with the full set of Unicode; the problem is that IDLE insists on forcing all input to the default code page, mbcs, and anything not on that page is rejected.
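A minimal sketch of the suspected failure, outside IDLE. The helper name try_encode is hypothetical, and cp1252 is only an assumed stand-in for the machine's mbcs code page (the real code page depends on the Windows locale):

```python
# -*- coding: utf-8 -*-
# Sketch only: cp1252 is an assumed stand-in for the Windows "mbcs" code page.

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeError:
        return None

ch = u'\u0109'  # LATIN SMALL LETTER C WITH CIRCUMFLEX, the character from the question

legacy = try_encode(ch, 'cp1252')  # a single-byte code page that lacks this character
utf8 = try_encode(ch, 'utf-8')     # UTF-8 can represent every Unicode character

print(repr(legacy))
print(repr(utf8))
```

If the character is missing from the code page, the encode step fails, which is consistent with the "Unsupported characters in input" message.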

Is there a way to configure or get IDLE to accept the full set of Unicode characters as input?

Python 3.2 handles this much better and has no problem with anything I throw at it.

I know I can just save the code to a file in UTF-8 and import it, but I want to be able to work with Unicode characters in an interactive window.

1 answer


I finally figured out a way. Since the sources for IDLE are part of the distribution, you can make a couple of quick changes to enable this feature. The files can usually be found in C:\Python27\Lib\idlelib.

The first step is to prevent IDLE from trying to encode all those pretty Unicode characters into a character set that cannot handle them. This is controlled in IOBinding.py. Edit the file, find the section after if sys.platform == 'win32': and comment out this line:

#encoding = locale.getdefaultlocale()[1]

      

Now add the following line:

encoding = 'utf-8'

      

I was hoping there was a way to override this with an environment variable or something, but getdefaultlocale directly calls the Win32 function that returns the global Windows encoding, mbcs.



That's half the battle; now we need to get the command line interpreter to recognize that the input bytes are UTF-8 encoded. There didn't seem to be a way to pass the encoding to the interpreter, so I came up with the mother of all hacks. Maybe someone with a little more patience can come up with a better way, but it works for now. The input is processed in PyShell.py, in the function runsource. Change the following:

    if isinstance(source, types.UnicodeType):
        from idlelib import IOBinding
        try:
            source = source.encode(IOBinding.encoding)
        except UnicodeError:
            self.tkconsole.resetoutput()
            self.write("Unsupported characters in input\n")
            return

      

To:

    from idlelib import IOBinding  # line moved
    if isinstance(source, types.UnicodeType):
        try:
            source = source.encode(IOBinding.encoding)
        except UnicodeError:
            self.tkconsole.resetoutput()
            self.write("Unsupported characters in input\n")
            return
    source = "#coding=%s\n%s" % (IOBinding.encoding, source)  # line added

      

We use a PEP 263 coding declaration to specify the encoding of each piece of input passed to the interpreter.
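The effect of the added line can be tried outside IDLE. This is a rough sketch, not IDLE's actual code path: the helper names with_coding_cookie and run are hypothetical, and the built-in compile and exec stand in for the interpreter that runsource feeds.

```python
# -*- coding: utf-8 -*-
# Sketch only: mimics prepending a PEP 263 coding line to encoded input.

def with_coding_cookie(source_bytes, encoding):
    """Prepend a '#coding=<encoding>' line, as the modified runsource does."""
    header = ("#coding=%s\n" % encoding).encode("ascii")
    return header + source_bytes

def run(source_bytes, encoding):
    """Compile and execute the byte source; return the resulting namespace."""
    namespace = {}
    code = compile(with_coding_cookie(source_bytes, encoding), "<input>", "exec")
    exec(code, namespace)
    return namespace

# What the user typed, already encoded to UTF-8 bytes.
typed = u"c = u'\u0109'".encode("utf-8")

good = run(typed, "utf-8")    # declaration matches the bytes
bad = run(typed, "latin-1")   # declaration does not match the bytes

print(good["c"] == u"\u0109")
print(bad["c"] == u"\u0109")
```

When the declared encoding matches the bytes, the literal decodes back to the intended character; with a mismatched declaration it does not. One side effect to expect from this approach is that the extra first line shifts reported line numbers by one.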

Update: In Python 2.7.10, no changes to PyShell.py are needed any more; it already works correctly if the encoding is set to utf-8. Unfortunately, I haven't found a way to avoid the change in IOBinding.py.
