Why is "-" not "-"?

I am writing an automation script in Python that handles sending commands over a Telnet session. For some reason, I couldn't get it to work. After a lot of frustrating debugging, I found that when I was converting the command:

"ulimit -s 1024"

      

the - in the command had turned into something weird in UTF-8. I had to convert the command to bytes because I was sending it over Telnet (I know I should be using SSH, but honestly, Telnet is fine in my case), and I noticed something was off because when I looked at the command as bytes, it came out as:

b"ulimit \x##\x##\x##s 1024"

      

I don't remember the exact byte values, but I fixed it by copying and pasting a "-" from two lines above in the same function, and after that it worked fine.

The code two lines above had been copied and pasted, but I typed the ulimit -s part myself. I was also using IDLE.

Does anyone know what happened?



2 answers


Does anyone know what happened?

I see two possibilities here. The first is that you accidentally copied a line of code from a web page or other document where - had been replaced with an em dash (the same thing commonly happens to straight quotes, which get replaced with typographic ones). An em dash looks like a minus sign, but it is a multibyte UTF-8 sequence.



The other possibility is that the IDLE editor performed some Microsoft Word-style "autocorrection", which replaces (among other things) straight quotes with typographic ones, three consecutive periods with an ellipsis, and hyphens with em dashes. Or it may have been caused by some rare key combination pressed by mistake (for example, I sometimes launch the Windows 7 Magnifier when I try to type, I think, the { character, which on my keyboard is Shift+AltGr+[).
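If you suspect this has happened, you can check for it instead of guessing. Below is a minimal sketch using only the standard unicodedata module; the function name find_non_ascii is hypothetical, not part of any library. It lists every non-ASCII character in a command together with its position, code point and Unicode name:

```python
import unicodedata


def find_non_ascii(command: str) -> list[tuple[int, str, str, str]]:
    """Hypothetical helper: return (position, character, code point, Unicode name)
    for every character in `command` that is not plain ASCII."""
    report = []
    for pos, ch in enumerate(command):
        if not ch.isascii():  # anything above U+007F is not ASCII
            name = unicodedata.name(ch, "<unnamed>")
            report.append((pos, ch, f"U+{ord(ch):04X}", name))
    return report


# A command where the "-" is really an EM DASH (U+2014), e.g. after copy-paste or autocorrect
suspect = "ulimit \u2014s 1024"
for pos, ch, code_point, name in find_non_ascii(suspect):
    print(f"position {pos}: {code_point} {name}")  # position 7: U+2014 EM DASH
```

A clean command produces an empty list, so this can also serve as a quick sanity check before sending anything.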



You probably managed to enter something like U+2013 EN DASH or U+2014 EM DASH, both of which look very similar to the ASCII character U+002D HYPHEN-MINUS.

Since neither of these characters is ASCII (both lie in the range U+0800 to U+FFFF), encoding either one of them in UTF-8 results in a 3-byte sequence:

>>> print('\u2013')
–
>>> print('\u2013'.encode('utf8'))
b'\xe2\x80\x93'
>>> print('\u2014')
—
>>> print('\u2014'.encode('utf8'))
b'\xe2\x80\x94'
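Applied to the command from the question, and assuming the stray character was an en dash (the question doesn't say which), this reproduces the pattern you saw: three bytes where one was expected. One way to catch it before anything goes over Telnet is to encode the command as ASCII rather than UTF-8, so a lookalike raises an error instead of silently becoming three bytes. A sketch, with to_telnet_bytes as a hypothetical helper name:

```python
import unicodedata

# The command as it was probably typed; assumption: the "-" is U+2013 EN DASH
typed = "ulimit \u2013s 1024"
print(typed.encode("utf-8"))  # b'ulimit \xe2\x80\x93s 1024'


def to_telnet_bytes(command: str) -> bytes:
    """Hypothetical helper: encode a shell command as ASCII so stray Unicode fails loudly."""
    try:
        return command.encode("ascii")
    except UnicodeEncodeError as err:
        bad = command[err.start]  # first character that ASCII cannot represent
        raise ValueError(
            f"non-ASCII character U+{ord(bad):04X} "
            f"{unicodedata.name(bad, '?')} at position {err.start}"
        ) from err


print(to_telnet_bytes("ulimit -s 1024"))  # b'ulimit -s 1024'
try:
    to_telnet_bytes(typed)
except ValueError as exc:
    print(exc)  # non-ASCII character U+2013 EN DASH at position 7
```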

      



These two are not the only characters that are easily confused with the hyphen-minus; there are several others as well.
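As a sketch, here are some well-known lookalikes. This selection is mine, not exhaustive, and not necessarily the set the original answer had in mind. All of them lie in U+0800 to U+FFFF, so each one encodes to three bytes in UTF-8. The hypothetical normalize_dashes helper maps them back to ASCII HYPHEN-MINUS with str.translate:

```python
import unicodedata

# A selection of hyphen-minus lookalikes (not an exhaustive list)
LOOKALIKES = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212\ufe63\uff0d"

# Show each lookalike's code point, Unicode name and UTF-8 bytes
for ch in LOOKALIKES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<25} {ch.encode('utf-8')}")

# Translation table mapping every lookalike to ASCII HYPHEN-MINUS (U+002D)
DASH_TABLE = str.maketrans({ch: "-" for ch in LOOKALIKES})


def normalize_dashes(text: str) -> str:
    """Hypothetical helper: replace dash lookalikes with '-'."""
    return text.translate(DASH_TABLE)


print(normalize_dashes("ulimit \u2212s 1024"))  # ulimit -s 1024
```

Normalizing blindly is a trade-off: it fixes accidental dashes in option flags, but it would also rewrite a dash that was meant to be there, for instance inside a quoted string, so rejecting non-ASCII input outright is often the safer default.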
