Processing tokens with leading zeros

To tokenize the input expression, I use tokenize.generate_tokens():

import cStringIO
import tokenize
readline = cStringIO.StringIO(SourceLine).readline
tokens = tokenize.generate_tokens(readline)

      

Now, when SourceLine = "Y123 = 00911 + 98 / 3", I get the following token values in tokens:

"Y123", "=" , "00", "911","+", "98" , "/" , "3"

      

However, when I pass SourceLine = "Y123 = 00411 + 98 / 3", I get:

"Y123", "=" , "00411", "+" ,"98","/","3"

      

I don't understand why, in the first case, 00911 produced two tokens, 00 and 911, instead of a single token with the value 00911.

2 answers


In Python 2, integer literals starting with 0 are interpreted as octal numbers (base 8). Accordingly, your first SourceLine is not actually syntactically valid, because 9 is not a valid octal digit:

>>> Y123 = 00911 + 98 / 3
  File "<stdin>", line 1
    Y123 = 00911 + 98 / 3
               ^
SyntaxError: invalid token

      



So it seems the tokenizer parses it as a valid octal literal (00) immediately followed by a decimal literal (911). If you are trying to parse some kind of Python-like language, you can merge the two pieces back into the form you want.
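One way to recombine the split pieces is to join NUMBER tokens that touch each other, meaning the end position of one equals the start position of the next. This is only a rough sketch: merge_adjacent_numbers is a hypothetical helper, and the sample token list is built by hand to mimic the output reported in the question, because what tokenize itself returns for such input can differ between Python versions.

```python
import tokenize

def merge_adjacent_numbers(tokens):
    """Join NUMBER tokens that directly touch each other (no gap between them)."""
    merged = []
    for tok_type, text, start, end, line in tokens:
        if (merged
                and tok_type == tokenize.NUMBER
                and merged[-1][0] == tokenize.NUMBER
                and merged[-1][3] == start):
            # The previous NUMBER ends exactly where this one starts: glue them.
            prev = merged[-1]
            merged[-1] = (prev[0], prev[1] + text, prev[2], end, line)
        else:
            merged.append((tok_type, text, start, end, line))
    return merged

# Hand-built 5-tuples (type, string, start, end, line) mimicking the split
# that the question reports for "00911"; these are not real tokenizer output.
line = "Y123 = 00911 + 98 / 3"
sample = [
    (tokenize.NAME, "Y123", (1, 0), (1, 4), line),
    (tokenize.OP, "=", (1, 5), (1, 6), line),
    (tokenize.NUMBER, "00", (1, 7), (1, 9), line),
    (tokenize.NUMBER, "911", (1, 9), (1, 12), line),
    (tokenize.OP, "+", (1, 13), (1, 14), line),
    (tokenize.NUMBER, "98", (1, 15), (1, 17), line),
]
values = [tok[1] for tok in merge_adjacent_numbers(sample)]
```

Because it compares positions rather than values, the helper leaves numbers separated by whitespace or operators alone, such as the 98 here.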



The reason is that tokenize interprets '00411' as an octal number, which '00911' is not, because it contains the digit 9. Thus, it returns "00", a valid octal number, followed by "911", a valid decimal number.
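To check for yourself which digit strings can be read as octal, you can lean on the built-in int() with an explicit base of 8, which raises ValueError on a digit such as 9. A small sketch, where is_octal_digits is just a name made up for illustration:

```python
def is_octal_digits(text):
    """Return True if text can be read as a base-8 integer."""
    try:
        int(text, 8)
        return True
    except ValueError:
        return False

# The two literals from the question, plus the two pieces the first one splits into.
results = {s: is_octal_digits(s) for s in ("00411", "00911", "00", "911")}
```

Here "00411" and "00" pass the check, while "00911" and "911" do not, which matches the split described above.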


