Processing tokens with leading zeros

To tokenize the input expression, I use tokenize.generate_tokens():


tokens = cStringIO.StringIO(SourceLine).readline
tokens = tokenize.generate_tokens(tokens)


Now, when SourceLine = "Y123 = 00911 + 98 / 3", I get the following token values in the tokens tuple:

"Y123", "=" , "00", "911","+", "98" , "/" , "3"


However, when SourceLine = "Y123 = 00411 + 98 / 3", I get:

"Y123", "=" , "00411", "+" ,"98","/","3"


I don't understand why, in the first case, 00911 produced two tokens, 00 and 911, instead of a single token with the value 00911.




2 answers

In Python 2, integer literals starting with 0 are interpreted as octal numbers (base 8). Accordingly, your first SourceLine is not actually syntactically valid, because 9 is not a valid octal digit:

>>> Y123 = 00911 + 98 / 3
  File "<stdin>", line 1
    Y123 = 00911 + 98 / 3
SyntaxError: invalid token


So it seems the tokenizer parses it as a valid octal literal immediately followed by a decimal literal. If you are trying to parse some kind of Python-like language, you could merge the two tokens back into the form you want.
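One way to do that merging, as a rough sketch only: join any NUMBER token to the NUMBER token before it when the two touch (the first one ends exactly where the second starts). The helper name merge_adjacent_numbers and the hand-written sample tokens are made up for illustration; the sample mimics the split reported in the question rather than calling the tokenizer, since tokenizer behaviour for such input can differ between Python versions.

```python
import tokenize

def merge_adjacent_numbers(tokens):
    """Join NUMBER tokens that touch each other, e.g. '00' + '911' -> '00911'.

    Hypothetical helper: each token is a tuple whose first four fields are
    (type, string, start, end), as produced by tokenize.generate_tokens().
    """
    merged = []
    for tok in tokens:
        tok_type, text, start, end = tok[0], tok[1], tok[2], tok[3]
        if merged:
            prev_type, prev_text, prev_start, prev_end = merged[-1]
            # Merge only when both are numbers and nothing separates them.
            if prev_type == tok_type == tokenize.NUMBER and prev_end == start:
                merged[-1] = (tok_type, prev_text + text, prev_start, end)
                continue
        merged.append((tok_type, text, start, end))
    return merged

# Hand-written tokens mimicking the split reported for "Y123 = 00911".
sample = [
    (tokenize.NAME, "Y123", (1, 0), (1, 4)),
    (tokenize.OP, "=", (1, 5), (1, 6)),
    (tokenize.NUMBER, "00", (1, 7), (1, 9)),
    (tokenize.NUMBER, "911", (1, 9), (1, 12)),
]
print([text for _, text, _, _ in merge_adjacent_numbers(sample)])
# ['Y123', '=', '00911']
```

Numbers separated by whitespace or an operator are left alone, because their positions do not touch.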



The reason is that tokenize treats '00411' as an octal number, which '00911' is not. It therefore returns "00", a valid octal number, followed by "911", a valid decimal number.
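The octal point can be checked without the tokenizer by asking int() to parse the digits in base 8. This is only an approximation of what the tokenizer accepts (int() also tolerates signs, surrounding whitespace and a 0o prefix), and is_octal_literal is a made-up helper name:

```python
def is_octal_literal(text):
    """Return True if text parses as a base-8 integer (hypothetical helper)."""
    try:
        int(text, 8)
        return True
    except ValueError:
        # Raised when a digit such as 8 or 9 appears.
        return False

print(is_octal_literal("00411"))  # True
print(is_octal_literal("00911"))  # False
print(is_octal_literal("00"))     # True
print(int("00411", 8))            # 265
```

"00" on its own passes the check, which matches the first of the two tokens seen for 00911.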


