Processing tokens with leading zeros
To tokenize the input expression, I use tokenize.generate_tokens():
tokens = cStringIO.StringIO(SourceLine).readline
tokens = tokenize.generate_tokens(tokens)
Now, when SourceLine = "Y123 = 00911 + 98 / 3", I get the following token values from tokens:
"Y123", "=", "00", "911", "+", "98", "/", "3"
However, when I use SourceLine = "Y123 = 00411 + 98 / 3", I get:
"Y123", "=", "00411", "+", "98", "/", "3"
I don't understand why, in the first case, 00911 produced two tokens, 00 and 911, instead of a single token with the value 00911.
In Python 2, integer literals starting with 0 are interpreted as octal numbers (base 8). Accordingly, your first SourceLine is actually not syntactically valid, because 9 is not a valid octal digit:
>>> Y123 = 00911 + 98 / 3
File "<stdin>", line 1
Y123 = 00911 + 98 / 3
^
SyntaxError: invalid token
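The octal reading can also be checked without relying on Python 2 literal syntax. This is only a small illustration using the built-in int() with an explicit base of 8; the variable names are made up for the example:

```python
# int(text, 8) parses a string of octal digits, which is how Python 2
# reads a literal such as 00411.
octal_value = int("00411", 8)
print(octal_value)  # 4*64 + 1*8 + 1 = 265

# "00911" contains the digit 9, which does not exist in base 8,
# so the same conversion is rejected.
try:
    int("00911", 8)
    nine_is_octal = True
except ValueError as exc:
    nine_is_octal = False
    print("not octal: %s" % exc)
```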
So it seems the tokenizer parses 00911 as a valid octal literal (00) immediately followed by a decimal literal (911). If you are trying to parse some kind of Python-like language, you can merge those pieces back into the form you want.
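One possible way to do that merging is sketched below. It assumes the tokens are tuples shaped like the ones generate_tokens() yields (type, string, start, end, line) and treats two NUMBER tokens as one when the first ends exactly where the second starts. The helper name merge_adjacent_numbers and the hand-built sample list are hypothetical, written out by hand so the example does not depend on how a particular Python version tokenizes 00911:

```python
import token

def merge_adjacent_numbers(tokens):
    """Join NUMBER tokens that touch each other (e.g. "00" + "911")."""
    merged = []
    for tok in tokens:
        tok_type, text, start, end = tok[0], tok[1], tok[2], tok[3]
        if (merged
                and tok_type == token.NUMBER
                and merged[-1][0] == token.NUMBER
                and merged[-1][3] == start):  # previous token ends where this one starts
            prev = merged[-1]
            # Keep the first token's start, take the second token's end.
            merged[-1] = (token.NUMBER, prev[1] + text, prev[2], end) + tuple(prev[4:])
        else:
            merged.append(tuple(tok))
    return merged

# Hand-built tokens mimicking the split described in the question.
line = "Y123 = 00911 + 98 / 3"
sample = [
    (token.NAME, "Y123", (1, 0), (1, 4), line),
    (token.OP, "=", (1, 5), (1, 6), line),
    (token.NUMBER, "00", (1, 7), (1, 9), line),
    (token.NUMBER, "911", (1, 9), (1, 12), line),
    (token.OP, "+", (1, 13), (1, 14), line),
]
merged = merge_adjacent_numbers(sample)
print([t[1] for t in merged])  # ['Y123', '=', '00911', '+']
```

Numbers separated by whitespace, such as "98 3", are left alone because their positions do not touch.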