Processing tokens with leading zeros

Question


To tokenize the input expression, I use tokenize.generate_tokens():

import cStringIO
import tokenize

readline = cStringIO.StringIO(SourceLine).readline
tokens = tokenize.generate_tokens(readline)

Now, when SourceLine = "Y123 = 00911 + 98 / 3", I get the following token values in tokens:

"Y123", "=" , "00", "911","+", "98" , "/" , "3"

However, when I use SourceLine = "Y123 = 00411 + 98 / 3", I get:

"Y123", "=" , "00411", "+" ,"98","/","3"

I don't understand why, in the first case, 00911 generated two tokens, 00 and 911, instead of a single token with the value 00911.


python tokenize

Shrikant 18 Feb At 18:06


2 answers

DSM · Answer 1 · 2013-02-18T18:13:20+0000

In Python 2, integer literals starting with 0 are interpreted as octal numbers (base 8). Accordingly, your first SourceLine is actually not syntactically valid, because 9 is not a valid octal digit:

>>> Y123 = 00911 + 98 / 3
  File "<stdin>", line 1
    Y123 = 00911 + 98 / 3
               ^
SyntaxError: invalid token

So it seems the tokenizer parses it as a valid octal literal (00) immediately followed by a decimal literal (911). If you are trying to parse some Python-like language rather than Python itself, you can join the pieces back into the format you want.
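One way to join the pieces back together is sketched below. It assumes the tokens are 5-tuples of the shape tokenize.generate_tokens() yields (type, string, start, end, line) and merges NUMBER tokens that touch each other with no gap. The helper name merge_adjacent_numbers is hypothetical, and the sample tuples are hand-built to mirror the output in the question, since tokenizer behaviour for such input can differ between Python versions.

```python
import tokenize


def merge_adjacent_numbers(tokens):
    """Join NUMBER tokens that directly touch each other (hypothetical helper)."""
    merged = []
    for tok_type, text, start, end, line in tokens:
        if (merged
                and tok_type == tokenize.NUMBER
                and merged[-1][0] == tokenize.NUMBER
                and merged[-1][3] == start):
            # The previous NUMBER ends exactly where this one starts: glue them.
            prev = merged[-1]
            merged[-1] = (tokenize.NUMBER, prev[1] + text, prev[2], end, line)
        else:
            merged.append((tok_type, text, start, end, line))
    return merged


# Hand-built tokens mirroring the split reported in the question.
source = "Y123 = 00911 + 98 / 3"
sample = [
    (tokenize.NAME, "Y123", (1, 0), (1, 4), source),
    (tokenize.OP, "=", (1, 5), (1, 6), source),
    (tokenize.NUMBER, "00", (1, 7), (1, 9), source),
    (tokenize.NUMBER, "911", (1, 9), (1, 12), source),
    (tokenize.OP, "+", (1, 13), (1, 14), source),
    (tokenize.NUMBER, "98", (1, 15), (1, 17), source),
]
values = [tok[1] for tok in merge_adjacent_numbers(sample)]
```

Comparing the end position of one token with the start position of the next keeps numbers that are separated by whitespace or an operator, such as 98 and 3, as separate tokens.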

isedev · Answer 2 · 2013-02-18T18:13:15+0000

The reason is that tokenize interprets '00411' as an octal number, which '00911' is not. Thus, for '00911' it returns "00", a valid octal number, followed by "911", a valid decimal number.
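The difference between the two literals can be checked without the tokenizer. The sketch below uses int() with base 8 as a stand-in for the octal rule; the helper name is_old_style_octal is hypothetical and is not part of the tokenize module.

```python
def is_old_style_octal(text):
    """Return True if text is a leading-zero literal made only of octal digits."""
    if not text.startswith("0"):
        return False
    try:
        int(text, 8)  # raises ValueError if any digit is 8 or 9
    except ValueError:
        return False
    return True


ok = is_old_style_octal("00411")    # every digit is in the range 0-7
bad = is_old_style_octal("00911")   # 9 is not an octal digit
value = int("00411", 8)             # 4*64 + 1*8 + 1
```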
