Integer identity testing: inconsistent behavior between large positive and small negative integers

I am using Anaconda (Python 3.6).

Interactively, I performed an object identity check for positive integers > 256:

# Interactive test 1
>>> x = 1000
>>> y = 1000
>>> x is y
False

Obviously, large integers (> 256) assigned on separate lines are not reused in interactive mode.

But if we write the assignment on one line, the large positive integer object is reused:

# Interactive test 2
>>> x, y = 1000, 1000
>>> x is y
True

That is, in interactive mode, whether integer assignments are written on one line or on separate lines makes the difference for reusing integer objects (> 256). For integers in [-5, 256] (as described by https://docs.python.org/2/c-api/int.html ), the caching mechanism ensures that only one object is created, regardless of whether the assignment is on one line or several.
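
For instance, right at the boundaries of the cached range (this is a CPython implementation detail, so the exact behavior may vary between versions and implementations):

>>> x = 256
>>> y = 256
>>> x is y      # 256 is inside the cached range, so the same object is reused
True
>>> x = 257
>>> y = 257
>>> x is y      # 257 is outside the cached range; separate objects in the interactive shell
False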

Now consider small negative integers less than -5 (any negative integer outside the range [-5, 256] will do); unexpected results appear:

# Interactive test 3
>>> x, y = -6, -6
>>> x is y
False     # inconsistent with the large positive integer 1000

>>> -6 is -6
False

>>> id(-6), id(-6), id(-6)
(2280334806256, 2280334806128, 2280334806448)

>>> a = b = -6
>>> a is b
True    # a different result from a, b = -6, -6

Clearly this demonstrates an inconsistency in the object identity test between large positive integers (> 256) and small negative integers (< -5). And for small negative integers (< -5), writing a, b = -6, -6 versus a = b = -6 also makes a difference (unlike with the large positive integer 1000). Is there any explanation for these strange behaviors?

For comparison, when launching from an IDE (I am using PyCharm with the same Python 3.6 interpreter), I run the following script:

# IDE test case
x = 1000
y = 1000
print(x is y)

It prints True, unlike the interactive run. Thanks to @Ahsanul Haque, who already gave a good explanation of the difference between IDE startup and interactive startup. But my question about the inconsistency between the large positive integer and the small negative integer in an interactive run remains unanswered.



3 answers


When you run 1000 is 1000 in an interactive shell or as part of a larger script, CPython generates bytecode like this:

In [3]: dis.dis('1000 is 1000')
   ...: 
  1           0 LOAD_CONST               0 (1000)
              2 LOAD_CONST               0 (1000)
              4 COMPARE_OP               8 (is)
              6 RETURN_VALUE

What it does:

  • Loads two constants (LOAD_CONST pushes co_consts[consti] onto the stack - docs)
  • Compares them with is (True if the operands refer to the same object; False otherwise - see the small check after this list)
  • Returns the result
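
As a quick sanity check of what the is comparison means (identity, not equality), here is a tiny interactive example; it only relies on id() returning the same value for the same live object:

>>> x = y = object()    # two names bound to one object
>>> x is y
True
>>> id(x) == id(y)      # identical objects have the same id while both are alive
True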

As CPython only creates one Python object for a constant used in a code block, 1000 is 1000 will result in a single integer constant:

In [4]: code = compile('1000 is 1000', '<string>', 'single') # code object

In [5]: code.co_consts # constants used by the code object
Out[5]: (1000, None)

As per the above bytecode, Python will load the same object twice and compare it with itself, so the expression will evaluate to True:

In [6]: eval(code)
Out[6]: True

Results differ for -6 because it is not immediately recognized as a constant:

In [7]: ast.dump(ast.parse('-6'))
Out[7]: 'Module(body=[Expr(value=UnaryOp(op=USub(), operand=Num(n=6)))])'

-6 is an expression that negates the value of the integer literal 6.
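
For comparison, a bare literal parses straight to a number node, with no UnaryOp wrapper (output from Python 3.6; newer versions show Constant(value=6) instead of Num(n=6)):

>>> import ast
>>> ast.dump(ast.parse('6'))
'Module(body=[Expr(value=Num(n=6))])'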

However, the bytecode for -6 is -6 is practically the same as the bytecode in the first example:

In [8]: dis.dis('-6 is -6')
  1           0 LOAD_CONST               1 (-6)
              2 LOAD_CONST               2 (-6)
              4 COMPARE_OP               8 (is)
              6 RETURN_VALUE

So Python loads two -6 constants and compares them using is.

How does the expression -6 become a constant, then? CPython has a peephole optimizer capable of optimizing simple expressions containing constants by evaluating them immediately after compilation and storing the results in the constants table.
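
A minimal way to see the folding in action (a sketch; the exact contents of co_consts differ between CPython versions, but the folded value ends up in the constants table):

>>> code = compile('2 * 3', '<string>', 'eval')
>>> 6 in code.co_consts      # the optimizer stored the folded result as a constant
True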



Starting with CPython 3.6, the folding of unary operations is handled by fold_unaryops_on_constants in Python/peephole.c. In particular, - (unary minus) is evaluated with PyNumber_Negative, which returns a new Python object (-6 is not cached). After that, the newly created object is inserted into the consts table. However, the optimizer does not check whether the result of an expression can be reused, so the results of identical expressions become different Python objects (again, as of CPython 3.6).

To illustrate this, I'll compile the expression -6 is -6:

In [9]: code = compile('-6 is -6', '<string>', 'single')

There are two -6 constants in co_consts:

In [10]: code.co_consts
Out[10]: (6, None, -6, -6)

and they have different memory addresses:

In [11]: [id(const) for const in code.co_consts if const == -6]
Out[11]: [140415435258128, 140415435258576]

Of course, this means that -6 is -6 evaluates to False:

In [12]: eval(code)
Out[12]: False


For the most part, the above explanation remains valid in the presence of variables. When executed in an interactive shell, these three lines

>>> x = 1000
>>> y = 1000
>>> x is y
False

are parts of three different code blocks, so the constant 1000 will not be reused. However, if you put them all in one code block (for example, the body of a function), the constant will be reused.
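
For example, wrapping the three statements in a function body makes them a single code block, so even in the interactive shell the check succeeds (CPython behavior; an implementation detail you should not rely on):

>>> def f():
...     x = 1000
...     y = 1000
...     return x is y
...
>>> f()
True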

In contrast, the line x, y = 1000, 1000 is always executed in a single code block (even in an interactive shell), so CPython always reuses the constant. In x, y = -6, -6, the -6 is not reused, for the reasons explained in the first part of my answer.

x = y = -6 is trivial: only one Python object is created and bound to both names, so x is y returns True, even if you replace -6 with something else.
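
For instance, the same holds for a value that is certainly not cached:

>>> a = b = 10 ** 9       # one object is created, then bound to both names
>>> a is b
True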



Only one copy of a specific constant is created for a given piece of source code and reused as needed. So in PyCharm you get x is y == True.

But it's different in the interactive interpreter. Only one line/statement runs at a time, and a constant is created anew for each line; it is not reused on the next line. So x is not y here.

But if you initialize both variables on the same line, you get the same behavior (the same constant is reused):

>>> x,y = 1000, 1000
>>> x is y
True
>>> x = 1000
>>> y = 1000
>>> x is y
False

Edit:

A block is a piece of text in a Python program that runs as a unit.

In the IDE, the entire module is launched at once, i.e. the whole module is a block. But in interactive mode, each command is a block of code that is executed immediately.
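
A small sketch of the same idea: if you feed all three lines to the interpreter as one compiled unit (here via exec), they form a single code block and behave like the script/IDE case (CPython behavior):

>>> src = "x = 1000\ny = 1000\nprint(x is y)"
>>> exec(src)
True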



As I said earlier, a specific constant is created once for a block of code and reused if it appears again in that block of code.
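
One way to verify this claim (a sketch; co_consts also holds None for the module's implicit return value):

>>> code = compile("x = 1000\ny = 1000", "<string>", "exec")
>>> code.co_consts.count(1000)     # the constant is stored only once per code block
1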

This is the main difference between IDE and interpreter.

Then why does the interpreter give the same output as the IDE for smaller numbers? This is where integer caching comes into play.

If the numbers are within the cached range [-5, 256], they are cached and reused in the next code block as well. This way we get the same id in the interpreter too.

But if they are outside that range, they are not cached; rather, a new copy is created each time. So, as expected, the ids are different.

Hope it makes sense now.



In addition to Ahsanul Haque's answer, try this in any IDE:

x = 1000
y = 1000
print(x is y)
print('\ninitial id x: ',id(x))
print('initial id y: ',id(y))

x=2000
print('\nid x after change value:   ',id(x))
print('id y after change x value: ', id(y))

initial id x:  139865953872336
initial id y:  139865953872336

id x after change value:    139865953872304
id y after change x value:  139865953872336

You will most likely see the same id for x and y. Then run the code in the interpreter, and the ids will be different:

>>> x = 1000
>>> y = 1000
>>> id(x)
139865953870576
>>> id(y)
139865953872368
