Erratic results of Python factoring function

def test_prime(n):
    q = True
    for p in range(2,n):  #Only need to check up to rootn for primes and n/2 for factors
        if int(n%p) is 0:         
            q = False
            print(p, 'and', int(n/p), 'are factors of ', n)    
    if q:
        print(n, 'IS a prime number!')
    else:
        print(n, 'IS NOT a prime number')

      

I've just started playing with Python and am putting a few bits and pieces together as I learn. I was playing around with testing for prime numbers and had the idea of also printing the factors when a number isn't prime. The function I put together above seems to work well enough, except that it produces inconsistent results.

For example, if I set n = 65432 I get ...

2 and 32716 are factors of  65432
4 and 16358 are factors of  65432
8 and 8179 are factors of  65432
8179 and 8 are factors of  65432
16358 and 4 are factors of  65432
32716 and 2 are factors of  65432
65432 IS NOT a prime number

      

which is what I expect. But if I set n = 659306 I get ...

2 and 329653 are factors of  659306
71 and 9286 are factors of  659306
142 and 4643 are factors of  659306
4643 and 142 are factors of  659306
9286 and 71 are factors of  659306
659306 IS NOT a prime number

      

which is different in that it doesn't include the "329653 and 2" pair at the very end. This isn't really a problem, since every factor is displayed somewhere, but it annoys me that I don't know WHY this happens for some numbers!

Just to show that I'm not completely clueless, I've worked out that this only seems to happen with integers more than 5 digits long. Can someone please tell me why the results are different in these two cases?



2 answers


You want n % p == 0, not n % p is 0. The is operator checks for identity, not equality, and not every 0 is the same object as every other 0.

>>> 659306 % 329653
0
>>> (659306 % 329653) == 0
True
>>> (659306 % 329653) is 0
False
>>> id(0)
136748976
>>> id(659306 % 329653) 
3070888160

      

id() basically corresponds to a location in memory.



Think of it this way: if you have a loonie and I have a loonie, then they are equal in value (1 == 1), but they are not the same object (my dollar coin is not your dollar coin). We might happen to be sharing the same coin, but that's not necessarily the case.

[P.S.: You can use n // p for integer division instead of int(n/p).]
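
Putting both of those together, a minimal corrected sketch of the original function (same structure, just using == for the test and // for the cofactor) might look like this:

def test_prime(n):
    q = True
    for p in range(2, n):
        if n % p == 0:                # equality test, not identity
            q = False
            print(p, 'and', n // p, 'are factors of ', n)
    if q:
        print(n, 'IS a prime number!')
    else:
        print(n, 'IS NOT a prime number')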


What goes on behind the scenes is a little tricky. My comments only apply to CPython; other implementations such as PyPy, Jython, IronPython, etc. will behave differently.

To reduce memory usage and improve performance, CPython caches a range of small integers and tries to return a reference to one of those cached objects instead of creating another integer object with the same value. When you compare numbers with is, you are actually checking whether CPython handed back a reference to the same cached object. But sometimes CPython doesn't check whether the value is one of the cached integers. How can that happen?
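
As a quick illustration of the cache itself (in CPython the cached range is roughly -5 to 256; exact results can vary between versions, and between an interactive session and a script, so treat this as a sketch):

>>> a = 256
>>> b = 255 + 1
>>> a is b      # both names refer to the single cached 256 object
True
>>> a = 1000
>>> b = 999 + 1
>>> a is b      # two distinct int objects that merely compare equal
False
>>> a == b
True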

I'll explain CPython 3, since it is a little simpler than CPython 2. The int type you see in Python is called PyLong internally in the interpreter. A PyLong stores an integer as an array of digits, where each digit holds a value between 0 and 2**15 - 1 (32-bit builds) or 0 and 2**30 - 1 (64-bit builds). The array grows as the number gets larger; this is what allows effectively unlimited integers. When evaluating %, CPython checks whether the second argument fits in a single digit. If it does, it calls the C function divrem1, which returns the remainder as a single digit. PyLong_FromLong is then called to convert that value, which fits in a C long (i.e. the return value of divrem1), into a PyLong. PyLong_FromLong checks whether the argument is in the range of cached integers and, when possible, returns a reference to the cached object.



If the second argument is longer than one digit, a different C function, x_divrem, is called. x_divrem uses a general-purpose arbitrary-precision division algorithm to compute the remainder. Since x_divrem has to create a PyLong to hold the remainder during the computation anyway, there is nothing to gain by avoiding a duplicate integer object; it already exists. And for computations on arbitrary large numbers the remainder will rarely equal one of the cached integers, so checking would rarely pay off.
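
If you want to see which path applies on your machine, sys.int_info reports the internal digit size. (The answer above apparently ran on a build with 15-bit digits, where 329653 needs two digits; on a 64-bit build with 30-bit digits the divisor fits in one digit and the is test may come out differently, which is exactly why it shouldn't be relied on.)

>>> import sys
>>> sys.int_info.bits_per_digit     # 15 on some 32-bit builds, 30 on most 64-bit builds
30
>>> (659306 % 329653) == 0          # equality is correct no matter which C path produced the 0
True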

There are other ways to end up with duplicate copies of the small cached integers; I've just analyzed the particular case from the question.

And that's why you don't use is to check for numeric equality ...
