How to optimize factorization code in Python?

768 Views Asked by At

I recently made a piece of code in Python that takes a user inputted number "n" and prints the prime factors of that number, including repeating primes. I got super excited cause I've been working on this for a while, but in reality it doesn't work so well. Any n larger than about 500,000 takes so long to factor and I want to know if there is any way to optimize it.

import sys

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if not n & 1:
        return False
    for x in range(3,int(n**0.5)+1,2):
        if n % x == 0:
            return False
    return True

def factors(n):
    n = int(n)
    factors = []
    test_numbers = []
    if is_prime(n) == True:
        print "%s is prime!" % n
    else:   
        for a in range(1,n+1):
            if is_prime(a) == True:
                test_numbers.append(a)
        while int(len(test_numbers)) > 0:
            mx = int(max(test_numbers))
            while n % mx == 0:
                n = n/mx
                factors.append(mx)
            else:
                test_numbers.remove(mx)
        print factors


n = raw_input("What number would you like to factorize?  ")
factors(n)
print

thing = raw_input("Press ENTER to continue.")
sys.exit()
2

There are 2 best solutions below

0
On

Here are a couple of improvements using some Pythonic idioms such as list comprehension:

import math
def factors(n):
    def is_prime(n):
        return not [m for m in xrange(2,int(math.sqrt(n))) if not n % m]

    primes = []
    test_numbers = xrange(2,n+1)
    test_number = 2
    while not primes and test_number in test_numbers:
        if not n % test_number and is_prime(test_number):
            primes = primes + [test_number] + factors(n/test_number)
        test_number += 1            
    return primes

n = raw_input("What number would you like to factorize? ")
n = int(n)
prime_factors = factors(n)
print prime_factors

It will be fast or slow depending on how many factors n has (slowest if n is prime, fastest if it has lots of small prime factors).

0
On

Your algorithm is slow because it does too much work. There is no need to calculate the primes less then n, as your for loop does. And there is no need to separately test n for primality, because the algorithm will determine that itself. Here is a simple program to factor integers by trial division:

>>> def factors(n):
...     f, fs = 2, []
...     while f * f <= n:
...        while n % f == 0:
...            fs.append(f)
...            n /= f
...        f += 1
...     if n > 1: fs.append(n)
...     return fs
... 
>>> factors(997)
[997]
>>> factors(13290059)
[3119, 4261]
>>> factors(1234098760912343)
[67, 103, 184117, 971279]

Notice that 997 is determined to be prime. Notice too that the factors of a large number are calculated instantly, but don't count on that; trial division has a time complexity of O(sqrt(n)), and is slow when the factors of n are large.

There are better ways to factor integers. There are even better ways to factor integers by trial division. But that should be enough to get you started.