I have some simple code that does the following. It iterates over all possible length-n lists F with +-1 entries. For each one, it iterates over all possible length-2n lists S with +-1 entries, where the first half of S is simply a copy of the second half. The code computes the inner product of F with each length-n sublist of S. For each pair (F, S), it counts how many inner products are zero before the first non-zero one.
Here is the code.
#!/usr/bin/python
from __future__ import division
import itertools
import operator
import math

n = 14
m = n + 1

def innerproduct(A, B):
    assert (len(A) == len(B))
    s = 0
    for k in xrange(0, n):
        s += A[k] * B[k]
    return s

# leadingzerocounts[i] counts the (F, S) pairs whose inner products at
# shifts 0..i are all zero.
leadingzerocounts = [0] * m
for S in itertools.product([-1, 1], repeat=n):
    S1 = S + S
    for F in itertools.product([-1, 1], repeat=n):
        i = 0
        while (i < m):
            ip = innerproduct(F, S1[i:i + n])
            if (ip == 0):
                leadingzerocounts[i] += 1
                i += 1
            else:
                break

print leadingzerocounts
The correct output for n = 14 is

[56229888, 23557248, 9903104, 4160640, 1758240, 755392, 344800, 172320, 101312, 75776, 65696, 61216, 59200, 59200, 59200]
Using PyPy, this takes 1 min 18 seconds for n = 14. Unfortunately, I would really like to run it for n = 16, 18, 20, 22, 24, 26. I don't mind using Numba or Cython, but I would like to stay close to Python if at all possible.
Any help speeding this up is very much appreciated.
I'll keep a record of the fastest solutions here. (Please let me know if I miss an updated answer.)
- n = 22 at 9m35.081s by Eisenstat (C)
- n = 18 at 1m16.344s by Eisenstat (pypy)
- n = 18 at 2m54.998s by Tupteq (pypy)
- n = 14 at 26s by Neil (numpy)
- n = 14 at 11m59.192s by kslote1 (pypy)
This new code gets another order of magnitude speedup by taking advantage of the cyclic symmetry of the problem. This Python version enumerates necklaces with Duval's algorithm; the C version uses brute force. Both incorporate the speedups described below. On my machine, the C version solves n = 20 in 100 seconds! A back-of-the-envelope calculation suggests that, if you were to let it run for a week on a single core, it could do n = 26, and, as noted below, it's amenable to parallelism.
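For concreteness, here is a minimal, hedged sketch of Duval-style (Fredricksen-Kessler-Maiorana) necklace enumeration over {-1, 1}. It is illustrative only, not the answer's actual code, and the function name necklaces is made up:

# Sketch of necklace enumeration (Fredricksen-Kessler-Maiorana / Duval-style
# recursion); illustrative only, not the code from the answer.
def necklaces(n, alphabet=(-1, 1)):
    """Yield one representative per necklace (rotation class) of length n."""
    k = len(alphabet)
    a = [0] * (n + 1)

    def gen(t, p):
        if t > n:
            if n % p == 0:  # a[1..n] is the lexicographically least rotation
                yield tuple(alphabet[a[i]] for i in range(1, n + 1))
        else:
            a[t] = a[t - p]
            for w in gen(t + 1, p):
                yield w
            for j in range(a[t - p] + 1, k):
                a[t] = j
                for w in gen(t + 1, t):
                    yield w

    return gen(1, 1)

# Example: the 4 necklaces of length 3 over {-1, 1}
# for rep in necklaces(3): print(rep)

Each representative stands in for its whole rotation class, which is where the cyclic-symmetry saving comes from.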
C version:
You can get a bit of speedup by exploiting the sign symmetry (4x) and by iterating over only those vectors that pass the first inner product test (asymptotically, O(sqrt(n))x).
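To illustrate the second idea (a sketch only, not the answer's code; the helper name first_zero_F is hypothetical): the first inner product of F with S is zero exactly when F agrees with S in n/2 positions, so one can enumerate just those C(n, n/2), roughly 2^n / sqrt(n), candidates instead of all 2^n vectors F:

from itertools import combinations

# Sketch: generate only the F whose first inner product with S is zero,
# by choosing the n/2 positions where F agrees with S.
def first_zero_F(S):
    n = len(S)
    assert n % 2 == 0
    for agree in combinations(range(n), n // 2):
        F = [-s for s in S]      # start out disagreeing everywhere
        for k in agree:
            F[k] = S[k]          # flip n/2 positions to agree
        yield F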
C version, to get a feel for how much performance we're losing to PyPy (16 for PyPy is roughly equivalent to 18 for C):
This algorithm is embarrassingly parallel because it's just accumulating over all values of xor. With the C version, a back-of-the-envelope calculation suggests that a few thousand hours of CPU time would suffice to calculate n = 26, which works out to a couple hundred dollars at current rates on EC2. There are undoubtedly some optimizations to be made (e.g., vectorization), but for a one-off like this I'm not sure how much more programmer effort is worthwhile.
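To make the "embarrassingly parallel" point concrete, here is a hedged Python sketch that splits the outer loop over S across worker processes and sums the per-worker count vectors. The answer's C code partitions the work over its xor variable instead; this just shows the same accumulate-then-sum pattern, and the worker count is an arbitrary choice:

# Sketch: parallelize the original brute force by sharding the S loop
# across processes and summing the per-process count vectors.
import itertools
from multiprocessing import Pool

n = 14
m = n + 1

def innerproduct(A, B):
    return sum(a * b for a, b in zip(A, B))

def counts_for_chunk(chunk):
    # One worker's share of the S values; returns its own count vector.
    counts = [0] * m
    for S in chunk:
        S1 = S + S
        for F in itertools.product([-1, 1], repeat=n):
            i = 0
            while i < m and innerproduct(F, S1[i:i + n]) == 0:
                counts[i] += 1
                i += 1
    return counts

if __name__ == '__main__':
    all_S = list(itertools.product([-1, 1], repeat=n))
    nproc = 4  # hypothetical number of worker processes
    chunks = [all_S[j::nproc] for j in range(nproc)]
    per_worker = Pool(nproc).map(counts_for_chunk, chunks)
    print([sum(col) for col in zip(*per_worker)])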