Improve the speed of a summing loop in Python


What can I do to significantly speed up the execution of the following code, in particular for even higher values of N? I think the main problem is the sum in the get_field function.

from numpy import *
from numpy.linalg import *
from scipy import constants as co
import pylab as plt

eps0 = co.epsilon_0


charges = []


class charge:
    def __init__(self, q, pos):
        self.q=q
        self.pos=pos

a = 10
N = 1001

d = 1e-3
Q = 1e-8
q = Q/N**2

for x in linspace(0,a,N):
    for y in linspace(0,a,N):
        ## First plate
        position = array([x,y,0])
        NewCharge = charge(q,position)
        charges.append(NewCharge)

        ## Second Plate
        position2 = array([x,y,d])
        NewCharge2 = charge(-q,position2)
        charges.append(NewCharge2)

def get_field(position):
    # Sum the Coulomb fields of all point charges at the given position
    # and return the z-component of the total field
    E = array([0.,0.,0.])
    
    for C in charges:
        q = C.q
        rVector = position - C.pos
        r = norm(rVector)
        r0Vector = rVector/r
        
        E += 1/(4*pi*eps0)*q/r**2 * r0Vector
        #print(E)
    return E[2]




p = 0.1
x0 = 0.5*a
y0 = 0.5*a


#print(get_field([x0,y0,1]))

Z = linspace(p*d,(1-p)*d,100)

E = [get_field([x0,y0,z]) for z in Z]

print(Z)
print(E)
plt.plot(Z,E,'x')
plt.ylim(bottom=0)
plt.show()

The background is a physics problem: I want to find the electric field of a parallel-plate capacitor by explicitly summing the Coulomb fields of point charges distributed evenly over two parallel plates.
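
In other words, get_field just evaluates the superposition of point-charge Coulomb fields (and returns only the z-component):

\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\sum_i q_i\,\frac{\mathbf{r}-\mathbf{r}_i}{\lVert \mathbf{r}-\mathbf{r}_i\rVert^{3}}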

Update: @Blckknght's answer improves the performance of my code significantly. For example, for N = 100 my original code takes approx. 90 seconds on my system and Blckknght's code 0.6 seconds; for N = 500 my code takes approx. 2300 seconds and Blckknght's approx. 3 seconds.

The problem with this improved code is that its memory consumption is so high that in my case the program simply gets killed once my 16 GB of memory are exceeded.

So is there any way to get those optimizations without running into the memory problem at all?


There are 2 best solutions below

Blckknght

Programming fast numerical code with numpy is quite different from writing normal Python code. You often need to discard some best practices of normal code to get good numerical performance.

In this context, I think the thing to discard is the object-oriented approach with your charge class. Using a single numpy array to hold the coordinates of all the charges is much more efficient than keeping a separate small array for each point, as the library can use fast, parallel CPU instructions to do all the computation in one go rather than charge by charge in a slow pure-Python loop.

Here's my crude solution, which builds up a few different arrays: one with the coordinates of the charges on the two plates, one with the charge values (q and -q), and one with the points along the z axis where you want to sample the field. I then use various numpy broadcasting techniques to combine them.

import numpy as np
from scipy.constants import pi, epsilon_0 as eps0

a = 10
N = 1001

d = 1e-3
Q = 1e-8
q = Q/N**2

# All charge positions in one (2*N**2, 3) array: an N x N grid on each plate,
# with the plates at z = 0 and z = d (the imaginary step makes mgrid act like linspace)
charge_coords = np.mgrid[0:a:N*1j, 0:a:N*1j, 0:d:2j].reshape(3,-1).T
# Matching charge values, shaped (1, 2*N**2, 1) so they broadcast with rVectors below
charge_values = np.array([q, -q] * N**2).reshape(1,2*N**2,1)

p = 0.1
x0 = 0.5*a
y0 = 0.5*a

# 100 sample points along z between the plates, all at (x0, y0)
test_coords = np.linspace([x0, y0, p*d], [x0, y0, (1-p)*d], 100)

# Displacement vectors from every charge to every test point: shape (100, 2*N**2, 3)
rVectors = test_coords[:, None,:] - charge_coords[None,:,:]
rMagnitudes = np.linalg.norm(rVectors, axis=2, keepdims=True)

# Coulomb's law summed over all charges; dividing by r**3 folds the unit-vector
# normalization into the 1/r**2 factor
E = np.sum(1/(4*pi*eps0) * charge_values / rMagnitudes**3 * rVectors, axis=1)

The funny indexing of np.mgrid with imaginary step values is based on this answer, which I found by googling. You can also use np.meshgrid; it just takes a few extra steps. The rest is pretty standard numpy logic: you just need to reshape or slice in a few places to add dimensions so that the arrays broadcast together properly in the final step.
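
For comparison, here is a rough sketch of what the np.meshgrid version might look like (charge_coords_alt is just an illustrative name); it should produce the same (2*N**2, 3) array as the mgrid line above:

x = np.linspace(0, a, N)
y = np.linspace(0, a, N)
z = np.linspace(0, d, 2)          # the two plates, z = 0 and z = d
# indexing='ij' gives the same axis ordering as np.mgrid
X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
charge_coords_alt = np.stack([X, Y, Z]).reshape(3, -1).T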

The biggest downside of using big numpy arrays like this is memory usage. Since we're taking the Cartesian product of the 100 test coordinates with the 2 million-ish charge coordinates, we wind up with several gigabytes of memory being used all at once (about 5 GB for the largest array on my system). That could get slow if you run out of real RAM and your computer starts needing to swap things out of virtual memory. It might be possible to save a little memory here or there by deleting references to arrays we don't need any more (or not keeping persistent references to temporary intermediate values at all). I avoided making a separate variable for the unit vectors by just combining the division by the magnitude with the existing magnitude squared that the charge was being divided by.
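
One further option, just a sketch and not part of the code above, would be to process the test points in chunks so that only a chunk_size x 2*N**2 x 3 slab of rVectors exists at any time (field_in_chunks and chunk_size are illustrative names):

def field_in_chunks(test_coords, charge_coords, charge_values, chunk_size=10):
    # Accumulate the summed field chunk by chunk to cap peak memory usage
    results = []
    for start in range(0, len(test_coords), chunk_size):
        chunk = test_coords[start:start + chunk_size]
        rVectors = chunk[:, None, :] - charge_coords[None, :, :]
        rMagnitudes = np.linalg.norm(rVectors, axis=2, keepdims=True)
        results.append(np.sum(1/(4*pi*eps0) * charge_values / rMagnitudes**3 * rVectors, axis=1))
    return np.concatenate(results)

E = field_in_chunks(test_coords, charge_coords, charge_values)

This trades one big vectorized pass for ten smaller ones, so it should stay close to the same speed while the largest temporary array is only about a tenth the size of the full rVectors array.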

bubblefoil

If you can sacrifice some precision, you can convert the arrays to float32. This reduces the memory footprint, and it also runs about 3x faster on my hardware. Judging by the notebook's kernel memory monitor, there is still a peak of roughly 4 GB.

Updated code from @Blckknght's answer:

import numpy as np
from scipy.constants import pi, epsilon_0 as eps0

a = 10
N = 1001

d = 1e-3
Q = 1e-8
q = Q/N**2

# Same arrays as in the answer above, but stored as float32 to halve the memory footprint
charge_coords = np.mgrid[0:a:N*1j, 0:a:N*1j, 0:d:2j].reshape(3,-1).T.astype(dtype=np.float32)
charge_values = np.array([q, -q] * N**2, dtype=np.float32).reshape(1,2*N**2,1)

p = 0.1
x0 = 0.5*a
y0 = 0.5*a

test_coords = np.linspace([x0, y0, p*d], [x0, y0, (1-p)*d], 100, dtype=np.float32)

rVectors = test_coords[:, None,:] - charge_coords[None,:,:]
rMagnitudes = np.linalg.norm(rVectors, axis=2, keepdims=True)

E = np.sum(1/(4*pi*eps0) * charge_values / rMagnitudes**3 * rVectors, axis=1)
# Report how many bytes each array occupies
print([arr.nbytes for arr in (charge_coords, charge_values, test_coords, rVectors, rMagnitudes, E)])
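
To get back the plot from the question with either vectorized version, note that the z-component of the field is the last column of E; a minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt

# Plot E_z against the z-coordinate of the test points, as in the question
plt.plot(test_coords[:, 2], E[:, 2], 'x')
plt.ylim(bottom=0)
plt.show()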