I have been working on a project that at some point requires heavy optimization of the algorithm used in the calculations. I would like to know which way is better to go: whether this can be done efficiently in Python, or whether I should instead implement a C++ routine that does exactly this. Here is a sample of the code that is my bottleneck:
import numpy as np
from numba import jit
from time import time
# this generates the outer product by rows of arrays 1 and 2
@jit( nopython=True )
def arrays_product( array_1, array_2 ):
    dim_x = array_1.shape[0]
    dim_y = array_2.shape[0]
    total_dim = dim_x*dim_y
    total_depth = array_1.shape[1]
    result = np.empty( ( total_dim, total_depth ), dtype=np.complex128 )
    counter = 0
    for i in range( dim_x ):
        for j in range( dim_y ):
            result[ counter, : ] = np.multiply( array_1[ i, : ], array_2[ j, : ] )
            counter += 1
    return result
def create_complex_array( dim ):
    return np.random.rand( dim ) + np.random.rand( dim )*1j
prefactor = np.pi
total_iterations = 40
tot_points = 3000
dim_0 = 4000
dim = 100 # the outer product dimension will be dim**2
total_combs = 10000
# create dictionary keys composed of 12 integers each; each key maps to a complex array
combination_keys = np.random.randint( low=-128, high = 127, size=(total_combs, 12), dtype=np.int8 )
combs_dict = { tuple( x ): create_complex_array( tot_points ) for x in combination_keys }
comb_keys_array = np.array( list( combs_dict.keys() ) ) # put keys in array form
#source array used to make the calculations
the_source_array = np.random.rand( dim_0, tot_points ) + np.random.rand( dim_0, tot_points )*1j
#create an array of masks, from which different keys of the dictionary will be selected later
comb_masks_p = []
for k in range( total_iterations ):
    random_choice = np.random.randint( low=0, high = total_combs - 1, size = dim**2 )
    comb_masks_p.append( random_choice )
comb_masks = np.array( comb_masks_p )
#this will select two different sets of elements of the_source_array
array_masks = np.random.randint( low=0, high=dim_0 - 1, size=( total_iterations, 2, dim ) )
for k in range( total_iterations ):
    ts = time()
    selection_combs = comb_keys_array[ comb_masks[ k, : ] ]
    keys_in_tuples = list( map( tuple, selection_combs ) ) # convert the selected keys into a list of tuples for dictionary lookup
    # iterate over the subset of selected keys and stack the corresponding complex arrays
    # this will be slow when the total number of keys is high
    array_1 = np.array( [ combs_dict[ key ] for key in keys_in_tuples ] )
    # select left and right components to generate the outer product by rows
    mask_left = array_masks[ k, 0, : ]
    mask_right = array_masks[ k, 1, : ]
    array_left = the_source_array[ mask_left, : ]
    array_right = the_source_array[ mask_right, : ]
    # this part is also slow
    product_array = arrays_product( array_left, array_right )
    # finally, multiply array_1 element-wise with product_array and reduce over the first axis
    result = ( prefactor*array_1*product_array ).sum( axis=0 )
    print( "Iteration time: ", time() - ts )
DESCRIPTION: We have a dictionary storing 1-D complex arrays, a subset of which must be selected on each iteration to build a 2-D array from them. The arrays are selected by unique n-tuple keys that access the dictionary elements. The NumPy array created from the dictionary always has the same shape, ( dim**2, tot_points ).
On top of that, we have a 2-D "source_array" from which values are selected by rows, using fancy indexing or masks to pick the appropriate rows on a given iteration. This selection generates two different arrays ( array_left, array_right ), which are later combined into a row-wise outer product ( see the function "arrays_product" with the Numba decorator ). Once this "product_array" is created, with shape ( dim**2, tot_points ), "array_1" is multiplied element-wise with "product_array", and the result is summed over axis 0.
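For clarity, this is what the row-wise outer product amounts to when written with plain NumPy broadcasting instead of the Numba loop ( a minimal, mathematically equivalent sketch, using the same variable names and shapes as in the code above ):

import numpy as np

dim, tot_points = 100, 3000
array_left = np.random.rand( dim, tot_points ) + np.random.rand( dim, tot_points )*1j
array_right = np.random.rand( dim, tot_points ) + np.random.rand( dim, tot_points )*1j

# ( dim, 1, tot_points ) * ( 1, dim, tot_points ) -> ( dim, dim, tot_points ), then flatten the first two axes
product_array = ( array_left[ :, None, : ]*array_right[ None, :, : ] ).reshape( dim*dim, tot_points )

This form still allocates the same large ( dim**2, tot_points ) intermediate, so by itself it does not remove the bottleneck.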
Note that in all cases the critical point is increasing the "dim" parameter: it makes the arrays produced by the outer product very big, and it also increases the total number of keys that must be selected from the dictionary to build "array_1".
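To give a sense of scale with the sample parameters above ( complex128 takes 16 bytes per element ), a rough back-of-the-envelope check:

dim, tot_points = 100, 3000
bytes_per_big_array = dim**2*tot_points*16
print( bytes_per_big_array/1e9 ) # ~0.48 GB each for array_1 and product_array

Both "array_1" and "product_array" ( plus the temporary from the element-wise product ) are alive at the same time, and since the number of rows grows as dim**2, doubling dim quadruples these sizes.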
WHAT IS BEST? I believe the code above is highly inefficient for large data sets. I am looking for a way to optimize it, either in pure Python or by creating a C++ routine that will be called to perform these operations. My questions are:
1. Can I expect much of a difference with a C++ approach? I would need to pass the dictionary as well as all the other objects, and I don't know how much of a gain to expect here. I am also open to other approaches such as Cython, but I think the dictionary object would be the big problem there, along with other Python calls that might occur during a single iteration.
2. Using pure Python, and particularly NumPy, is there a way to perform the contractions ( the last two steps ) in a single call to np.einsum or similar? I think this should be straightforward given the type of operations involved, but I have not been able to find it.
Note that there is no workaround for the outer loop: it has to exist because in my main code there is an additional step, not shown here, which is not relevant for performance.
I hope someone can help, thanks!