Multiprocessing in python for numerical simulation with large objects

Question

Multiprocessing in python for numerical simulation with large objects

137 Views Asked by Marko Šterk At 15 February 2023 at 15:33

I am trying to put together a numerical simulation (specifically, Beta cell dynamics) based on Betram et al. 2007 (https://www.sciencedirect.com/science/article/pii/S0006349507709621). The model itself works fine but it is very slow since the simulation step must be around 0.1 ms and python is not the fastest language around. It takes approximately 12 real seconds for every simulation second with only 15 coupled beta cells in the system. In the end, I will need around 1000 beta cells to simulate an entire islet of Langerhans so you can see why I need to speed things up.

Each beta cell is implemented as a class instance which inherits from the CellParameters and ModelParameters class.

@jitclass(spec)
class BetaCell:
     def __init__(self, cell_num: int, neighbours: list, G: float):
          ##sets initial conditions (23 parameters - floats and lists)).

     def w_ijkl(self, ii, jj, kk, ll, f6p):
          ###calculates and returns a specific parameter

     def run_model_step(self, Ge: float):
          ###runs one time step (dt=0.1 ms) for the cell.
          ###has to calculate/update around 55 parameters

class ModelParameters:
    ###Contains all model parameters
    ###time step, the intensity of glucose stimulation, the start of stimulation etc.
    ###also contains when to save a time step for later visualization

    @staticmethod
    def external_glucose(time):
       ###calculates and returns the current level of external glucose
       ###uses a simple equation

class CellParameters:
    ###Contains approx. 70 parameters (floats) that the the model needs for execution.
    ###Some of these parameters are changed (once) after initialization
    ###to introduce some cell heterogeneity

The simulation looks like this:

some data is imported with cell parameters (locations, coupling, coupling weights).
each cell is initialized with its cell number (0, 1, 2, 3...), neighbours and starting glucose Cells are stored into a list named "cells".
if required, heterogeneity is introduced into cellular parameters
each step of the simulation is executed

Simulation step execution:

def run_step(cell):
    cell.run_model_step(glc)

if __name__ == '__main__':
    for step, current_time in enumerate(time):
        ###time array is pre-calculated based on provided end_time and simulation step (dt)
        glc = ModelParameters.external_glucose(current_time)
        cells = calculate_gj_coupling(cells) #calculates gap-jounction coupling between connected cells
        cells = list(map(run_step, cells))

The above for-loop is repeated until the end of the simulation is reached. Ofcourse this is a slow process taking around 10-12 seconds for each simulation second (10000 loop iterations @ 0.1 ms steps)

I really need to speed things up, preferably around 10-fold or more would be great.

Sofar I tried to use the Pool class from the multiprocessing module.

I created a pool: pool = Pool(processes=NUMBER_OF_WORKERS) I used the pools map function to run a simulation step for each cell

pool = Pool(processes=NUMBER_OF_WORKERS)
.
.
.
for step, current_time in enumerate(time):
    ###time array is pre-calculated based on provided end_time and simulation step (dt)
    glc = ModelParameters.external_glucose(current_time)
    cells = calculate_gj_coupling(cells) #calculates gap-jounction coupling between connected cells
    cells = pool.map(run_step, cells)

pool.terminate()

The rest is the same as before, because the slow part is the calculation of individual time steps for every beta cell.

The problem with the above solution is that it makes things worse. I am guessing that the shifting of the class instances around in memory for each process is the culprit, because the same solution worked wonders for a simplyfied version of the problem (below)

def task_function(dummy_object):
    dummy_object.sum_ab()
    return dummy_object

class DummyObject:
    def __init__(self, a, b):
       self.a = a
       self.b = b
       self.ab = 0.0

    def sum_ab(self):
       time.sleep(2) #simulates long running task
       self.ab += self.a + self.b

if __name__ == '__main__':
    pool = Pool(processes=NUMBER_OF_WORKERS)
    cells = [DummyObject(i, randint(1,20), randint(1,20)) for i in range(NUMBER_OF_CELLS)]
    for i in range(NUMBER_OF_STEPS):
        pool.map(task_function, cells)

    pool.terminate()

The above simple example speeds things up quite a bit. If sequential execution is implemented (the standard way) the "simulation" takes 400 seconds @ NUMBER_OF_CELLS=200 for one iteration of the for-loop (each cell takes 2 seconds * 200 = 400 s). If I implement the above solution one iteration of the for-loop takes only 8 seconds with NUMBER_OF_CELLS=200 and NUMBER_OF_WORKERS=60. But these DummyObjects are ofcourse very small and simple so the shifting around in memory goes quickly.

Any ideas to implement some version of the above dummy solution would be greatly appreciated.

EDIT 16. 2. 2023 Thanks to Fanchen Bao I have found the remaining bottleneck in my code. It is the coupling function that calculated coupling currents between connected cells.

The coupling function looks like this:

@jit(nopython=True)
def calculate_gj_coupling(cells, cells_neighbours):
    for i, cell in enumerate(cells):
        ca_current = 0.0
        voltage_current = 0.0
        g6p_current = 0.0
        adp_current = 0.0
        for neighbour, weight in cells_neighbours[i]:
            voltage_current += (cell.Cgjv*weight)*(cells[neighbour].V-cell.V)
            ca_current += (cell.Cgjca*weight)*(cells[neighbour].C-cell.C)
            g6p_current += (cell.Cgjg6p*weight)*(0.3*cells[neighbour].G6P-0.3*cell.G6P)
            adp_current += (cell.Cgjadp*weight)*(cells[neighbour].ADPm - cell.ADPm)
        cell.couplingV = voltage_current
        cell.couplingCa = ca_current
        cell.couplingG6P = g6p_current
        cell.couplingADP = adp_current
    return cells

It is basically a nested for-loop because each connection between two cells is weighted (weight parameter).

What would be a more pythonic (and faster) way of writing this up? Keep in mind that this function runs in every simulation step.

EDIT 18. 2 2023 I rewrote the BetaCell class. It now contains all cell parameters (instead of inheriting from the CellParameters class) and all necessary model parameters are provided at initialization (dt, save_step). This allowed me to add the Numba jitclass decorator with corresponding specifications. It threw an error before, because the appears to be a problem with inheritance during compilation, I guess. I also use Numba List() class instead of the Python built-in list.

Original Q&A

There are 1 best solutions below

**Fanchen Bao** · Answer 1 · 2023-02-16T22:29:47.260000

This is not strictly a solution, but a suggestion on how the coupling function could be optimized.

Caveat

Without source data, the pseudocode below is not tested. The implementation may or may not function correctly, but the core idea shall be sound.

Core Idea

Look at this code snippet in particular

for neighbour, weight in cells_neighbours[i]:
    voltage_current += (cell.Cgjv*weight)*(cells[neighbour].V-cell.V)

If we write neighboring cells' weight as w1, w2, ..., wm, neighboring cells' voltage as V1, V2, ..., Vm, then

voltage_current
= cell.Cgjv * w1 * (V1 - cell.V) + cell.Cgjv * w2 * (V2 - cell.V) + ... + cell.Cgjv * wm * (Vm - cell.V)
= cell.Cgjv * (w1 * V1 + w2 * V2 + ... + wm * Vm - cell.V * (w1 + w2 + ... + wm))
= cell.Cgjv * (np.dot(W, V) - cell.V * np.sum(W))

Here W and V refers to vectors [w1, w2, ..., wm] and [V1, V2, ..., Vm] (represented in Python as numpy arrays). We can leverage numpy's vectorization to speed up the inner loop.

The same logic applies to the other three computations, as they follow the same logic. This optimization requires that all V, C, G6P and ADPm be stored as its own numpy array outside the cell object. We are basically striping down OOP to favor speed. It might make the code base a bit harder to maintain, but it surely will get a performance boost.

Optimized Pseudocode

import numpy as np


def calculate_gj_coupling(
    cells: List[Cell],
    cells_neighbours_indices: np.ndarray,
    cells_neighbours_weights: np.ndarray,
    cells_attrb: Dict[str, np.ndarray],
) -> None:
    """Calculate GJ coupling.

    Notice that we don't have to return anything, because the update to cells
    are done in-place

    :param cells: a list of cells.
    :type cells: List[Cell]
    :param cells_neighbours_indices: index of the outer list is the index to
        each cell in cells. Value of the inner list is the index of the
        neighboring cell.
        e.g. given cells_neighbours_indices = [
            [1, 2, 3],
            [0, 3],
            [0, 3],
            [0, 1, 2],
        ]
        We say
        cells[0] is neighbouring with cells[1], cells[2], and cells[3].
        cells[1] is neighbouring cells[0], cells[3]
        cells[2] is neighbouring cells[0], cells[3]
        cells[3] is neighbouring cells[0], cells[1], cells[2]
    :type cells_neighbours_indices: np.ndarray
    :param cells_neighbours_weights: index of the outer list is the index to
        each cell in cells. Value of the inner list is the weight of the
        neighboring cell.
        e.g. given cells_neighbours_weights = [
            [1, 2, 3],
            [4, 5],
            [6, 7],
            [8, 9, 10],
        ]
        We say
        cells[0]'s three neighbors have weights [1, 2, 3]
        cells[1]'s two neighbors have weights [4, 5]
        cells[2]'s three neighbors have weights [6, 7]
        cells[3]'s three neighbors have weights [8, 9, 10]
    :type cells_neighbours_weights: np.ndarray
    :param cells_attrib: a central place to store the attributes of all cells.
        It has the following shape
        {
            'V': np.array([v0, v1, ..., vn]),  # all cells' V
            'C': np.array([c0, c1, ..., cn]),  # all cells' C
            'G6P': np.array([g0, g1, ..., gn]),  # all cells' G6P
            'ADPm': np.array([a0, a1, ..., an]),  # all cells' ADPm
        }
    """
    for i, cell in enumerate(cells):
        idx = cells_neighbours_indices[i]
        ws = cells_neighbours_weights[i]
        sum_w = np.sum(ws)
        cell.couplingV = cell.Cgjv * (np.dot(ws, cells_attrb['V'][idx]) - cell.V * sum_w)
        cell.couplingCa = cell.Cgjca * (np.dot(ws, cells_attrb['C'][idx]) - cell.C * sum_w)
        cell.couplingG6P = cell.Cgjg6p * 0.3 * (np.dot(ws, cells_attrb['G6P'][idx]) - cell.G6P * sum_w)
        cell.couplingADP = cell.Cgjadp * (np.dot(ws, cells_attrb['ADPm'][idx]) - cell.ADPm * sum_w)

Can We Do Even Better?

Notice that the optimized pseudocode above only deals with the inner loop. There is not that much going on on the outer loop as well, so is it possible that we vectorize the outer loop as well? Let's take cell.couplingV as an example. What we want for calculate_gj_coupling to accomplish is the following:

couplingV0 = Cgjv0 * (np.dot(nei_W0, nei_V0) - V0 * np.sum(W0))
couplingV1 = Cgjv1 * (np.dot(nei_W1, nei_V1) - V1 * np.sum(W1))
.
.
.
couplingVn = Cgjvn * (np.dot(nei_Wn, nei_Vn) - Vn * np.sum(Wn))

where couplingV0, couplingV1, ..., couplingVn are the couplingV values of each cell, Cgjv0, Cgjv1, ..., Cgjvn are the Cgjv values of each cell, and V0, V1, ..., Vn are V values of each cell.

nei_W0, nei_W1, ..., nei_Wn are a list of vectors, each vector being a list of weights of a cell's neighboring cells. Similarly, nei_V0, nei_V1, ..., nei_Vn are a list of vectors, each vector being a list of V values of a cell's neighboring cells.

We can rewrite that as

where . is dot product of vectors and * is element-wise multiplication.

This equation tells us that if we use a vector to represent all cell's couplingV, we are able to vectorize the entire process of calculate_gj_coupling.

One more thing that needs addressing is the handling of nei_W and nei_V. We can turn them into matrices W_mat and V_mat. For example, W_mat[i][j] represent the weight of cells[j] when it neighbors cells[i]. If they do not neighbor, set W_mat[i][j] to zero.

Below is another piece of pseudocode to implement the full vectorization idea. Note that we strip down OOP even more. Also, W_mat, V_mat, C_mat, G6P_mat, and ADPm_mat each contains repeated data and is sparse. We are essentially sacrificing space for better time performance.

import numpy as np


def calculate_gj_coupling(
    W_mat: np.ndarray,
    V_mat: np.ndarray,
    C_mat: np.ndarray,
    G6P_mat: np.ndarray,
    ADPm_mat: np.ndarray,
    cells_attrb: Dict[str, np.ndarray],
) -> Dict[str, np.ndarray]:
    """_summary_

    :param W_mat: W_mat is a matrix of ALL weights of cells neighboring all
        other cells. e.g. W_mat[i][j] is the weight of cells[j] when it is
        neighboring cells[i]. If cells[i] and cells[j] do not neighbor,
        set W_mat[i][j] = 0. It has shape (n, n), where n is the total number
        of cells.
    :type W_mat: np.ndarray
    :param V_mat: V_mat is a matrix of ALL Vs of cells neighboring all other
        cells. e.g. V_mat[i][j] is the V of cells[j] when it is neighboring
        cells[i]. If cells[i] and cells[j] do not neighbor, set V_mat[i][j] = 0
        It has shape (n, n), where n is the total number of cells.
    :type V_mat: np.ndarray
    :param C_mat: C_mat is a matrix of ALL Cs of cells neighboring all other
        cells. e.g. C_mat[i][j] is the C of cells[j] when it is neighboring
        cells[i]. If cells[i] and cells[j] do not neighbor, set C_mat[i][j] = 0
        It has shape (n, n), where n is the total number of cells.
    :type C_mat: np.ndarray
    :param G6P_mat: G6P_mat is a matrix of ALL G6Ps of cells neighboring all
        other cells. e.g. G6P_mat[i][j] is the G6P of cells[j] when it is
        neighboring cells[i]. If cells[i] and cells[j] do not neighbor, set
        G6P_mat[i][j] = 0. It has shape (n, n), where n is the total number of
        cells.
    :type G6P_mat: np.ndarray
    :param ADPm_mat: ADPm_mat is a matrix of ALL ADPms of cells neighboring all
        other cells. e.g. ADPm_mat[i][j] is the ADPm of cells[j] when it is
        neighboring cells[i]. If cells[i] and cells[j] do not neighbor, set
        ADPm_mat[i][j] = 0. It has shape (n, n), where n is the total number of
        cells.
    :type ADPm_mat: np.ndarray
    :param cells_attrib: a central place to store the attributes of all cells.
        It has the following shape
        {
            'V': np.array([v0, v1, ..., vn]),  # all cells' V
            'C': np.array([c0, c1, ..., cn]),  # all cells' C
            'G6P': np.array([g0, g1, ..., gn]),  # all cells' G6P
            'ADPm': np.array([a0, a1, ..., an]),  # all cells' ADPm
            'Cgjv': np.array([cv0, cv1, ..., cvn]),  # all cells' Cgjv
            'Cgjca': np.array([cca0, cca1, ..., ccan]),  # all cells' Cgjca
            'Cgjg6p': np.array([cg6p0, cg6p1, ..., cg6pn]),  # all cells' Cgjg6p
            'Cgjadp': np.array([cadp0, cadp1, ..., cadpn]),  # all cells' Cgjadp
        }
    :return: a dictionary of the following shape
        {
            'couplingV': np.ndarray,
            'couplingCa': np.ndarray,
            'couplingG6P': np.ndarray,
            'couplingADP': np.ndarray,
        }
        Each array has length n, recording the coupling values of each cell.
    :rtype: Dict[str, np.ndarray]
    """
    sum_w = np.sum(W_mat, axis=1)  # a vector of weight sums for each cell
    # `*` is element-wise multiplication, `@` is matrix multiplication
    return {
        'couplingV': cells_attrb['Cgjv'] * (np.diag(W_mat @ V_mat) - cells_attrb['V'] * sum_w),
        'couplingCa': cells_attrb['Cgjvca'] * (np.diag(W_mat @ C_mat) - cells_attrb['C'] * sum_w),
        'couplingG6P': cells_attrb['Cgjg6p'] * 0.3 * (np.diag(W_mat @ G6P_mat) - cells_attrb['G6P'] * sum_w),
        'couplingADP': cells_attrb['Cgjadp'] * (np.diag(W_mat @ ADPm_mat) - cells_attrb['ADPm'] * sum_w),
    }

Multiprocessing in python for numerical simulation with large objects

There are 1 best solutions below

Caveat

Core Idea

Optimized Pseudocode

Can We Do Even Better?

Related Questions in PYTHON-3.X

Related Questions in PERFORMANCE

Related Questions in SIMULATION

Related Questions in PYTHON-MULTIPROCESSING

Related Questions in NUMERICAL-COMPUTING

Trending Questions

Popular # Hahtags

Popular Questions