I am trying to vectorize the following Expectation-Maximization / clustering equation for a 2-dimensional Gaussian distribution using numpy. I have a naive approach that I will include at the end of my question:

$$\Sigma_k = \frac{\sum_{n} z_{nk} \, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n} z_{nk}}$$
For context, the variables and dimensions are defined as follows:
- $n$ = data point index (i.e. 1-1000)
- $k$ = cluster index (i.e. 1-3)
- $z_{nk}$ = a conditional probability that datapoint $n$ is in cluster $k$ (in $[0, 1]$)
- $x_n$ = value of datapoint $n$ (shape (2,))
- $\mu_k$ = current estimated multivariate mean of cluster $k$ (shape (2,))
The end product is a numerator that is a sum of (2, 2) shape matrices, and the denominator is a scalar. The final value is a (2, 2) covariance matrix estimate, and it must be computed for each value of $k$ (1, 2, 3).
I've achieved a vectorized approach for other values by defining the following numpy arrays:
- `Z` (shape (1000, 3)) = est. probability values for each datapoint/cluster pair
- `X` (shape (1000, 2)) = multivariate data matrix
- `mu` (shape (3, 2)) = est. cluster means
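
For reference, a hypothetical setup matching those shapes (the random values below are placeholders, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 1000, 3
X = rng.normal(size=(n, 2))        # multivariate data matrix
mu = rng.normal(size=(k, 2))       # est. cluster means
Z = rng.random(size=(n, k))
Z /= Z.sum(axis=1, keepdims=True)  # rows sum to 1, like responsibilities
sigma = np.zeros((k, 2, 2))        # output: one (2, 2) covariance per cluster
```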
My naive code is as follows:
```python
import numpy as np

for kk in range(k):
    numsum = 0
    for ii in range(X.shape[0]):
        # (2, 1) column vector: datapoint minus cluster mean
        diff = (X[ii, :] - mu[kk, :]).reshape(-1, 1)
        # accumulate the probability-weighted outer product
        numsum = numsum + Z[ii, kk] * np.matmul(diff, diff.T)
    sigma[kk] = numsum / np.sum(Z[:, kk])
```
Long story long - is there any better way to do this?

You can use `np.einsum`:
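
A minimal sketch, assuming `Z`, `X`, and `mu` have the shapes described in the question:

```python
import numpy as np

# pairwise differences, shape (1000, 3, 2):
# every datapoint minus every cluster mean
diff = X[:, None, :] - mu[None, :, :]

# numerator, shape (3, 2, 2): for each cluster k, the sum over n of
# Z[n, k] * outer(diff[n, k], diff[n, k]). The subscripts 'nk,nki,nkj->kij'
# weight each outer product by Z and sum over the datapoint axis n.
num = np.einsum('nk,nki,nkj->kij', Z, diff, diff)

# denominator: per-cluster responsibility totals, shape (3,)
den = Z.sum(axis=0)

# broadcast the scalar denominators over the (2, 2) matrices
sigma = num / den[:, None, None]
```

You can check it against your loop with `np.allclose(sigma, sigma_loop)` (where `sigma_loop` is the result of your original code); the einsum form computes all three (2, 2) covariance estimates in one pass with no Python-level loops.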