Plot unimodal distributions determined from a multimodal distribution

540 Views Asked by At

I've used GaussianMixture to analyze a multimodal distribution. From the GaussianMixture class I can access the means and covariances using the attributes means_ and covariances_. How can I use them to now plot the two underlying unimodal distributions?

I thought of using scipy.stats.norm but I don't know what to select as parameters for loc and scale. The desired output would be analogously as shown in the attached figure.

The example code of this question was modified from the answer here.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
from scipy.stats import norm

ls = np.linspace(0, 60, 1000)
multimodal_norm = norm.pdf(ls, 0, 5) + norm.pdf(ls, 20, 10)
plt.plot(ls, multimodal_norm)

# concatenate ls and multimodal to form an array of samples
# the shape is [n_samples, n_features]
# we reshape them to create an additional axis and concatenate along it
samples = np.concatenate([ls.reshape((-1, 1)), multimodal_norm.reshape((-1,1))], axis=-1)
print(samples.shape)

gmix = mixture.GaussianMixture(n_components = 2, covariance_type = "full")
fitted = gmix.fit(samples)

print(fitted.means_)
print(fitted.covariances_)

# The idea is something like the following (not working):
new_norm1 = norm.pdf(ls, fitted.means_, fitted.covariances_)
new_norm2 = norm.pdf(ls, fitted.means_, fitted.covariances_)
plt.plot(ls, new_norm1, label='Norm 1')
plt.plot(ls, new_norm2, label='Norm 2')
1

There are 1 best solutions below

1
On BEST ANSWER

It is not entirely clear what you are trying to accomplish. You are fitting a GaussianMixture model to the concatenation of the sum of the values of pdfs of two gaussians sampled on a uniform grid, and the unifrom grid itself. This is not how a Gaussian Mixture model is meant to be fitted. Typically one fits a model to random observations drawn from some distribution (typically unknown but could be a simulated one).

Let me assume that you want to fit the GaussianMixture model to a sample drawn from a Gaussian Mixture distribution. Presumably to test how well the fit works given you know what the expected outcome is. Here is the code for doing this, both to simulate the right distribution and to fit the model. It prints the parameters that the fit recovered from the sample -- we observe that they are indeed close to the ones we used to simulate the sample. Plot of the density of the GaussianMixture distribution that fits to the data is generated at the end

import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
from scipy.stats import norm

# set simulation parameters
mean1, std1, w1 = 0,5,0.5
mean2, std2, w2 = 20,10,1-w1

# simulate constituents
n_samples = 100000
np.random.seed(2021)
gauss_sample_1 = np.random.normal(loc = mean1,scale = std1,size = n_samples)
gauss_sample_2 = np.random.normal(loc = mean2,scale = std2,size = n_samples)
binomial = np.random.binomial(n=1, p=w1, size = n_samples)

# simulate gaussian mixture
mutlimodal_samples = (gauss_sample_1 * binomial + gauss_sample_2 * (1-binomial)).reshape(-1,1)

# define and fit the mixture model
gmix = mixture.GaussianMixture(n_components = 2, covariance_type = "full")
fitted = gmix.fit(mutlimodal_samples)

print('fitted means:',fitted.means_[0][0],fitted.means_[1][0])
print('fitted stdevs:',np.sqrt(fitted.covariances_[0][0][0]),np.sqrt(fitted.covariances_[1][0][0]))
print('fitted weights:',fitted.weights_)

# Plot component pdfs and a joint pdf
ls = np.linspace(-50, 50, 1000)
new_norm1 = norm.pdf(ls, fitted.means_[0][0], np.sqrt(fitted.covariances_[0][0][0]))
new_norm2 = norm.pdf(ls, fitted.means_[1][0], np.sqrt(fitted.covariances_[1][0][0]))
multi_pdf = w1*new_norm1 + (1-w1)*new_norm2
plt.plot(ls, new_norm1, label='Norm pdf 1')
plt.plot(ls, new_norm2, label='Norm pdf 2')
plt.plot(ls, multi_pdf, label='multi-norm pdf')
plt.legend(loc = 'best')
plt.show()

The results are

fitted means: 22.358448018824642 0.8607494960575028
fitted stdevs: 8.770962351118127 5.58538485134623
fitted weights: [0.42517515 0.57482485]

as we see they are close (up to their ordering, which of course the model cannot recover but it is irrelevant anyway) to what went into the simulation:

mean1, std1, w1 = 0,5,0.5
mean2, std2, w2 = 20,10,1-w1

And the plot of the density and its parts. Recall that the pdf of the GaussianMixture is not the sum of the pdfs but a weighted average with weights w1, 1-w1:

mixture-pdf