I'm trying to follow this tutorial for GMM with Python scikit-learn. The problem is that the original code does not work out of the box: it complains about the shape of the input arrays and says that GMM is now deprecated. I've tried to rewrite it as:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
np.random.seed(2)
x = np.concatenate([np.random.normal(0, 2, 200),
                    np.random.normal(5, 5, 200),
                    np.random.normal(3, 0.5, 600)])
x = np.reshape(x, (-1, 1))
plt.hist(x, 80, density=True)  # `normed` was removed from newer matplotlib; `density` replaces it
plt.xlim(-10, 20)
clf = GaussianMixture(4, max_iter=500, random_state=3).fit(x)
xpdf = np.linspace(-10, 20, 1000)
xpdf = np.reshape(xpdf, (-1, 1))
density = np.exp(clf.score(xpdf))
plt.hist(x, 80, density=True, alpha=0.5)  # `normed` was removed from newer matplotlib; `density` replaces it
plt.plot(xpdf, density, '-r')
plt.xlim(-10, 20)
But I still get a ValueError: x and y must have same first dimension. As far as I can understand, the problem has now moved from the shape of the input arrays to the shape of the density variable, but I'm not sure what is actually going on. Could anyone please shed some light on this? Thanks.
If you check the shape of density, the problem will be much clearer: it is a single scalar, not an array with one value per point in xpdf.
The score method returns the per-sample average log-likelihood over the entire dataset that it is passed, which is just a single scalar value. You want score_samples, which will provide the log-likelihood of each individual point.
It's possible the API may have changed here since the tutorial was written; I'm not sure.
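To make the difference concrete, here is a minimal sketch that reuses the data and model settings from the question and compares the two methods. It assumes a recent scikit-learn where GaussianMixture lives in sklearn.mixture; the names total and per_point are only illustrative. Plotting is left out so that the shapes are the focus.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Same synthetic data and model settings as in the question.
np.random.seed(2)
x = np.concatenate([np.random.normal(0, 2, 200),
                    np.random.normal(5, 5, 200),
                    np.random.normal(3, 0.5, 600)]).reshape(-1, 1)

clf = GaussianMixture(4, max_iter=500, random_state=3).fit(x)
xpdf = np.linspace(-10, 20, 1000).reshape(-1, 1)

# score() collapses everything into one number (the mean log-likelihood).
total = clf.score(xpdf)

# score_samples() returns one log-likelihood per row of xpdf.
per_point = clf.score_samples(xpdf)

# Exponentiate the per-point values to get a density curve.
density = np.exp(per_point)

print(np.ndim(total), per_point.shape, density.shape)
```

With density computed this way, it has the same first dimension as xpdf, so the plt.plot(xpdf, density, '-r') call from the question should no longer raise the ValueError.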